feat(phase-19): track F production RAG lessons 64-69#213
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (2)
📝 WalkthroughWalkthroughThis PR adds six complete, progressive capstone lessons (64–69) for building a RAG system from first principles, plus curriculum metadata updates. Each lesson includes implementation code, unit tests, English documentation, and quiz assets. The lessons form an integrated curriculum: text chunking strategies, hybrid BM25+dense retrieval, cross-encoder reranking, query rewriting (HyDE/multi-query/decomposition), offline evaluation metrics, and a fully integrated end-to-end pipeline with threshold-based validation. ChangesCurriculum Foundation and New Lessons
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (6)
phases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/main.py (3)
95-96: ⚡ Quick winConsider adding
strict=Truetozip()for safer iteration.The
zip(self.docs, scores)at line 95 would benefit fromstrict=Trueto ensure the lists match in length, catching potential indexing bugs.🔧 Proposed fix
- ranked = sorted(zip(self.docs, scores), key=lambda x: -x[1]) + ranked = sorted(zip(self.docs, scores, strict=True), key=lambda x: -x[1]) return [(d, s) for d, s in ranked[:k] if s > 0]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/main.py` around lines 95 - 96, The zip of self.docs and scores in the ranking code (ranked = sorted(zip(self.docs, scores), ...)) can silently truncate if lengths differ; change the zip call to use strict=True (i.e., zip(self.docs, scores, strict=True)) so a ValueError is raised when lengths mismatch, ensuring bugs are caught early in the ranking flow that builds ranked and the return list slice.
120-121: ⚡ Quick winConsider adding
strict=Truetozip()for safer iteration.Same issue as in lesson 64:
zip(a, b, strict=True)makes the dimension-match assumption explicit.🔧 Proposed fix
def cosine(a: list[float], b: list[float]) -> float: - return sum(x * y for x, y in zip(a, b)) + return sum(x * y for x, y in zip(a, b, strict=True))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/main.py` around lines 120 - 121, The cosine function uses zip(a, b) which silently truncates mismatched-length vectors; change the zip call in function cosine to zip(a, b, strict=True) so mismatched dimensions raise an error, ensuring explicit dimension checks during iteration.
143-159: ⚡ Quick winConsider adding
strict=Truetozip()at line 154.Although line 151 validates that
len(weights) == len(rankings), usingzip(weights, rankings, strict=True)at line 154 provides an additional runtime safety check and makes the contract explicit.🔧 Proposed fix
score: dict[str, float] = defaultdict(float) by_id: dict[str, Doc] = {} - for w, ranks in zip(weights, rankings): + for w, ranks in zip(weights, rankings, strict=True): for rank, (doc, _) in enumerate(ranks): score[doc.doc_id] += w * (1.0 / (k + rank + 1)) by_id[doc.doc_id] = doc🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/main.py` around lines 143 - 159, The rrf function currently zips weights and rankings without the strict flag; even though len(weights) is validated earlier, add strict=True to the zip call (i.e., use zip(weights, rankings, strict=True)) in the loop that iterates over weights and ranks to enforce the one-to-one contract at runtime and surface any unexpected length mismatches immediately; update the zip invocation inside rrf accordingly.phases/19-capstone-projects/64-chunking-strategies-advanced/code/main.py (1)
172-173: ⚡ Quick winConsider adding
strict=Truetozip()for safer iteration.In Python 3.10+, using
zip(a, b, strict=True)catches length mismatches at runtime. While the vectors frommock_embedshould always match in dimension, addingstrict=Truemakes the assumption explicit and helps catch bugs if the code changes.🔧 Proposed fix
def cosine(a: list[float], b: list[float]) -> float: - return sum(x * y for x, y in zip(a, b)) + return sum(x * y for x, y in zip(a, b, strict=True))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/64-chunking-strategies-advanced/code/main.py` around lines 172 - 173, The cosine(a: list[float], b: list[float]) function uses zip(a, b) which silently truncates if lengths differ; update the zip call to zip(a, b, strict=True) to raise an error on length mismatch so dimension mismatches are caught at runtime and prevent subtle bugs.phases/19-capstone-projects/64-chunking-strategies-advanced/code/tests/test_chunkers.py (1)
44-48: ⚡ Quick winConsider adding
strict=Truetozip()or usingitertools.pairwise().For Python 3.10+,
zip(strict=True)catches length mismatches. Alternatively,itertools.pairwise()(Python 3.10+) is more idiomatic for iterating over successive pairs.🔧 Proposed fix (Option 1: strict)
def test_overlap_visible(self) -> None: chunks = fixed_window("d", SAMPLE_PROSE, size=60, overlap=20) # Adjacent chunks must share at least the overlap region size by construction - for a, b in zip(chunks, chunks[1:]): + for a, b in zip(chunks, chunks[1:], strict=True): self.assertGreaterEqual(a.end - b.start, 20)🔧 Proposed fix (Option 2: pairwise)
+ from itertools import pairwise + def test_overlap_visible(self) -> None: chunks = fixed_window("d", SAMPLE_PROSE, size=60, overlap=20) # Adjacent chunks must share at least the overlap region size by construction - for a, b in zip(chunks, chunks[1:]): + for a, b in pairwise(chunks): self.assertGreaterEqual(a.end - b.start, 20)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/64-chunking-strategies-advanced/code/tests/test_chunkers.py` around lines 44 - 48, In test_overlap_visible, the use of zip(chunks, chunks[1:]) can silently miss or hide length mismatches; update the pair iteration to either use zip(chunks, chunks[1:], strict=True) or replace the loop with itertools.pairwise(chunks) (import pairwise) to make failures explicit; keep the assertion a.end - b.start >= 20 and ensure imports are adjusted if you choose pairwise.phases/19-capstone-projects/67-query-rewriting-hyde/code/main.py (1)
302-304: ⚡ Quick winDeduplicate multi-query rewrites before retrieval fusion.
MultiQueryRewriter.rewrite()can emit duplicate queries (especially fallback cases), which causes duplicate rankings to be fused multiple times and biases RRF scores.Proposed fix
def rewrite(self, query: str) -> RewriteResult: - rewrites = [query] + self.llm.paraphrase(query, n=self.n) + raw = [query] + self.llm.paraphrase(query, n=self.n) + rewrites: list[str] = [] + seen: set[str] = set() + for r in raw: + key = r.strip().lower() + if key in seen: + continue + seen.add(key) + rewrites.append(r) return RewriteResult(strategy=self.name, rewrites=rewrites)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/67-query-rewriting-hyde/code/main.py` around lines 302 - 304, MultiQueryRewriter.rewrite can emit duplicate queries; change the construction of the rewrites list in rewrite(self, query) to remove duplicates while preserving order before returning RewriteResult. Specifically, deduplicate the list [query] + self.llm.paraphrase(query, n=self.n) (keeping the first occurrence of each string) and then pass that deduplicated list into RewriteResult(strategy=self.name, rewrites=rewrites) so duplicate paraphrases don't bias RRF fusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@phases/19-capstone-projects/66-reranker-cross-encoder/docs/en.md`:
- Around line 48-49: Update the documentation to match the actual CrossEncoder
implementation: change the statement that the model uses "one attention head" to
reflect that CrossEncoder defaults to 4 heads (refer to CrossEncoder and its
constructor/defaults), and replace the claim about batching via torch.stack with
a note that _batch_encode builds batches using torch.tensor(...) (refer to
_batch_encode). Ensure the wording describes the real architecture (single
transformer block with default 4 attention heads) and the actual batching
mechanism to avoid teaching drift.
In `@phases/19-capstone-projects/68-rag-eval-precision-recall/code/main.py`:
- Around line 62-67: mean_reciprocal_rank currently uses zip which silently
truncates when retrieved_per_query and gold_per_query differ; add an explicit
length check at the start of mean_reciprocal_rank that compares
len(retrieved_per_query) and len(gold_per_query) and raise a ValueError (or
return an explicit error) with a clear message including both lengths if they
differ, then proceed to compute the average using reciprocal_rank as before.
In `@phases/19-capstone-projects/68-rag-eval-precision-recall/docs/en.md`:
- Around line 97-98: Update the documentation text about gold_answer_substring
to match current evaluator behavior: remove the statement that faithfulness is
judged by the gold answer substring and instead state that faithfulness is
evaluated by comparing extracted claims to retrieved context (as implemented in
code/main.py), and note that all metrics (faithfulness, relevance, etc.) are
printed together in the same results table; apply the same correction to the
other occurrence of the outdated wording.
In `@phases/19-capstone-projects/69-end-to-end-rag-system/code/main.py`:
- Around line 474-487: Pipeline.query currently records strategies from
rewriter.pick_strategy but only performs a single retrieval; when strategy ==
"hyde" it uses rewrite_hyde and search_with_hypothetical, but for "multiquery"
and "decompose" it still calls self.index.search. Update Pipeline.query to
branch on the returned strategy: if "hyde" use self.rewriter.rewrite_hyde(...)
and self.index.search_with_hypothetical(...); if "multiquery" call the
multi-query flow (e.g., get multiple sub-queries from a rewriter method like
rewriter.generate_multi_queries(...) or similar and call a corresponding index
method such as self.index.search_multiquery(...) or run multiple
self.index.search(...) calls and aggregate results); if "decompose" call the
decomposition flow (e.g., self.rewriter.decompose(...) then perform per-part
searches and merge results or call self.index.search_decompose(...)); otherwise
fall back to self.index.search(...). Ensure you reference and call the
appropriate rewriter and index methods (rewriter.pick_strategy,
rewriter.rewrite_hyde, rewriter.decompose / rewriter.generate_multi_queries and
index.search_with_hypothetical, index.search_multiquery / index.search_decompose
or perform multiple index.search calls) and preserve latencies/top_n handling.
---
Nitpick comments:
In `@phases/19-capstone-projects/64-chunking-strategies-advanced/code/main.py`:
- Around line 172-173: The cosine(a: list[float], b: list[float]) function uses
zip(a, b) which silently truncates if lengths differ; update the zip call to
zip(a, b, strict=True) to raise an error on length mismatch so dimension
mismatches are caught at runtime and prevent subtle bugs.
In
`@phases/19-capstone-projects/64-chunking-strategies-advanced/code/tests/test_chunkers.py`:
- Around line 44-48: In test_overlap_visible, the use of zip(chunks, chunks[1:])
can silently miss or hide length mismatches; update the pair iteration to either
use zip(chunks, chunks[1:], strict=True) or replace the loop with
itertools.pairwise(chunks) (import pairwise) to make failures explicit; keep the
assertion a.end - b.start >= 20 and ensure imports are adjusted if you choose
pairwise.
In `@phases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/main.py`:
- Around line 95-96: The zip of self.docs and scores in the ranking code (ranked
= sorted(zip(self.docs, scores), ...)) can silently truncate if lengths differ;
change the zip call to use strict=True (i.e., zip(self.docs, scores,
strict=True)) so a ValueError is raised when lengths mismatch, ensuring bugs are
caught early in the ranking flow that builds ranked and the return list slice.
- Around line 120-121: The cosine function uses zip(a, b) which silently
truncates mismatched-length vectors; change the zip call in function cosine to
zip(a, b, strict=True) so mismatched dimensions raise an error, ensuring
explicit dimension checks during iteration.
- Around line 143-159: The rrf function currently zips weights and rankings
without the strict flag; even though len(weights) is validated earlier, add
strict=True to the zip call (i.e., use zip(weights, rankings, strict=True)) in
the loop that iterates over weights and ranks to enforce the one-to-one contract
at runtime and surface any unexpected length mismatches immediately; update the
zip invocation inside rrf accordingly.
In `@phases/19-capstone-projects/67-query-rewriting-hyde/code/main.py`:
- Around line 302-304: MultiQueryRewriter.rewrite can emit duplicate queries;
change the construction of the rewrites list in rewrite(self, query) to remove
duplicates while preserving order before returning RewriteResult. Specifically,
deduplicate the list [query] + self.llm.paraphrase(query, n=self.n) (keeping the
first occurrence of each string) and then pass that deduplicated list into
RewriteResult(strategy=self.name, rewrites=rewrites) so duplicate paraphrases
don't bias RRF fusion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f23cb9da-fb46-446c-9f19-534791c14c19
📒 Files selected for processing (25)
catalog.jsonphases/19-capstone-projects/64-chunking-strategies-advanced/code/main.pyphases/19-capstone-projects/64-chunking-strategies-advanced/code/tests/test_chunkers.pyphases/19-capstone-projects/64-chunking-strategies-advanced/docs/en.mdphases/19-capstone-projects/64-chunking-strategies-advanced/quiz.jsonphases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/main.pyphases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/code/tests/test_hybrid.pyphases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/docs/en.mdphases/19-capstone-projects/65-hybrid-retrieval-bm25-dense/quiz.jsonphases/19-capstone-projects/66-reranker-cross-encoder/code/main.pyphases/19-capstone-projects/66-reranker-cross-encoder/code/tests/test_reranker.pyphases/19-capstone-projects/66-reranker-cross-encoder/docs/en.mdphases/19-capstone-projects/66-reranker-cross-encoder/quiz.jsonphases/19-capstone-projects/67-query-rewriting-hyde/code/main.pyphases/19-capstone-projects/67-query-rewriting-hyde/code/tests/test_rewriters.pyphases/19-capstone-projects/67-query-rewriting-hyde/docs/en.mdphases/19-capstone-projects/67-query-rewriting-hyde/quiz.jsonphases/19-capstone-projects/68-rag-eval-precision-recall/code/main.pyphases/19-capstone-projects/68-rag-eval-precision-recall/code/tests/test_eval.pyphases/19-capstone-projects/68-rag-eval-precision-recall/docs/en.mdphases/19-capstone-projects/68-rag-eval-precision-recall/quiz.jsonphases/19-capstone-projects/69-end-to-end-rag-system/code/main.pyphases/19-capstone-projects/69-end-to-end-rag-system/code/tests/test_pipeline.pyphases/19-capstone-projects/69-end-to-end-rag-system/docs/en.mdphases/19-capstone-projects/69-end-to-end-rag-system/quiz.json
# Conflicts: # catalog.json
Summary
Adds six deep capstone sub-lessons for Phase 19 Track F (Production-grade RAG):
Every lesson ships docs (~1200-1500 words with mermaid), main.py with the algorithm from scratch, an exhaustive unittest suite, and a 6-question quiz.
Stack
Test plan
phases/19-capstone-projects/; site/, root README, and catalog.json untouched.