feat(phase-19): track I safety red-team lessons 82-87#212
Conversation
Six-category taxonomy (role-play, instruction-override, context-smuggling, multi-turn-ramp, encoding-trick, prefix-injection) partitions attacks by trust boundary abused. Fixtures hand-authored, severity 1-5. Trigram cosine matcher assigns category to candidate prompts. Validator enforces minimum per-category count, severity range, unique ids, non-empty prompts. Includes 50 fixtures, taxonomy.json artifact for downstream lessons, 15 unittest cases, quiz with 6 questions, skill output.
Layered detector pipeline: normalize (zero-width, homoglyph, base64, hex, leet, rot13) then substring rules then regex rules. Each rule carries a category and a base score; aggregator returns the highest scoring category with its confidence. Runner reads the lesson 82 taxonomy artifact, evaluates against the 50-fixture corpus plus a 25-prompt benign baseline, and writes per-category precision/recall/F1 to detector_report.json. Includes 14 unittest cases, quiz with 6 questions, skill output, rules and benign corpus as data files for easy extension.
Two-sided refusal metrics: under-refusal (answered unsafe), over-refusal (refused safe), accuracy, ECE calibration, per-category under-refusal join against the lesson 82 taxonomy. Three mock LLM policies (strict, leaky, over-cautious) demonstrate the framework detects opposite failure modes. Labeled corpus: 25 unsafe prompts tagged with taxonomy ids, 30 safe prompts non-overlapping with the lesson 83 benign set. Includes 15 unittest cases, quiz with 6 questions, skill output, ECE binning implementation, refusal phrase classifier.
Three classifiers behind one severity router. Toxicity (harassment terms with negation-window check), PII (email, phone, SSN, Luhn-validated card, IPv4), instruction-leakage (trigram cosine vs a known system prompt). Router takes max severity across classifiers and applies block, redact, warn, or log. Each classifier carries its own redactor; redact-severity outputs flow through all matching redactors before shipping. Includes 18 unittest cases, quiz with 6 questions, skill output, demo over six fixtures exercising all four severity buckets.
YAML constitution defines rules with name, severity, applies_when, must, explanation, fix. Predicates compose via all_of/any_of/not_. Engine emits per-rule status (pass, violation, not_applicable) with matched span. Fixer applies declarative append/prepend/replace operations per rule. diff function produces structured change list between draft and revised. Self-contained yaml_subset parser so the lesson runs without PyYAML, with graceful fallback to PyYAML when present. Includes 18 unittest cases, quiz with 6 questions, skill output, six-rule constitution covering refusal redirects, code closing, PII in examples, citations, internal library leaks, and length bounds.
Three-checkpoint composition: pre-gen detector on the prompt, during-gen streaming filter that buffers chunks and terminates early on harmful continuations, post-gen classifier router and rules engine on the completed output. Deterministic aggregation table picks the final action (block, redact, warn, allow) from the maximum severity across signals. Each request emits a structured RequestTrace with checkpoint verdicts and latency. Demo runs all 50 lesson 82 fixtures plus 10 benign prompts end-to-end, prints per-action and per-category outcomes, and writes gate_trace.json. Includes 12 unittest cases, quiz with 6 questions, skill output, and direct file-spec imports so sibling lessons compose without packaging.
…fact Title em-dashes stay (matches existing capstone convention). One body em-dash in lesson 82 prose replaced with a colon. Lesson 87 gate trace artifact regenerated.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (18)
✅ Files skipped from review due to trivial changes (5)
📝 WalkthroughWalkthroughThis PR introduces six interconnected capstone lessons (82–87) forming a comprehensive LLM safety framework. Lesson 82 establishes a jailbreak taxonomy with 50 prompts across six trust-boundary categories. Lesson 83 builds a detector using layered normalization and rule matching. Lesson 84 evaluates refusal behavior via mock policies and metrics. Lesson 85 implements three independent classifiers (toxicity, PII, instruction leakage) feeding a severity-based router. Lesson 86 provides a declarative rules engine for output constraints. Lesson 87 orchestrates all prior stages (pre/during/post-generation) into a unified safety gate. The catalog is updated to reflect all six new lessons. ChangesCapstone Lessons 82–87: Complete Safety Framework
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 19
🧹 Nitpick comments (1)
phases/19-capstone-projects/87-end-to-end-safety-gate/code/tests.py (1)
74-77: ⚡ Quick win
test_redact_when_classifier_redactscurrently asserts nothing meaningful.Line 76 allows every possible action, so this test can’t fail on regressions. Assert the intended contract (at least not
allow, or specificallyredactfor this input).Suggested fix
def test_redact_when_classifier_redacts(self) -> None: trace = self.gate.handle("Please email me at lee@example.com about my account.") - self.assertIn(trace.final_action, {"redact", "block", "warn", "allow"}) + self.assertIn(trace.final_action, {"redact", "block"})🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/87-end-to-end-safety-gate/code/tests.py` around lines 74 - 77, The test test_redact_when_classifier_redacts is currently vacuous because it allows every outcome; instead assert the intended contract by checking trace.final_action against the expected behavior from self.gate.handle for an email-containing input: replace the broad self.assertIn(trace.final_action, {"redact","block","warn","allow"}) with a stricter assertion (e.g., self.assertNotEqual(trace.final_action, "allow") or self.assertEqual(trace.final_action, "redact")) so the test fails on regressions; update the assertion in test_redact_when_classifier_redacts to use trace.final_action and the chosen expected value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@phases/19-capstone-projects/83-prompt-injection-detector/code/main.py`:
- Around line 29-30: Replace the raw invisible/confusable Unicode in the
normalization constants with explicit \uXXXX escapes so the behavior is
unchanged but the characters are auditable: update the ZERO_WIDTH regex
(ZERO_WIDTH) to use escapes for U+200B, U+200C, U+200D, U+2060 and the range
U+202A–U+202E (e.g. \u200B\u200C\u200D\u2060\u202A-\u202E) and change the
HOMOGLYPHS str.maketrans mapping (HOMOGLYPHS) to use \u04XX/\u04XX-style escapes
for the Cyrillic capitals (А U+0410, В U+0412, С U+0421, Е U+0415, Н U+041D, К
U+041A, М U+041C, О U+041E, Р U+0420, Т U+0422, Х U+0425) mapping to their ASCII
equivalents; keep the same keys/values and semantics but substitute escaped code
points instead of embedding raw characters.
- Around line 83-84: In _try_decode_base64 and _try_decode_hex replace the broad
"except Exception: pass" with targeted exception handling: catch and handle only
the decoding-related exceptions (e.g., for base64 use base64.b64decode and catch
binascii.Error, ValueError and TypeError; for hex use binascii.unhexlify and
catch binascii.Error and ValueError/TypeError), ensure you import binascii if
needed, and return None on those specific failures so other unexpected
exceptions still bubble up.
In `@phases/19-capstone-projects/83-prompt-injection-detector/code/tests.py`:
- Around line 13-15: The test contains raw zero-width characters in the string
literal and assertion which are hard to review; update the test in tests.py to
replace the embedded zero-width characters with explicit Unicode escape
sequences (e.g., use \u200b, \u200c, or \u200d as appropriate) in the variable
assigned to text and in the assertNotIn call that checks normalize(text);
reference the normalize function and the test variable (text) so reviewers can
locate and verify the change.
In `@phases/19-capstone-projects/84-refusal-evaluation/code/prompts.py`:
- Around line 65-96: SAFE_PROMPTS currently has 30 entries (s-01..s-30) but the
docs claim 25; update the documentation to match the implementation by changing
any references that say "25 safe prompts" to "30 safe prompts" (including the
module docstring and en.md). Locate the SAFE_PROMPTS definition to confirm the
count and then update the docstring text and the en.md file so they accurately
state 30 safe prompts and, if present, adjust any related summary or index
numbers to reflect the full list.
- Around line 1-7: The module docstring incorrectly states "25 safe prompts"
while the SAFE_PROMPTS constant actually contains 30 entries (IDs s-01..s-30);
update the top-level docstring in prompts.py to say "30 safe prompts" and
likewise update the corresponding sentence in en.md (around the existing
mention) to reflect 30 safe prompts so both documentation strings match the
SAFE_PROMPTS list.
In `@phases/19-capstone-projects/84-refusal-evaluation/docs/en.md`:
- Around line 42-46: Update the documentation count to match the actual
SAFE_PROMPTS length: change the phrase that reads "plus 25 safe prompts" to
"plus 30 safe prompts" (the code defines SAFE_PROMPTS with entries s-01 through
s-30 in code/prompts.py), and ensure the surrounding sentence in docs/en.md
referencing the labeled corpus reflects this corrected count.
In
`@phases/19-capstone-projects/85-content-classifier-integration/code/classifiers.py`:
- Line 90: The _CARD_RE pattern allows a trailing separator because the repeated
group includes an optional separator; change it so the final character must be a
digit: replace _CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b") with a pattern
that enforces the last character is a digit, e.g. _CARD_RE =
re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), keeping the raw string and re.compile
usage so matches always end on a digit.
- Around line 112-123: The code is recording raw PII matches into the findings
list (uses m.group(0) for _EMAIL_RE, _PHONE_RE, _SSN_RE, _CARD_RE, _IPV4_RE),
which risks leaking sensitive data; update the loops that append to findings to
store only the type labels (e.g., "email", "phone", "ssn", "card", "ip") or a
strictly masked form instead of the full match, while preserving the existing
validation (e.g., keep using _luhn(digits) for card validation) and the same
loop locations (the blocks that iterate _EMAIL_RE, _PHONE_RE, _SSN_RE, _CARD_RE,
_IPV4_RE and append to findings).
In `@phases/19-capstone-projects/85-content-classifier-integration/docs/en.md`:
- Line 43: Update the documentation to match the implemented API: replace
references to Action.redacted_output with Action.output, and update descriptions
of the router function from decide(verdicts) to decide(text, verdicts) so the
docs reflect the actual function signature and returned Action shape (verb,
output, metadata); ensure any examples, explanations, and the mention in lesson
51 use these exact names to avoid integration confusion.
In
`@phases/19-capstone-projects/85-content-classifier-integration/outputs/skill-content-classifier-integration.md`:
- Around line 16-22: The fenced code block showing the data shape for
"ClassifierVerdict" lacks a language tag and triggers markdownlint MD040; update
the opening fence from ``` to a tagged fence such as ```text (or another
appropriate tag like ```yaml) so the block becomes ```text and retains the same
contents for "ClassifierVerdict", preserving indentation and lines for name,
severity, score, and findings.
In `@phases/19-capstone-projects/86-constitutional-rules-engine/code/rules.yml`:
- Around line 33-39: The rule currently detects both email and phone patterns
via the two not_contains_regex entries but the fixer only replaces emails;
update the fix block used by the no-pii-in-examples rule so it also rewrites
phone numbers. Specifically, add a second replace_regex (or expand the existing
pattern) to include the phone regex '\b(\+?\d{1,3}[ .-]?)?(\(?\d{3}\)?[
.-]?)\d{3}[ .-]?\d{4}\b' so that the fixer replaces detected phone numbers
(similar to how the existing replace_regex with pattern
'\b[\w.+-]+@[\w-]+\.[\w.-]+\b' replaces emails with '[example-user]').
In
`@phases/19-capstone-projects/86-constitutional-rules-engine/code/yaml_subset.py`:
- Around line 49-73: The coercion function _coerce currently treats the literal
"{}" as a plain string, breaking inline empty mappings like the `applies_when:
{}` used in rules.yml; update _coerce to detect empty inline mappings and return
an actual empty dict (e.g., if s startswith "{" and endswith "}" and the inner
content is whitespace/empty, return {}), leaving all other coercions unchanged.
In `@phases/19-capstone-projects/86-constitutional-rules-engine/quiz.json`:
- Around line 19-27: The quiz's correct answer index is wrong: update the
"correct" value in the quiz entry for the question "What three fields must every
rule have at minimum?" from 2 to 4 so the correct option becomes "predicate,
severity, owner" (which includes the required severity field) by editing the
"correct" key in quiz.json.
In `@phases/19-capstone-projects/87-end-to-end-safety-gate/code/main.py`:
- Around line 77-84: The benign loop over BENIGN_PROMPTS currently never updates
the global terminations counter; when gate.handle(prompt) returns a trace that
indicates an early termination (use trace.final_action or trace.terminated
flag), increment the same global terminations metric used elsewhere (named
terminations) so benign requests are included; apply the same change to the
other benign-processing block referenced by the reviewer (the logic around
traces.append and per_category_outcome updates) to mirror how terminations is
incremented for non-benign requests.
In
`@phases/19-capstone-projects/87-end-to-end-safety-gate/code/mock_llm_stream.py`:
- Around line 60-70: The stream function uses chunk_tokens as the range step
without validation, so pass a check at the top of stream(prompt: str,
chunk_tokens: int = 4) to ensure chunk_tokens is an int > 0 (e.g., raise
ValueError with a clear message if chunk_tokens <= 0 or not an int) before
calling range(..., chunk_tokens) and then proceed to chunk the tokens; reference
the stream function and the chunk_tokens parameter when adding the validation.
In `@phases/19-capstone-projects/87-end-to-end-safety-gate/code/safety_gate.py`:
- Around line 170-172: The code currently indexes SEVERITY_RANK directly with
post.classifier_severity and post.rules_max_severity causing KeyError on
unexpected tokens; change both places to safely lookup with a fallback (e.g.,
use SEVERITY_RANK.get(post.classifier_severity, <safe-default>) and
SEVERITY_RANK.get(post.rules_max_severity, <safe-default>)) so
signals.append(...) always receives a numeric severity; update the occurrences
that build ("post.classifier", ...) and ("post.rules", ...) to use .get and pick
a sensible default (like 0 or the lowest severity) to degrade gracefully.
- Around line 187-194: After calling classifier_router.run(raw_output) and
getting classifier_action.output into redacted, ensure you don't return an
empty/falsey string: if redacted is empty after classifier_router.run and after
optional rules_fixer.apply, replace it with a safe fallback (for example call a
helper like self.safe_fallback(raw_output) or return a generic safe message)
before returning; update the logic around classifier_action.output,
post.rules_violations, rules_engine.evaluate(...).violations(), and
rules_fixer.apply(...) to perform this empty-check and fallback substitution so
the redact branch never returns a blank body.
In `@phases/19-capstone-projects/87-end-to-end-safety-gate/docs/en.md`:
- Around line 42-43: The aggregation table row that currently reads "detector
confidence 0.5-0.85, no other signal | allow with note" conflicts with the
implemented action which emits final_action="warn"; update the table text for
the "detector confidence 0.5-0.85, no other signal" case to read "warn" (or
otherwise match the exact implemented token final_action="warn") so wording is
consistent with the implementation.
In
`@phases/19-capstone-projects/87-end-to-end-safety-gate/outputs/skill-end-to-end-safety-gate.md`:
- Around line 33-43: The fenced code block showing the RequestTrace schema lacks
a language hint, which triggers MD040 linting; update the markdown fenced block
that contains "RequestTrace" (the block starting with ``` and the schema lines
including request_id, prompt, pre_gen, during_gen, post_gen, final_action,
final_output, latency_ms) to include a language identifier such as "text" (i.e.,
```text) so the block is properly annotated for the linter.
---
Nitpick comments:
In `@phases/19-capstone-projects/87-end-to-end-safety-gate/code/tests.py`:
- Around line 74-77: The test test_redact_when_classifier_redacts is currently
vacuous because it allows every outcome; instead assert the intended contract by
checking trace.final_action against the expected behavior from self.gate.handle
for an email-containing input: replace the broad
self.assertIn(trace.final_action, {"redact","block","warn","allow"}) with a
stricter assertion (e.g., self.assertNotEqual(trace.final_action, "allow") or
self.assertEqual(trace.final_action, "redact")) so the test fails on
regressions; update the assertion in test_redact_when_classifier_redacts to use
trace.final_action and the chosen expected value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a7bfd78b-0ecc-4edd-8c20-738c790182c2
📒 Files selected for processing (47)
catalog.jsonphases/19-capstone-projects/82-jailbreak-taxonomy/code/fixtures.pyphases/19-capstone-projects/82-jailbreak-taxonomy/code/main.pyphases/19-capstone-projects/82-jailbreak-taxonomy/code/tests.pyphases/19-capstone-projects/82-jailbreak-taxonomy/docs/en.mdphases/19-capstone-projects/82-jailbreak-taxonomy/outputs/skill-jailbreak-taxonomy.mdphases/19-capstone-projects/82-jailbreak-taxonomy/outputs/taxonomy.jsonphases/19-capstone-projects/82-jailbreak-taxonomy/quiz.jsonphases/19-capstone-projects/83-prompt-injection-detector/code/benign.pyphases/19-capstone-projects/83-prompt-injection-detector/code/main.pyphases/19-capstone-projects/83-prompt-injection-detector/code/rules.pyphases/19-capstone-projects/83-prompt-injection-detector/code/tests.pyphases/19-capstone-projects/83-prompt-injection-detector/docs/en.mdphases/19-capstone-projects/83-prompt-injection-detector/outputs/detector_report.jsonphases/19-capstone-projects/83-prompt-injection-detector/outputs/skill-prompt-injection-detector.mdphases/19-capstone-projects/83-prompt-injection-detector/quiz.jsonphases/19-capstone-projects/84-refusal-evaluation/code/main.pyphases/19-capstone-projects/84-refusal-evaluation/code/mock_llm.pyphases/19-capstone-projects/84-refusal-evaluation/code/prompts.pyphases/19-capstone-projects/84-refusal-evaluation/code/tests.pyphases/19-capstone-projects/84-refusal-evaluation/docs/en.mdphases/19-capstone-projects/84-refusal-evaluation/outputs/refusal_eval_report.jsonphases/19-capstone-projects/84-refusal-evaluation/outputs/skill-refusal-evaluation.mdphases/19-capstone-projects/84-refusal-evaluation/quiz.jsonphases/19-capstone-projects/85-content-classifier-integration/code/classifiers.pyphases/19-capstone-projects/85-content-classifier-integration/code/main.pyphases/19-capstone-projects/85-content-classifier-integration/code/tests.pyphases/19-capstone-projects/85-content-classifier-integration/docs/en.mdphases/19-capstone-projects/85-content-classifier-integration/outputs/classifier_report.jsonphases/19-capstone-projects/85-content-classifier-integration/outputs/skill-content-classifier-integration.mdphases/19-capstone-projects/85-content-classifier-integration/quiz.jsonphases/19-capstone-projects/86-constitutional-rules-engine/code/main.pyphases/19-capstone-projects/86-constitutional-rules-engine/code/rules.ymlphases/19-capstone-projects/86-constitutional-rules-engine/code/tests.pyphases/19-capstone-projects/86-constitutional-rules-engine/code/yaml_subset.pyphases/19-capstone-projects/86-constitutional-rules-engine/docs/en.mdphases/19-capstone-projects/86-constitutional-rules-engine/outputs/rules_report.jsonphases/19-capstone-projects/86-constitutional-rules-engine/outputs/skill-constitutional-rules-engine.mdphases/19-capstone-projects/86-constitutional-rules-engine/quiz.jsonphases/19-capstone-projects/87-end-to-end-safety-gate/code/main.pyphases/19-capstone-projects/87-end-to-end-safety-gate/code/mock_llm_stream.pyphases/19-capstone-projects/87-end-to-end-safety-gate/code/safety_gate.pyphases/19-capstone-projects/87-end-to-end-safety-gate/code/tests.pyphases/19-capstone-projects/87-end-to-end-safety-gate/docs/en.mdphases/19-capstone-projects/87-end-to-end-safety-gate/outputs/gate_trace.jsonphases/19-capstone-projects/87-end-to-end-safety-gate/outputs/skill-end-to-end-safety-gate.mdphases/19-capstone-projects/87-end-to-end-safety-gate/quiz.json
| except Exception: | ||
| pass |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify broad exception swallowing locations in this file.
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
if "except Exception" in line:
print(f"{path}:{i}:{line.strip()}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 262
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
lines = path.read_text(encoding="utf-8").splitlines()
for start in (75, 90):
end = start + 20
print(f"\n--- {path}:{start}-{end} ---")
for i in range(start-1, min(end, len(lines))):
print(f"{i+1:4d}: {lines[i]}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 1957
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
lines = path.read_text(encoding="utf-8").splitlines()
for start in (75, 90):
end = start + 20
print(f"\n--- {path}:{start}-{end} ---")
for i in range(start-1, min(end, len(lines))):
print(f"{i+1:4d}: {lines[i]}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 1957
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
lines = path.read_text(encoding="utf-8").splitlines()
for i in range(1, 60):
if i-1 < len(lines):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 2022
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
lines = path.read_text(encoding="utf-8").splitlines()
for i in range(1, 80):
if i-1 < len(lines):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 2731
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
lines = path.read_text(encoding="utf-8").splitlines()
for i in range(1, 80):
if i-1 < len(lines):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 2731
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("phases/19-capstone-projects/83-prompt-injection-detector/code/main.py")
lines = path.read_text(encoding="utf-8").splitlines()
for i in range(1, 60):
if i-1 < len(lines):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: rohitg00/ai-engineering-from-scratch
Length of output: 2022
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "binascii" phases/19-capstone-projects/83-prompt-injection-detector/code/main.py || trueRepository: rohitg00/ai-engineering-from-scratch
Length of output: 62
Narrow exception handling in base64/hex decoding instead of swallowing all exceptions.
_try_decode_base64 and _try_decode_hex both use except Exception: pass (lines 83-84 and 98-99), which can hide unexpected failures and silently alter normalization behavior. Narrow the exceptions caught.
🔧 Proposed fix
+import binascii
@@
- except Exception:
- pass
+ except (binascii.Error, ValueError):
+ continue
@@
- except Exception:
- pass
+ except ValueError:
+ continue🧰 Tools
🪛 Ruff (0.15.14)
[error] 83-84: try-except-pass detected, consider logging the exception
(S110)
[warning] 83-83: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@phases/19-capstone-projects/83-prompt-injection-detector/code/main.py` around
lines 83 - 84, In _try_decode_base64 and _try_decode_hex replace the broad
"except Exception: pass" with targeted exception handling: catch and handle only
the decoding-related exceptions (e.g., for base64 use base64.b64decode and catch
binascii.Error, ValueError and TypeError; for hex use binascii.unhexlify and
catch binascii.Error and ValueError/TypeError), ensure you import binascii if
needed, and return None on those specific failures so other unexpected
exceptions still bubble up.
…i fixer, quiz key includes severity
… on empty redact, count benign terminations, doc action token
# Conflicts: # catalog.json
Summary
Six capstone sub-lessons under
phases/19-capstone-projects/that compose into a runnable safety + red-team harness.taxonomy.jsonartifactall_of/any_of/not_predicates, six-rule starter constitution, declarative fixer (append/prepend/replace), structured diff between draft and revised, self-contained YAML subset parserEvery lesson ships docs (mermaid diagram, 900-1100 words), runnable
main.py,tests.py(12-18 unittests each),quiz.json(6 questions), and askill-*.mdoutput. Total: 92 unit tests across the six lessons, all green; six demos all exit 0 and write JSON artifacts under each lesson'soutputs/.Implementation uses only
numpy(lessons 82-84) plus an optionalpyyamlfallback (lesson 86 ships its own YAML subset parser so the lesson runs on a stock Python install). No real LLM calls; mock LLMs throughout. No external red-team / safety repo names or paper citations in any file.Test plan
python3 -m unittest testspasses in each of the six lessoncode/directories (15 + 14 + 15 + 18 + 18 + 12 = 92 tests)python3 main.pyexits 0 in each lesson and writes its artifactsite/, rootREADME.md, orcatalog.jsontouched