fix: repo-wide correctness, security & filesystem-safety hardening pass (v3.2.0)#92

Merged
Mikola Lysenko (mikolalysenko) merged 2 commits into main from audit/hardening-pass-v3.2.0
May 29, 2026

Conversation

@mikolalysenko
Collaborator

Summary

A repo-wide review: every source file in both crates was read line by line, the bugs found were fixed, and regression tests were added throughout. The audit harness that drove it is included (scripts/study-crates.ts); per-file write-ups are in the (gitignored) study-output/SUMMARY.md.

Net: 90 source files touched, ~10k lines added (mostly tests). Version bumped 3.1.0 → 3.2.0 (minor) across Cargo / npm / PyPI manifests.

Security

  • Path traversal in archive extraction (patch/package.rs): extraction validated the raw tar path but wrote the package/-stripped one, so package//etc/passwd escaped the package tree. It now validates the normalized path that is actually written.
  • Unbounded preallocation from an untrusted delta header (patch/diff.rs): the bsdiff target-size field (never validated by qbsdiff) fed Vec::with_capacity, so a tiny hostile delta could abort the process. The preallocation is now clamped to 64 MiB.
  • Evidence-free VEX attestation (vex/verify.rs): zero-file patches were reported as applied; they are now omitted as no_files.
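
The first two fixes can be illustrated with a minimal sketch. This is not the code in patch/package.rs or patch/diff.rs; the function names `safe_entry_path` and `clamp_capacity` and the constant `MAX_PREALLOC` are hypothetical, and the sketch only assumes what the description states: the `package/` prefix is stripped before writing, and the declared target size is capped at 64 MiB.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical cap mirroring the 64 MiB clamp described above.
const MAX_PREALLOC: usize = 64 * 1024 * 1024;

/// Strip the leading `package/` prefix first, then validate the path that
/// will actually be joined onto the destination directory. Returns `None`
/// for anything that could escape the package tree.
fn safe_entry_path(raw: &str) -> Option<PathBuf> {
    let stripped = raw.strip_prefix("package/").unwrap_or(raw);
    let mut out = PathBuf::new();
    for comp in Path::new(stripped).components() {
        match comp {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            // Root, parent (`..`), and Windows prefix components are rejected.
            _ => return None,
        }
    }
    if out.as_os_str().is_empty() {
        None
    } else {
        Some(out)
    }
}

/// Clamp a size taken from an untrusted header before using it as a
/// preallocation hint.
fn clamp_capacity(declared_len: u64) -> usize {
    declared_len.min(MAX_PREALLOC as u64) as usize
}

/// Allocate an output buffer without trusting the declared size. The Vec
/// still grows on demand if the real output is larger than the clamp.
fn bounded_buffer(declared_len: u64) -> Vec<u8> {
    Vec::with_capacity(clamp_capacity(declared_len))
}
```

Validating after the strip matters because `package//etc/passwd` looks relative as a raw tar path but becomes the absolute `/etc/passwd` once the prefix is removed.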

Filesystem safety / atomicity / rollback

  • apply: DirWriteGuard for read-only dirs (Go cache 0o555), chown-before-chmod to preserve setuid/setgid, parent-dir fsync after rename.
  • cow: atomic rename-over-symlink (no destructive pre-unlink), stage cleanup on every error arm.
  • rollback: now delegates to the hardened apply_file_patch; AlreadyOriginal checked before the blob lookup; read-only-dir new-file delete.
  • file_hash/git_sha256: open-once + fstat (closes a TOCTOU), regular-file guard, streamed size/body mismatch detection.
  • cargo/nuget sidecars: hardened atomic writes/deletes in read-only registry caches.
  • cleanup_blobs: symlink-tolerant, accurate "checked" count.
  • apply_lock: genuine flock errors surface as Io (not Held); sleep clamped to remaining budget.
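
As an illustration of the stage, rename, and parent-dir fsync pattern named in the list above, here is a minimal sketch using only the standard library. It is not the crate's apply or cow code: `atomic_write` and the `.stage-<pid>` naming are hypothetical, and the read-only-directory and ownership handling from the PR are left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dest` atomically: stage a file in the same directory,
/// fsync it, rename it over the destination, then fsync the parent directory
/// so the rename itself is durable.
fn atomic_write(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let parent = dest
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or_else(|| Path::new("."));
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;

    // Hypothetical staging name; a real implementation would want a
    // collision-resistant suffix.
    let mut stage_name = file_name.to_os_string();
    stage_name.push(format!(".stage-{}", std::process::id()));
    let stage = parent.join(stage_name);

    let result = (|| -> io::Result<()> {
        let mut f = File::create(&stage)?;
        f.write_all(bytes)?;
        f.sync_all()?;
        // rename replaces the destination entry in one step, so there is no
        // window in which the destination is missing.
        fs::rename(&stage, dest)?;
        if cfg!(unix) {
            // Opening a directory as a file only works on Unix-like systems.
            File::open(parent)?.sync_all()?;
        }
        Ok(())
    })();

    // Clean up the staged file on every error path.
    if result.is_err() {
        let _ = fs::remove_file(&stage);
    }
    result
}
```

Staging in the destination's own directory keeps the rename on one filesystem, which is what makes it atomic.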

Crawlers (on-disk layout & metadata)

  • Composer: v-prefix + malformed-entry tolerance.
  • Go: cache-at-root, version case-encoding, GOPATH list, module directive.
  • npm: symlink following + nested-recursion guard.
  • NuGet: global-cache version casing.
  • Python: macOS framework layout + .dist-info dir-name fallback.
  • Deno: macOS cache path, XDG_CACHE_HOME, empty DENO_DIR.
  • Maven: XML-comment stripping + skip-section depth.
  • Cargo: TOML header tolerance + dir-name version split.
  • Shared: utils/fs::entry_is_dir follows symlinks.
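
For context on the Go case-encoding item: the Go module cache stores module paths and versions with every uppercase ASCII letter replaced by `!` plus its lowercase form, so that entries stay distinct on case-insensitive filesystems. A minimal sketch of that escaping follows; the function name `escape_go_module_path` is hypothetical and this is not the crawler's actual code.

```rust
/// Escape a Go module path or version the way the module cache lays it out
/// on disk: each uppercase ASCII letter becomes '!' followed by its
/// lowercase form. All other characters pass through unchanged.
fn escape_go_module_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for ch in path.chars() {
        if ch.is_ascii_uppercase() {
            out.push('!');
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```

A crawler that looks up `github.com/Azure/...` verbatim would miss the on-disk directory `github.com/!azure/...`, and the same applies to versions that contain uppercase letters.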

API client, commands & misc

  • Proxy-url override honored on binary downloads; deterministic org/title/batch-flag selection; case-insensitive hash compare.
  • USER_AGENT + telemetry version now track CARGO_PKG_VERSION (were stale 1.0/1.0.0).
  • apply release-variant NotFound spurious-failure fix; get/scan/remove char-safe truncation (UTF-8 panic); setup/repair honest non-zero exit codes + failure telemetry; rollback no-op miscount; unlock released-snapshot; vex qualified PyPI/Gem/Maven PURL resolution.
  • package.json: non-object root/scripts no-panic, workspace dedup, bare/deep glob support, inline-comment stripping, top-level key-order preservation (preserve_order).
  • Smaller: deterministic list ordering, case-insensitive fuzzy_match tie-break, json_envelope status invariant + oldUuid, lock_cli sub-second timeout, VEX schema/product fixes.
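
The UTF-8 truncation panic mentioned above is the kind of bug that arises when a `&str` is sliced at a byte offset that falls inside a multi-byte character. A minimal sketch of a char-safe alternative, assuming truncation is measured in characters (the helper name `truncate_chars` is hypothetical, not the one used in get/scan/remove):

```rust
/// Return at most `max_chars` characters of `s` without ever slicing in the
/// middle of a UTF-8 sequence. `&s[..n]` with an arbitrary byte offset `n`
/// panics when `n` is not on a character boundary; this does not.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // Byte index of the first character past the limit.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has `max_chars` characters or fewer; keep all of it.
        None => s,
    }
}
```

For example, `truncate_chars("héllo", 2)` yields `"hé"`, whereas `&"héllo"[..2]` would panic because byte 2 lies inside the two-byte `é`.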

Tests

  • Hundreds of regression tests added across all reviewed modules.
  • Updated two stale e2e expectations that codified pre-fix behavior:
    • repair_with_blob_404_marks_failure_in_summary → now asserts exit 1 (repair correctly fails on partial download failure).
    • crawler_python_e2e missing/malformed-METADATA cases → now assert the dir-name fallback recovers the package (and a genuinely unparseable name still skips).
  • Full suite green with --features cargo.

Notes

  • The 33 MB generated study-output/ (raw session logs + SUMMARY) is not committed; added to .gitignore. The audit harness scripts are committed for reuse in future studies.

🤖 Generated with Claude Code

…ss (v3.2.0)

Reviewed every source file in both crates line by line, fixed the bugs
found, and added regression tests throughout. Highlights:

Security
- patch/package.rs: path-traversal via validate-before-normalize
  (package//etc/passwd escaped the package tree)
- patch/diff.rs: clamp unbounded Vec preallocation from untrusted
  bsdiff target-size header (OOM/abort on a hostile delta)
- vex/verify.rs: omit zero-file patches instead of emitting an
  evidence-free not_affected attestation

Filesystem safety / atomicity / rollback
- apply: DirWriteGuard for read-only dirs, chown-before-chmod to keep
  setuid/setgid, parent-dir fsync after rename
- cow: atomic rename-over symlink (no pre-unlink), stage cleanup
- rollback: delegate to hardened apply_file_patch; AlreadyOriginal
  before blob check; read-only-dir new-file delete
- file_hash/git_sha256: open-once + fstat (TOCTOU), regular-file guard,
  size/body mismatch detection
- cargo/nuget sidecars: hardened writes/deletes in read-only caches
- cleanup_blobs: symlink-tolerant, accurate counts
- apply_lock: classify genuine flock errors as Io, clamp timeout sleep

Crawlers (on-disk layout & metadata)
- composer v-prefix + malformed-entry tolerance + on-disk check
- go cache-at-root, version case-encoding, GOPATH list, module directive
- npm symlink following + nested-recursion guard
- nuget global-cache version casing
- python macOS framework layout + dist-info dir-name fallback
- deno macOS cache path, XDG_CACHE_HOME, empty DENO_DIR
- maven XML-comment stripping + skip-section depth
- cargo TOML header tolerance + dir-name version split
- shared utils/fs::entry_is_dir follows symlinks

API client, commands & misc
- proxy-url override on binary downloads; deterministic org/title/batch
  flag; case-insensitive hash compare
- USER_AGENT + telemetry version track CARGO_PKG_VERSION (was 1.0.0)
- apply release-variant NotFound spurious-failure fix
- get/scan/remove char-safe truncation (UTF-8 panic)
- setup/repair honest non-zero exit codes + telemetry
- rollback no-op miscount; unlock released-snapshot; vex qualified PURLs
- package.json non-object/dedup/glob/key-order (preserve_order)
- json_envelope status invariant + oldUuid; list ordering; fuzzy_match
  tie-break; lock_cli sub-second timeout; vex schema/product fixes

Updated stale repair/python_crawler e2e expectations to the corrected
contracts. Bumped version to 3.2.0 and added the scripts/study-crates.ts
audit harness used to drive the review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`File::open` on a directory fails outright on Windows (different OS error
kind), whereas on Unix it opens and the is_file() guard rejects it with
InvalidInput. The production code rejects directories on both platforms;
only pin the specific InvalidInput kind off-Windows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mikolalysenko Mikola Lysenko (mikolalysenko) merged commit 36d440e into main May 29, 2026
78 of 79 checks passed
@mikolalysenko Mikola Lysenko (mikolalysenko) deleted the audit/hardening-pass-v3.2.0 branch May 29, 2026 15:13