manifest/bazel: nested-workspace + Bazel-native Maven extraction #1342
Simon (simonhj) wants to merge 11 commits
…ub-workspace discovery
The existing bazel-query discovery path only inspects MODULE.bazel / WORKSPACE at the invocation cwd. Ruleset repos with per-example sub-workspaces (rules_kotlin/examples, rules_js/examples, rules_rust, rules_python) declare additional Maven artifacts in nested MODULE.bazel projects with their own maven_install.json lockfiles. Those files were silently dropped, leaving the CLI's SBOM a strict subset of what the server-side depscan parser already returns from the same tree.
Add a walker that finds every checked-in maven_install.json under cwd (pruning .git, node_modules, .socket-auto-manifest, and Bazel's bazel-* convenience symlinks into <output_base>), parses each via the existing parseUnsortedDepsJson v2-lockfile path, and merges the artifacts into the SBOM after the bazel-query extraction step. Merge is keyed by mavenCoordinates so the root workspace's lockfile (which bazel-query already extracts) does not double-count; conflicting group:artifact versions across sub-workspaces continue to surface as the existing loud-failure error in normalizeToMavenInstallJson.
Verified against bazel-bench/oss/rules_kotlin: walker now surfaces all 10 examples/*/maven_install.json files and merges 393 unique artifacts into the SBOM beyond what the root @kotlin_rules_maven discovery returns. No regression on tink-java (0 lockfiles) or protobuf (1 root lockfile, deduped against bazel-query's @maven extraction).
…er walker already covers it
The CLI was walking the tree for **/maven_install.json and **/*_maven_install.json lockfiles and merging them into its output. The server-side scan walker matches the same pattern natively via getReportSupportedFiles, so the CLI re-reading these files duplicated work and produced output that was a strict subset of what the walker already saw when the scan was uploaded.
Removes:
- bazel-lockfile-discovery.mts (196 lines)
- bazel-lockfile-discovery.test.mts (241 lines)
- extract_bazel_to_maven step 5b (33 lines): the merge-back-into-allArtifacts loop
The .socket-auto-manifest/maven_install.json the CLI emits is still picked up by the same walker, so that composition stays intact. After this change the CLI emits only what running bazel produces (the complement of the walker's lockfile coverage).
…very
`findWorkspaceRoots` walks the tree from cwd and returns every directory containing MODULE.bazel / WORKSPACE / WORKSPACE.bazel. Monorepos host multiple workspace roots (e.g. examples/<name>/MODULE.bazel, mobile/MODULE.bazel under an otherwise non-Bazel root); the per-workspace algorithm in the orchestrator runs once per discovered root.
Pruning matches the previous lockfile walker: skip the usual non-workspace directories (.git, node_modules, .socket-auto-manifest, etc.), Bazel's `bazel-*` output_base symlinks (so we never recurse into tens of GiB of generated state), and `dist*` build-output directories.
Caps `MAX_WALK_DEPTH` and `MAX_WORKSPACE_ROOTS` guard against pathological inputs and symlink loops.
Pure-function module with no Bazel calls; unit tests use a tmpdir fixture tree and cover the root-only, nested, prune, symlink, and sort-determinism cases.
…+ probe primitives
Drop all static parsing of MODULE.bazel / WORKSPACE / *.bzl sources.
Bazel itself sees those files via `mod show_extension` and `cquery`; the
CLI no longer needs to interpret Starlark.
`parseShowExtensionOutput` consumes the text-format report from
bazel mod show_extension @rules_jvm_external//:extensions.bzl%maven
and returns the hub repos (items annotated with `(imported by ...)`).
Generated per-artifact bullets are skipped; `DEBUG:` / `WARNING:` lines
are tolerated; the parser stops at the next `## ` section header so
multi-extension reports don't cross-contaminate.
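A minimal sketch of a parser with the behaviour described above. The report layout is an assumption here (repos listed as `- name` bullets under a `## ` header, hubs carrying an `(imported by ...)` suffix), and the sample text in the usage note is illustrative rather than captured Bazel output; the PR's real `parseShowExtensionOutput` may differ.

```typescript
// Sketch only: the bullet/header layout below is assumed, not taken from the PR.
function parseShowExtensionOutput(text: string): string[] {
  const hubs: string[] = []
  let seenHeader = false
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim()
    // Tolerate Bazel's interleaved diagnostics.
    if (line.startsWith('DEBUG:') || line.startsWith('WARNING:')) {
      continue
    }
    if (line.startsWith('## ')) {
      // Stop at the next section so multi-extension reports do not mix.
      if (seenHeader) {
        break
      }
      seenHeader = true
      continue
    }
    // Hub repos are the bullets annotated with "(imported by ...)";
    // generated per-artifact bullets have no such suffix and are skipped.
    const match = /^-\s+(\S+)\s+\(imported by\b/.exec(line)
    const name = match?.[1]
    if (name && !hubs.includes(name)) {
      hubs.push(name)
    }
  }
  return hubs
}
```

Fed a report with one `- maven (imported by <root>)` bullet, several plain per-artifact bullets, and a second `## ` section, this sketch returns only `['maven']`.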
`classifyProbeResult` turns a raw probe outcome into a tri-state status:
- populated: code=0 + non-empty stdout
- empty: code=1 + "no targets found beneath"
- not-defined: code=1 + "No repository visible" / "no such package",
or code=0 + empty stdout (WORKSPACE-mode silent miss)
The orchestrator treats `empty` and `not-defined` uniformly as skips; the
distinction is preserved for the sidecar status report.
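The tri-state rules above can be sketched as a pure function. The `ProbeResult` shape, the choice to look for the messages in stderr, and the default for failures the commit message does not list are all assumptions of this sketch.

```typescript
// Hypothetical shapes; the PR's real types may differ.
type ProbeResult = { code: number; stdout: string; stderr: string }
type ProbeStatus = 'populated' | 'empty' | 'not-defined'

function classifyProbeResult(result: ProbeResult): ProbeStatus {
  const { code, stdout, stderr } = result
  if (code === 0) {
    // code=0 with empty stdout is the WORKSPACE-mode silent miss.
    return stdout.trim().length > 0 ? 'populated' : 'not-defined'
  }
  // Assumption: Bazel's diagnostic text arrives on stderr.
  if (code === 1 && stderr.includes('no targets found beneath')) {
    return 'empty'
  }
  // Documented: code=1 + "No repository visible" / "no such package".
  // Assumption: every other failure is also reported as not-defined.
  return 'not-defined'
}

// The orchestrator treats `empty` and `not-defined` uniformly as skips.
function shouldExtract(status: ProbeStatus): boolean {
  return status === 'populated'
}
```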
`CONVENTIONAL_MAVEN_REPO_NAMES` exposes the names the legacy WORKSPACE
path probes (`maven`, `maven_install`, `maven_dev`, `unpinned_maven`,
`maven_unpinned`). `--bazel-maven-repo=` extras are appended by the
orchestrator (sibling todo).
Deleted exports: `parseMavenRepoCandidates`, `parseVisibleRepoCandidates`,
`validateMavenRepo`, `discoverMavenRepos`. Their replacements live in the
new primitives above; the orchestrator rewrite that wires them up lands
in a follow-up layer. `extract_bazel_to_maven.mts` does not typecheck
in this intermediate state — fixed in the orchestrator commit.
Tests cover the parser fixture (hub vs generated, separator variants,
multi-section reports), the tri-state classifier (every documented
input), and the verbose-logging contract for `probeCandidate`.
…tate probe
bazel-query-runner now centralises startup-flag construction so every
spawn — query, cquery, mod show_extension, mod dump_repo_mapping —
threads `--bazel-rc`, `--output_user_root`, and `--output_base`
consistently. The new optional `outputUserRoot` field on
`BazelQueryOptions` is the Maven path's hook for per-invocation server
isolation; the orchestrator (next commit) mkdtemp's a fresh path and
will reap the server via `bazel shutdown` + `rm -rf` on success and on
timeout, so timed-out servers no longer leak across CLI invocations.
Add `runBazelModShowMavenExtension`: invokes
bazel mod show_extension @rules_jvm_external//:extensions.bzl%maven
to enumerate Maven hubs directly from the rules_jvm_external extension
report, replacing the over-enumerating `dump_repo_mapping` surface on
the Maven path. `runBazelModShowVisibleRepos` is kept around for the
legacy PyPI extractor, which has not been rescoped yet.
Replace the Maven-side `buildProbeFor` (which emitted a kind-only
`kind("jvm_import rule|aar_import rule", @repo//:*)` query) with
`buildMavenProbeFor`, a lightweight `cquery '@<name>//... --output=label
--keep_going'` presence check whose result feeds the new tri-state
classifier in bazel-repo-discovery. Kind-only filtering missed
POM-only / native / AAR-without-aar_import artefacts and any future
rules_jvm_external rule shape; the metadata filter is now applied by
the per-repo extraction cquery (next layer), not by the probe.
Update `buildPypiProbeFor`'s return shape to include stderr so it
satisfies the new `RepoProbe` type contract. Move
`parseVisibleRepoCandidates` and the `ValidationResult` type into
bazel-pypi-discovery (their only remaining consumer); the Maven module
no longer carries dump_repo_mapping-shaped code.
Tests cover the new argv shapes for every spawn surface, the
outputUserRoot startup-flag placement (before subcommand), the
Maven probe argv (cquery + @repo//... + --output=label + --keep_going),
and the full result-triple propagation (code/stdout/stderr) that the
tri-state classifier needs.
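A rough sketch of the argv shape this commit describes: startup flags first, then the subcommand, then the probe arguments. The option field names other than `outputUserRoot` and both function names are hypothetical; `--bazelrc`, `--output_user_root`, and `--output_base` are Bazel's own startup options.

```typescript
// Hypothetical option shape; only `outputUserRoot` is named in the commit.
type BazelQueryOptions = {
  bazelRc?: string
  outputBase?: string
  outputUserRoot?: string
}

// Startup flags must precede the subcommand, so they are built in one place.
function buildStartupFlags(opts: BazelQueryOptions): string[] {
  const flags: string[] = []
  if (opts.bazelRc) {
    flags.push(`--bazelrc=${opts.bazelRc}`)
  }
  if (opts.outputUserRoot) {
    flags.push(`--output_user_root=${opts.outputUserRoot}`)
  }
  if (opts.outputBase) {
    flags.push(`--output_base=${opts.outputBase}`)
  }
  return flags
}

// Presence probe: cquery + @repo//... + --output=label + --keep_going.
function buildMavenProbeArgv(
  repoName: string,
  opts: BazelQueryOptions,
): string[] {
  return [
    ...buildStartupFlags(opts),
    'cquery',
    `@${repoName}//...`,
    '--output=label',
    '--keep_going',
  ]
}
```

The resulting array is meant to be passed to a spawn call unchanged, which keeps the "before subcommand" placement testable without running Bazel.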
`runMetadataCqueryForRepo` executes the per-repo extraction cquery and
returns a structured outcome (`ok` / `partial` / `timeout` / `empty` /
`error`) so the orchestrator can populate sidecar status without
custom error plumbing per call site. The cquery target expression is
the union of three predicates — `attr("tags", "\bmaven_coordinates=",
...)`, `attr("maven_coordinates", ".+", ...)`, and `attr("maven_url",
".+", ...)`. That matches rules_jvm_external's `jvm_import` /
`aar_import` shapes, Bazel-native `java_library` with direct
`maven_coordinates`, and POM-only / source-jar shapes that carry only
`maven_url`. Word-boundary `\b` in the tags predicate prevents matches
on values like `pre_maven_coordinates=fake`.
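The union can be sketched as below; the three predicates are the ones named above, while the `@<repo>//...` scope and the helper name are assumptions of this sketch. The regex check uses JavaScript's `\b`, which only approximates how Bazel evaluates the pattern.

```typescript
// Build the three-way union used as the cquery target expression.
function buildMetadataCqueryExpr(repoName: string): string {
  const scope = `@${repoName}//...`
  return [
    `attr("tags", "\\bmaven_coordinates=", ${scope})`,
    `attr("maven_coordinates", ".+", ${scope})`,
    `attr("maven_url", ".+", ${scope})`,
  ].join(' union ')
}

// Illustration of the word-boundary guard. `_` is a word character, so
// there is no boundary between `pre_` and `maven_coordinates`.
const TAG_PATTERN = /\bmaven_coordinates=/

function tagMatches(tagsAsString: string): boolean {
  return TAG_PATTERN.test(tagsAsString)
}
```

With this pattern, a tag string containing `maven_coordinates=G:A:V` matches, while one containing only `pre_maven_coordinates=fake` does not.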
`parseCqueryJsonproto` is defensive about the jsonproto encoding:
dispatches on `attribute[].type`, accepts both camelCase
(`stringValue`, `stringListValue`) and snake_case (`string_value`,
`string_list_value`) payload keys, and tolerates both the Bazel 5+
envelope shape (`{ "results": [{ "target": {...} }] }`) and the older
per-line streamed shape. Coordinate extraction prefers the direct
`maven_coordinates` attribute; falls back to scanning `tags` for
`maven_coordinates=G:A:V`. Provenance lands in `sourceRepo` as
`<workspace-rel-path>:<repoName>` (or just `<repoName>` at the root),
so the orchestrator's dedup can attribute artifacts back to their
discovery site.
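A condensed sketch of the defensive parsing described above. The type shapes are hypothetical, the `STRING` / `STRING_LIST` discriminator names follow my understanding of Bazel's jsonproto output, and the per-line stream is assumed to carry one `{ "target": ... }` object (or a bare target) per line.

```typescript
type Attr = {
  name?: string
  type?: string
  stringValue?: string
  string_value?: string
  stringListValue?: string[]
  string_list_value?: string[]
}
type Rule = { name?: string; ruleClass?: string; rule_class?: string; attribute?: Attr[] }
type Target = { rule?: Rule }
type ParsedArtifact = { label: string; ruleKind: string; mavenCoordinates: string }

const TAG_PREFIX = 'maven_coordinates='

// Accept the `{ results: [{ target }] }` envelope or one JSON object per line.
function targetsOf(stdout: string): Target[] {
  const text = stdout.trim()
  if (!text) {
    return []
  }
  try {
    const doc = JSON.parse(text)
    if (Array.isArray(doc?.results)) {
      return doc.results.map((r: { target?: Target }) => r.target ?? {})
    }
    return [doc?.target ?? doc]
  } catch {
    const targets: Target[] = []
    for (const line of text.split(/\r?\n/)) {
      try {
        const obj = JSON.parse(line)
        targets.push(obj?.target ?? obj)
      } catch {
        // Skip lines that are not JSON (assumed to be stray log output).
      }
    }
    return targets
  }
}

// Prefer the direct attribute; fall back to a `maven_coordinates=` tag.
function extractMavenCoordinate(rule: Rule): string | undefined {
  let fromTags: string | undefined
  for (const attr of rule.attribute ?? []) {
    if (attr.name === 'maven_coordinates' && attr.type === 'STRING') {
      const direct = attr.stringValue ?? attr.string_value
      if (direct) {
        return direct
      }
    }
    if (attr.name === 'tags' && attr.type === 'STRING_LIST') {
      const tags = attr.stringListValue ?? attr.string_list_value ?? []
      const tag = tags.find(t => t.startsWith(TAG_PREFIX))
      if (tag && !fromTags) {
        fromTags = tag.slice(TAG_PREFIX.length)
      }
    }
  }
  return fromTags
}

function parseCqueryJsonproto(stdout: string): ParsedArtifact[] {
  const out: ParsedArtifact[] = []
  for (const target of targetsOf(stdout)) {
    const rule = target.rule
    const coord = rule ? extractMavenCoordinate(rule) : undefined
    if (!rule || !coord) {
      continue // missing-coordinate skip
    }
    out.push({
      label: rule.name ?? '',
      ruleKind: rule.ruleClass ?? rule.rule_class ?? 'unknown',
      mavenCoordinates: coord,
    })
  }
  return out
}
```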
Timeout handling: spawn rejections with `timedOut` / `killed` /
`SIGTERM` / `SIGKILL` map to `status: 'timeout'`. The runner does NOT
delete the outputUserRoot — server lifecycle (reap via
`bazel shutdown` + `rm -rf`) is the orchestrator's concern so that a
single tempdir can hold multiple per-repo runs.
Also widen `ExtractedArtifact.ruleKind` from the literal
`'jvm_import' | 'aar_import'` union to `string`. The legacy text-format
parsers only ever set those two values, but the metadata cquery
returns whatever `ruleClass` Bazel reports (`java_library`,
`kt_jvm_import`, any future rules_jvm_external rule). Existing
consumers only read the field diagnostically; nothing else changes.
Tests cover the parser (envelope, per-line stream, snake_case
fallback, direct-vs-tag preference, missing-coordinate skip, empty
input), the argv builder (target expression union, startup-flag
placement, `--bazel-flag` placement, invocationFlags order), and the
runner's status classification including the spawn-timeout branch.
…thm in a tree walk
`extractBazelToMaven` now walks the scan root for every workspace
(MODULE.bazel / WORKSPACE / WORKSPACE.bazel) and runs the per-workspace
extraction algorithm in each one. Monorepos like rules_kotlin
(examples/<name>/MODULE.bazel) and projects with mobile sub-workspaces
(mobile/MODULE.bazel under a non-Bazel root) are no longer
silently dropped to the root-only path.
Per workspace:
1. Detect Bzlmod vs WORKSPACE mode.
2. Discover candidate Maven hubs:
   - Bzlmod: bazel mod show_extension @rules_jvm_external//:extensions.bzl%maven,
     parsed via parseShowExtensionOutput.
   - WORKSPACE (or Bzlmod fallback): probe the conventional names
     (maven, maven_install, maven_dev, unpinned_maven, maven_unpinned)
     plus any customer-supplied extras via the tri-state classifier.
3. Per populated candidate: run the metadata cquery
   (`attr("tags", "\bmaven_coordinates=", @<repo>//...)` ∪ direct
   `maven_coordinates` / `maven_url` attrs) and accept the parsed
   artefacts.
4. Aggregate, then dedup across workspaces by full Maven coordinate.
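Step 4 can be sketched as a keyed merge. The artifact shape is trimmed to two fields, and keeping the first occurrence of each coordinate is an assumption of this sketch; the PR does not say which duplicate wins.

```typescript
// Trimmed, hypothetical shape: only the fields the dedup needs.
type ExtractedArtifact = {
  mavenCoordinates: string // full G:A:V coordinate
  sourceRepo: string // e.g. "examples/foo:maven" or "maven" at the root
}

// Dedup across workspaces by full coordinate, first occurrence wins.
function dedupByCoordinate(
  artifacts: readonly ExtractedArtifact[],
): ExtractedArtifact[] {
  const byCoord = new Map<string, ExtractedArtifact>()
  for (const artifact of artifacts) {
    if (!byCoord.has(artifact.mavenCoordinates)) {
      byCoord.set(artifact.mavenCoordinates, artifact)
    }
  }
  return Array.from(byCoord.values())
}
```

Because the key is the full coordinate, two workspaces that pin different versions of the same group:artifact both survive this step; any version-conflict handling happens elsewhere.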
Server isolation is now invariant: every Bazel invocation runs under a
per-CLI-call --output_user_root=<tempdir>. On per-repo cquery timeout
the orchestrator reaps the server (`bazel shutdown`) and `rm -rf`'s the
tempdir, then mints a fresh one for subsequent repos — a single bad
hub no longer cascades into the rest of the run. The finally-block
cleanup reaps every tempdir that was minted, including the last one.
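The reap-and-remint loop can be sketched with the Bazel-facing pieces injected as callbacks, so the control flow is visible without spawning anything. All names here are hypothetical; in the real orchestrator `mintRoot` would be an `mkdtemp` call and `reapRoot` would run `bazel shutdown` followed by `rm -rf`.

```typescript
type RepoStatus = 'ok' | 'partial' | 'timeout' | 'empty' | 'error'

type Collaborators = {
  mintRoot: () => Promise<string>
  runRepo: (repo: string, outputUserRoot: string) => Promise<RepoStatus>
  reapRoot: (outputUserRoot: string) => Promise<void>
}

async function runReposIsolated(
  repos: readonly string[],
  c: Collaborators,
): Promise<{ ok: boolean; statuses: Record<string, RepoStatus> }> {
  // Track every minted root that has not been reaped yet.
  const live = new Set<string>()
  const mint = async (): Promise<string> => {
    const root = await c.mintRoot()
    live.add(root)
    return root
  }
  const reap = async (root: string): Promise<void> => {
    live.delete(root)
    await c.reapRoot(root)
  }

  const statuses: Record<string, RepoStatus> = {}
  let anyTimeout = false
  try {
    let root = await mint()
    for (const repo of repos) {
      const status = await c.runRepo(repo, root)
      statuses[repo] = status
      if (status === 'timeout') {
        // A hung server must not poison later repos: reap and start fresh.
        anyTimeout = true
        await reap(root)
        root = await mint()
      }
    }
  } finally {
    // Reap whatever is still live, including the last root.
    for (const root of Array.from(live)) {
      await reap(root)
    }
  }
  return { ok: !anyTimeout, statuses }
}
```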
Sidecar `manifest-status.json` lands beside the synthesized
`maven_install.json`. Each entry records the repo's classified status
(ok / partial / timeout / empty / error), artifact count, and duration,
so the server-side can surface partial results to the customer. The
top-level `complete: false` flag fires iff any repo timed out.
Deleted: the unsorted_deps.json fast path (`extractFromOneRepo`,
`bazelExternalDir`, `isForceQueryFallbackEnabled` env knob) — the
metadata cquery returns the same GAVs the fast path used to recover,
without depending on bazel-out symlinks or generated artefacts.
Deleted: the lockfile merge (already done in a previous commit on this
branch); deleted: the kind-only probe and dump_repo_mapping enumeration.
The orchestrator's `ExtractBazelOptions` now accepts
`extraMavenRepoNames` (legacy WORKSPACE non-conventional hub names) and
`perRepoTimeoutMs` (per-repo cquery cap). The CLI flag wiring lands in
a sibling commit; existing call sites continue to pass the same fields
they did before.
Existing `extract_bazel_to_maven.test.mts` is pinned to the old
unsorted_deps fast path and is replaced wholesale in the next commit
(test layer).
…e pipeline
The previous tests pinned the legacy unsorted_deps.json fast path, kind-only probes, and dump_repo_mapping enumeration. The new tests mock the orchestrator's three external collaborators (findWorkspaceRoots, runBazelModShowMavenExtension, runMetadataCqueryForRepo) and assert on the contract that matters: end-to-end Bzlmod and WORKSPACE-mode flows, the per-repo cquery loop, cross-workspace coordinate dedup, the timeout → re-mint loop, sidecar `manifest-status.json` shape, and `extraMavenRepoNames` threading.
Pure-function `normalizeToMavenInstallJson` keeps a focused trio of unit tests (dedup, version-conflict, sha256-preservation). The fixture-driven .socket.facts.json non-emission assertion stays so the Maven-path-vs-facts-path invariant is exercised.
Also patch the PyPI test mock: parseVisibleRepoCandidates moved from bazel-repo-discovery to bazel-pypi-discovery in a previous commit, so the test's vi.mock now mirrors the actual export surface. The probe fixture grows a `stderr` field to match the new RepoProbe contract.
…GNORED_DIRS
`findWorkspaceRoots` no longer hardcodes the directory-prune set: callers pass `ignoreDirNames: ReadonlySet<string>` and `ignoreDirPrefixes: readonly string[]` via options. Neither defaults to anything; absent means no pruning. This keeps the walker decoupled from any particular ignore policy and avoids duplicating the codebase-wide `IGNORED_DIRS` list.
`src/utils/glob.mts` exports `IGNORED_DIRS` so the orchestrator can compose it with Bazel-specific extras. The orchestrator's composed set: `IGNORED_DIRS` plus `.hg`, `.idea`, `.pnpm-store`, `.socket-auto-manifest`, `.svn`, `.vscode`; prefixes `bazel-` and `dist`.
Also tighten `MAX_WALK_DEPTH` from 16 → 8. Deepest workspace marker observed across the surveyed OSS corpus is 9 (bazel-self test fixtures); deepest in realistic application code is 7 (checkmk's thirdparty layout). The cap gives one level of headroom over the realistic max while still guarding against pathological symlink loops that slipped past any prefix prune the caller supplied.
Walker test rewritten against the new injected API: covers the no-prune-by-default case (`node_modules/MODULE.bazel` surfaces unless the caller ignores `node_modules`), injected name and prefix prunes, and the bazel-* symlink case under the prefix injection.
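A minimal synchronous sketch of a walker with an injected prune policy. The option names `ignoreDirNames` and `ignoreDirPrefixes` come from the commit message; `maxDepth`, `maxRoots`, their defaults as options, and the decision not to follow symlinked directories are assumptions of this sketch rather than the PR's implementation.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

const WORKSPACE_MARKERS = new Set(['MODULE.bazel', 'WORKSPACE', 'WORKSPACE.bazel'])

type WalkOptions = {
  ignoreDirNames?: ReadonlySet<string>
  ignoreDirPrefixes?: readonly string[]
  maxDepth?: number // stands in for MAX_WALK_DEPTH
  maxRoots?: number // stands in for MAX_WORKSPACE_ROOTS
}

function findWorkspaceRoots(cwd: string, opts: WalkOptions = {}): string[] {
  // Absent options mean no pruning at all.
  const names = opts.ignoreDirNames ?? new Set<string>()
  const prefixes = opts.ignoreDirPrefixes ?? []
  const maxDepth = opts.maxDepth ?? 8
  const maxRoots = opts.maxRoots ?? 256
  const roots: string[] = []

  const walk = (dir: string, depth: number): void => {
    if (roots.length >= maxRoots) {
      return
    }
    let entries: fs.Dirent[]
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true })
    } catch {
      return // unreadable directory: skip it
    }
    if (entries.some(e => e.isFile() && WORKSPACE_MARKERS.has(e.name))) {
      roots.push(dir)
    }
    if (depth >= maxDepth) {
      return
    }
    for (const entry of entries) {
      // Dirent.isDirectory() is false for symlinks, so links are not followed.
      if (!entry.isDirectory()) {
        continue
      }
      const pruned =
        names.has(entry.name) ||
        prefixes.some(prefix => entry.name.startsWith(prefix))
      if (!pruned) {
        walk(path.join(dir, entry.name), depth + 1)
      }
    }
  }

  walk(cwd, 0)
  // Sort for deterministic output regardless of readdir order.
  return roots.sort()
}
```

A caller that wants the policy described above would pass something like `{ ignoreDirNames: new Set(['node_modules', '.git']), ignoreDirPrefixes: ['bazel-'] }`; with no options, a `node_modules/MODULE.bazel` is reported like any other root.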
No consumer reads it today. The orchestrator still tracks per-repo timeouts to decide ExtractBazelResult.ok and to reap+remint the output_user_root, but no longer serialises the per-workspace / per-repo status report to disk.
Walker:
- Lower MAX_WORKSPACE_ROOTS from 256 to 16; well above realistic monorepo counts, tighter guard against pathological inputs.
Orchestrator:
- Inject the workspace-walker prune policy via ExtractBazelOptions (`ignoreDirNames`, `ignoreDirPrefixes`) instead of hardcoding it inside the orchestrator. The CLI command now owns the policy and supplies the codebase-wide IGNORED_DIRS plus the Bazel-specific bits (`.socket-auto-manifest`, VCS/IDE dirs, `bazel-*` prefix).
- Drop the `dist*` prefix from the prune policy: it's a JS/web convention, not Bazel-specific, and shouldn't be hardcoded.
- Drop the Bzlmod-mode "defensive fallback" probe over the conventional Maven hub names. `bazel mod show_extension` is the authoritative source under Bzlmod; customer-supplied extras (`--bazel-maven-repo=`) still get probed. WORKSPACE mode keeps the probe path unchanged. The fallback had no real-world justification.
- Hoist the per-workspace `buildQueryOpts` helper to module scope (eslint consistent-function-scoping).
- Verbose-mode logging in `reapBazelServer` and `removeTempdir` catch blocks so the operator can see why a cleanup step failed instead of having it swallowed silently.
Lint:
- bazel-query-runner.mts: add missing `await` on three `return runBazelOneShot(...)` sites (@typescript-eslint/return-await).
- bazel-repo-discovery.test.mts: hoist five inline `probe` closures to module-scope named consts.
- extract_bazel_to_maven.test.mts: hoist `readManifest` and an inline mock factory to module scope; reorder type imports.
- extract_bazel_to_pypi.mts: merge duplicate imports of `bazel-pypi-discovery.mts`.
- extract_bazel_to_maven.mts: reorder type imports.
Cursor Bugbot has reviewed your changes and found 7 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: WORKSPACE custom hubs not found
- Added --bazel-maven-repo flag to cmd-manifest-bazel.mts (with socket.json defaults support) and wired it through as extraMavenRepoNames to both the direct CLI and auto-manifest pathways.
- ✅ Fixed: Empty show_extension skips fallback
- Changed showExtensionSucceeded to only be set true when parseShowExtensionOutput returns a non-empty hub list, so a zero-exit with no parsed hubs now correctly falls back to conventional name probing.
Or push these changes by commenting:
@cursor push 6019185b06
Preview (6019185b06)
diff --git a/src/commands/manifest/bazel/cmd-manifest-bazel.mts b/src/commands/manifest/bazel/cmd-manifest-bazel.mts
--- a/src/commands/manifest/bazel/cmd-manifest-bazel.mts
+++ b/src/commands/manifest/bazel/cmd-manifest-bazel.mts
@@ -51,6 +51,12 @@
description:
'Flags forwarded to every bazel invocation (single quoted string)',
},
+ bazelMavenRepo: {
+ type: 'string',
+ isMultiple: true,
+ description:
+ 'Extra Maven hub repo name(s) to probe; repeatable. Use for legacy WORKSPACE projects whose maven_install uses a non-conventional name.',
+ },
bazelOutputBase: {
type: 'string',
description: 'Bazel --output_base for read-only-cache CI environments',
@@ -202,7 +208,7 @@
sockJson?.defaults?.manifest?.bazel,
)
- const { ecosystem } = cli.flags
+ const { bazelMavenRepo, ecosystem } = cli.flags
let { bazel, bazelFlags, bazelOutputBase, bazelRc, out, verbose } = cli.flags
// Set defaults for any flag/arg that is not given. Check socket.json first.
@@ -260,6 +266,12 @@
}
}
+ // Compose extra Maven repo names from the CLI flag and socket.json defaults.
+ const extraMavenRepoNames: string[] = [
+ ...(Array.isArray(bazelMavenRepo) ? bazelMavenRepo : []),
+ ...(sockJson.defaults?.manifest?.bazel?.bazelMavenRepo ?? []),
+ ].filter(Boolean)
+
if (verbose) {
logger.group('- ', parentName, config.commandName, ':')
logger.group('- flags:', cli.flags)
@@ -318,6 +330,9 @@
bazelRc: bazelRc as string | undefined,
bin: bazel as string | undefined,
cwd,
+ extraMavenRepoNames: extraMavenRepoNames.length
+ ? extraMavenRepoNames
+ : undefined,
ignoreDirNames: BAZEL_WALKER_IGNORE_DIR_NAMES,
ignoreDirPrefixes: BAZEL_WALKER_IGNORE_DIR_PREFIXES,
out: out as string,
diff --git a/src/commands/manifest/bazel/extract_bazel_to_maven.mts b/src/commands/manifest/bazel/extract_bazel_to_maven.mts
--- a/src/commands/manifest/bazel/extract_bazel_to_maven.mts
+++ b/src/commands/manifest/bazel/extract_bazel_to_maven.mts
@@ -260,8 +260,12 @@
if (mode.bzlmod) {
const extResult = await runBazelModShowMavenExtension(queryOpts)
if (extResult.code === 0) {
- candidates.push(...parseShowExtensionOutput(extResult.stdout))
- showExtensionSucceeded = true
+ const parsedHubs = parseShowExtensionOutput(extResult.stdout)
+ candidates.push(...parsedHubs)
+ // Only consider successful when hubs were actually parsed; a zero
+ // exit with no recognized hub names (format mismatch, empty report)
+ // should still fall back to conventional probing.
+ showExtensionSucceeded = parsedHubs.length > 0
if (verbose) {
logger.log(
`[VERBOSE] workspace ${workspaceRoot}: show_extension yielded`,
diff --git a/src/commands/manifest/generate_auto_manifest.mts b/src/commands/manifest/generate_auto_manifest.mts
--- a/src/commands/manifest/generate_auto_manifest.mts
+++ b/src/commands/manifest/generate_auto_manifest.mts
@@ -129,6 +129,7 @@
bazelRc: bazelConfig?.bazelRc,
bin: bazelConfig?.bazel ?? bazelConfig?.bin,
cwd,
+ extraMavenRepoNames: bazelConfig?.bazelMavenRepo,
out: bazelConfig?.out ?? cwd,
outLayout: 'flat',
verbose: Boolean(bazelConfig?.verbose) || verbose,
diff --git a/src/utils/socket-json.mts b/src/utils/socket-json.mts
--- a/src/utils/socket-json.mts
+++ b/src/utils/socket-json.mts
@@ -42,6 +42,7 @@
bazel?: {
bazel?: string | undefined
bazelFlags?: string | undefined
+ bazelMavenRepo?: string[] | undefined
bazelOutputBase?: string | undefined
bazelRc?: string | undefined
bin?: string | undefined
Reviewed by Cursor Bugbot for commit 414a9a6. Configure here.
  showExtensionSucceeded
    ? extras
    : [...CONVENTIONAL_MAVEN_REPO_NAMES, ...extras]
).filter(name => !seen.has(name))
WORKSPACE custom hubs not found
High Severity
Legacy WORKSPACE projects whose maven_install uses a non-conventional name are no longer discovered: candidate discovery only probes fixed hub names plus extraMavenRepoNames, and the CLI never passes that option. Repos that the old Starlark scan found (e.g. maven_legacy_app) are skipped, so extraction can report no Maven ecosystem despite a configured hub.
const ruleKind = rule.ruleClass ?? rule.rule_class ?? 'unknown'
out.push({
  deps: [],
  mavenCoordinates: extracted.coord,
Lockfile dependency graph always empty
Medium Severity
Every artifact from metadata cquery is emitted with an empty deps list, and cquery does not request rule deps attributes. normalizeToMavenInstallJson therefore writes an empty dependencies map even when Bazel rules declare transitive Maven edges, unlike the prior probe/unsorted_deps.json path.
buildMetadataCqueryExpr(repoName),
'--output=jsonproto',
'--proto:output_rule_attrs=tags,maven_coordinates,maven_url',
'--keep_going',
Jar shasums no longer extracted
Medium Severity
Metadata cquery never surfaces maven_sha256 (typically in tags), so synthesized artifacts lack checksums and normalizeToMavenInstallJson writes empty shasums.jar entries. The old build-output parser populated jar digests from rule attributes and tags.
  showExtensionSucceeded
    ? extras
    : [...CONVENTIONAL_MAVEN_REPO_NAMES, ...extras]
).filter(name => !seen.has(name))
Empty show_extension skips fallback
High Severity
On Bzlmod workspaces, when bazel mod show_extension exits 0 but parseShowExtensionOutput returns no hub names, discovery still treats the command as successful and only probes extraMavenRepoNames. The conventional @maven hub probe list is skipped, so Maven can be missed despite a healthy Bazel exit.
  artifactCount: deduped.length,
  manifestPath,
- ok: true,
+ ok: !anyTimeout,
Success logged despite incomplete run
Medium Severity
After a per-repo cquery timeout, the orchestrator returns ok: false but still calls logger.success whenever any artifacts were written. Callers see a success message while the result indicates an incomplete extraction.
  `attr("tags", "\\bmaven_coordinates=", ${r})`,
  `attr("maven_coordinates", ".+", ${r})`,
  `attr("maven_url", ".+", ${r})`,
].join(' union ')
maven_url-only rules dropped
Medium Severity
The metadata cquery union includes rules that only set maven_url, but extractMavenCoordinate returns nothing without a coordinate. Those targets are analyzed and then discarded, so POM-only or coordinate-less Maven shapes never reach maven_install.json.
    `[VERBOSE] workspace walker: hit MAX_WORKSPACE_ROOTS cap (${MAX_WORKSPACE_ROOTS}); truncating walk`,
  )
}
break
Workspace cap truncates silently
Medium Severity
findWorkspaceRoots stops after 16 workspace markers with only a verbose log. Nested Bazel modules beyond that cap are never scanned, so Maven hubs in omitted trees are invisible with no failure outcome.



Summary
Rewrites `socket manifest bazel`'s Maven extraction pipeline so it (a) discovers every workspace under the scan root, not just `cwd`, and (b) relies on Bazel-native commands for repo enumeration instead of static Starlark regex parsing.
The former version was a bit over-engineered; it was doing the following things wrong:
maven_install.json.
After some more iteration I have landed on a better flow that also extends more easily to other languages:
The past version was the result of my (shrinking) Bazel ignorance and maybe trusting the AI a little bit too much :)
Note
Medium Risk
Large change to how SBOMs are produced (many Bazel subprocesses, timeouts, partial results); mitigated by extensive tests and isolated output_user_root, but wrong hub/cquery behavior could miss or skew dependency lists.
Overview
Rewrites `socket manifest bazel` Maven extraction so it scans every Bazel workspace under the repo (not only `cwd`) and drives discovery/extraction through Bazel commands instead of parsing Starlark or reusing `unsorted_deps.json` / cached `bazel query` probe output.
Discovery: A new workspace walker finds roots with `MODULE.bazel` / `WORKSPACE` / `WORKSPACE.bazel` (caller-supplied ignore rules; CLI wires `IGNORED_DIRS` plus `bazel-*`). Per workspace, Maven hubs come from `bazel mod show_extension` on `rules_jvm_external`'s maven extension (Bzlmod), with conventional-name cquery probes and a populated / empty / not-defined classifier for legacy WORKSPACE mode and extras from `--bazel-maven-repo=`.
Extraction: Each hub gets a metadata `cquery` (jsonproto, union on `maven_coordinates` tags/attrs/`maven_url`), parsed in new `bazel-cquery.mts`. Artifacts aggregate across workspaces and dedupe by full Maven coordinate before writing synthesized `maven_install.json`. Bazel runs use a per-invocation `--output_user_root`, with shutdown + tempdir cleanup and fresh roots after per-repo timeouts so hung servers don't block the scan.
Supporting changes: Maven repo discovery module shrinks to `show_extension` parsing + probe classification; query runner adds `runBazelModShowMavenExtension`, shared startup flags, and `buildMavenProbeFor`; PyPI keeps `parseVisibleRepoCandidates` locally; `ExtractedArtifact.ruleKind` is a string for arbitrary rule classes from cquery.