feat(githubapp): pair PEM keys with App IDs across chunks #4980
…n separate chunks of the same source can be paired and verified together
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 73b9ad6. Configure here.
-func (s Scanner) Keywords() []string {
-    return []string{"github"}
+func (s *Scanner) Keywords() []string {
+    return []string{"github", "private key"}
Keywords don't cover all appPat regex alternatives
Medium Severity
The appPat regex matches three alternatives: github[-_ ]?app[-_ ]?id, gh[-_ ]?app[-_ ]?id, and app[-_ ]?id. However, Keywords() only returns ["github", "private key"]. The Aho-Corasick pre-filter requires at least one keyword to appear in a chunk before it reaches the detector. Chunks containing only gh_app_id or app_id (without "github" or "private key" elsewhere) will never pass the pre-filter, so the detector is never invoked on them. This breaks cross-chunk pairing for those patterns since the app ID half is silently dropped.
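A minimal Go sketch of the gap, under stated assumptions: appPat below is a hypothetical reconstruction from the three alternatives listed above (the PR's exact pattern is not quoted here), and the pre-filter is modeled as a case-insensitive substring check rather than the real Aho-Corasick matcher. fixedKeywords is likewise hypothetical. Because every text matched by an appPat alternative also contains a match for app[-_ ]?id, listing the four spellings of that label is enough to let all three alternatives through.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// appPat is a hypothetical reconstruction of the App ID pattern:
// three label alternatives followed by a 4-9 digit ID.
var appPat = regexp.MustCompile(`(?i)(?:github[-_ ]?app[-_ ]?id|gh[-_ ]?app[-_ ]?id|app[-_ ]?id)\s*[:=]\s*["']?(\d{4,9})\b`)

// prKeywords mirrors what Keywords() returns in this PR.
var prKeywords = []string{"github", "private key"}

// fixedKeywords is a hypothetical extension covering every app[-_ ]?id spelling.
var fixedKeywords = []string{"github", "private key", "appid", "app_id", "app-id", "app id"}

// passesPrefilter models the keyword pre-filter as a case-insensitive
// substring check (a simplification of the Aho-Corasick matcher).
func passesPrefilter(chunk string, keywords []string) bool {
	lower := strings.ToLower(chunk)
	for _, kw := range keywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

func main() {
	for _, chunk := range []string{
		"github_app_id: 123456",
		"GH_APP_ID=123456",
		"app id: 123456",
	} {
		fmt.Printf("%-24q regex=%v pr=%v fixed=%v\n",
			chunk,
			appPat.MatchString(chunk),
			passesPrefilter(chunk, prKeywords),
			passesPrefilter(chunk, fixedKeywords))
	}
}
```

The trade-off is breadth: app_id and app id are common in unrelated config, so a fix along these lines would invoke the detector on many more chunks.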
Corpora Test Results
Scans a corpus of real-world public code against only the detectors changed in this PR, then compares unique match counts between the PR build and the main baseline to catch regex regressions. Verification is disabled: each detector's regex is measured independently.
0 new · 1 clean
Scoped to:


Description:
pkg/detectors/githubapp so a PEM private key in one chunk and an App ID in another chunk (same source) can be paired and verified, instead of requiring both halves to live in the same chunk.
sync.Map[SourceID]*sourceState, where each sourceState holds two hashicorp/golang-lru/v2 caches (PEMs, App IDs) capped at 256 entries each. Entries expire after a 30m TTL; a best-effort reaper runs at most every 5m.
keyPat now matches any BEGIN/END … PRIVATE KEY block (not just RSA); appPat matches github_app_id, gh-app-id, app id, etc. with 4–9 digit IDs.
MaxSecretSizeProvider (4096), MultiPartCredentialProvider (4096), and CustomFalsePositiveChecker.
Keywords() now returns {"github", "private key"}.
hashicorp/golang-lru/v2 is already in go.mod.
Checklist:
make test-community)?
make lint this requires golangci-lint)?
Note
Medium Risk
Changes core secret-matching and adds per-source in-memory pairing plus live GitHub API verification; broader regexes may shift false-positive/negative behavior until exercised in production scans.
Overview
Reworks the GitHub App detector so a 2048-bit RSA PKCS#1 PEM and an App ID no longer have to appear in the same scan chunk. When chunk_source_id is present, each source keeps bounded LRU caches of “half” credentials and pairs new chunks with prior halves (with companion_location and pairing metadata); without a source ID, only in-chunk pairs are emitted.

Detection is tightened and broadened at once: PEM blocks match generic BEGIN/END … PRIVATE KEY text but are accepted only after a GitHub-app-shaped key check (no encrypted PEM headers, fixed RSA parameters); App IDs match more config-style labels (github_app_id, gh-app-id, etc.) with 4–9 digits. Results now use RawV2, structured SecretParts, and optional verification that calls GitHub’s /app API and records app/owner/permission fields on success.

The scanner becomes stateful (sync.Map + TTL reaping), advertises a 4096-byte max secret/credential span, adds private key as a keyword, and updates unit/integration tests for YAML-style fixtures and cross-chunk behavior.