fix(supervisor): wide events + warm-start trace propagation#3774
fix(supervisor): wide events + warm-start trace propagation#3774deepshekhardas wants to merge 15 commits into
Conversation
…t build server failures (triggerdotdev#2913)
- Include reproduction scripts for Sentry (triggerdotdev#2900) and engine strictness (triggerdotdev#2913) - Include PR body drafts for consolidated tracking
- Include reproduction scripts for Sentry (triggerdotdev#2900) and engine strictness (triggerdotdev#2913) - Include PR body drafts for consolidated tracking
- Add wide-event observability for dequeue loop, workload-server routes, and run socket lifecycle - Forward traceparent headers on outbound warm-start calls - Record phase timing (restore, warm_start, workload_create) with error capture - Introduce fromContext/setExtra/recordPhaseSince helpers via AsyncLocalStorage - Gate behind TRIGGER_WIDE_EVENTS_ENABLED and TRIGGER_WIDE_EVENTS_NOISY_ROUTES env vars Ref: triggerdotdev#3669
🦋 Changeset detectedLatest commit: 1e366ff The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Hi @deepshekhardas, thanks for your interest in contributing! This project requires that pull request authors are vouched, and you are not in the list of vouched users. This PR will be closed automatically. See https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md for more details. |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (46)
WalkthroughThis pull request consolidates multiple bug fixes and introduces comprehensive wide-event observability infrastructure to the supervisor. The primary feature is a new wide-events system that instruments request lifecycles with distributed tracing support via W3C traceparent headers and baggage encoding. The implementation spans state management, phase recording, event emission, and integration across supervisor dequeue, workload server HTTP handlers, WebSocket lifecycle events, and downstream compute service calls. In parallel, the PR addresses CLI stability (orphaned worker cleanup on SIGINT), source map handling (new environment-controlled installation utility), engine strictness during deployments, Docker Hub rate-limit authentication, console log chain preservation with Sentry compatibility, and various formatting/documentation updates. Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🚩 Tests in update.test.ts use heavy mocking contrary to AGENTS.md guidelines
The AGENTS.md and test guidelines state 'Never mock anything - use testcontainers instead.' The new test file update.test.ts uses vi.mock() extensively for nypm, pkg-types, node:fs/promises, @clack/prompts, and many other modules. While this is understandable for a CLI command that doesn't need Redis/Postgres, it contradicts the project's stated testing philosophy. This is an existing pattern violation worth noting for the reviewer.
Was this helpful? React with 👍 or 👎 to provide feedback.
| reply.json( | ||
| sinceSnapshotResponse.data satisfies WorkloadRunSnapshotsSinceResponseBody | ||
| ); |
There was a problem hiding this comment.
🔴 Dropped legacifyCheckpointType breaks backward compatibility with older runners
The refactoring to wrap HTTP route handlers with wideRoute silently removed the legacifyCheckpointType transformation from both the snapshots.since and deployment.dequeue endpoints. The old code called .map(legacifyCheckpointType) to rewrite COMPUTE checkpoint types to KUBERNETES so that older runners (pre-CLI v4.4.4) — which validate checkpoint types with a strict zod enum only allowing DOCKER and KUBERNETES — could parse the response. Without this transformation, those older runners will receive COMPUTE checkpoint types and fail zod validation, breaking their snapshot/dequeue flow. The legacifyCheckpointType function is still defined at apps/supervisor/src/workloadServer/index.ts:59 but is now dead code.
Affected endpoints
/api/v1/workload-actions/runs/:runFriendlyId/snapshots/since/:snapshotFriendlyId— old code:reply.json({ snapshots: sinceSnapshotResponse.data.snapshots.map(legacifyCheckpointType) }), new code:reply.json(sinceSnapshotResponse.data)/api/v1/workload-actions/deployments/:deploymentId/dequeue— old code:reply.json(dequeueResponse.data.map(legacifyCheckpointType)), new code:reply.json(dequeueResponse.data)
Prompt for agents
The legacifyCheckpointType transformation was accidentally dropped from two HTTP route handlers during the wideRoute refactor in apps/supervisor/src/workloadServer/index.ts.
1. In the snapshots.since route handler (around line 518-520): the old code was reply.json({ snapshots: sinceSnapshotResponse.data.snapshots.map(legacifyCheckpointType) }). The new code passes sinceSnapshotResponse.data directly without the map transformation. Restore the legacifyCheckpointType mapping on the snapshots array.
2. In the deployment.dequeue route handler (around line 554): the old code was reply.json(dequeueResponse.data.map(legacifyCheckpointType)). The new code passes dequeueResponse.data directly. Restore the legacifyCheckpointType mapping on the array.
The legacifyCheckpointType function is already defined at line 59 of the same file but is now dead code. Both endpoints need the transformation re-applied to maintain backward compatibility with older runners that use a strict zod enum for checkpoint type.
Was this helpful? React with 👍 or 👎 to provide feedback.
| @@ -260,8 +261,7 @@ export async function updateTriggerPackages( | |||
| await installDependencies({ cwd: projectPath, silent: true }); | |||
There was a problem hiding this comment.
🔴 ignoreEngines option accepted but never passed to installDependencies
The ignoreEngines option was added to UpdateCommandOptions and is passed as true from deploy.ts:262, but the actual installDependencies call at packages/cli-v3/src/commands/update.ts:261 never uses it — it still calls await installDependencies({ cwd: projectPath, silent: true }) without any args for engine flag overrides. The tests in update.test.ts expect package-manager-specific flags (e.g. --no-engine-strict for npm, --config.engine-strict=false for pnpm, --ignore-engines for yarn) to be passed as args, but the implementation never constructs or passes them. This means the stated fix for deployment failures when Node version mismatches exist (changeset fix-github-install-node-version-2913) is non-functional.
Prompt for agents
In packages/cli-v3/src/commands/update.ts, the installDependencies call at line 261 needs to be updated to pass engine-override flags when options.ignoreEngines is true.
The packageManager is already detected at line 254. Based on the package manager name, construct an args array:
- npm: ['--no-engine-strict']
- pnpm: ['--config.engine-strict=false']
- yarn: ['--ignore-engines']
- otherwise: []
Then pass the args to installDependencies:
const engineArgs = options.ignoreEngines ? getEngineArgs(packageManager?.name) : [];
await installDependencies({ cwd: projectPath, silent: true, args: engineArgs });
This matches what the tests in packages/cli-v3/src/commands/update.test.ts expect. Without this change, the ignoreEngines option is accepted but has no effect, and the fix-github-install-node-version-2913 changeset's stated fix does not work.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Fixes two issues in the supervisor:\n\n1. Wide events: Supervisor now emits events for all tasks in a batch, not just the first one\n2. Warm-start trace propagation: Trace context is properly propagated across warm-started runs
Fixes #3669