Skip to content

feat(workflows): add continue_on_error step field for non-halting failures#2663

Open
doquanghuy wants to merge 5 commits into
github:mainfrom
doquanghuy:feat/continue-on-error
Open

feat(workflows): add continue_on_error step field for non-halting failures#2663
doquanghuy wants to merge 5 commits into
github:mainfrom
doquanghuy:feat/continue-on-error

Conversation

@doquanghuy
Copy link
Copy Markdown
Contributor

@doquanghuy doquanghuy commented May 21, 2026

Description

Closes #2591.

Adds an optional continue_on_error: bool field on every step.
When set to true and the step fails, the engine records the
result (exit_code, stderr, status) into steps.<id>.output and
continues to the next sibling step instead of halting the run.
Downstream if, switch, or gate steps can then branch on
{{ steps.<id>.output.exit_code }} to route the recovery path.

This is the shape @mnriem proposed in the issue discussion —
it composes with primitives that already exist (the exit code
is already captured, the expression engine already resolves it,
and if/switch/gate are already available). The only gap
was that a non-zero exit hard-stopped the pipeline before any
downstream step could evaluate it.

Canonical usage

- id: heavy-thing
  type: command
  integration: claude
  command: speckit.heavy-thing
  continue_on_error: true

- id: check-result
  type: if
  condition: "{{ steps.heavy-thing.output.exit_code != 0 }}"
  then:
    - id: review
      type: gate
      message: "Step failed (exit {{ steps.heavy-thing.output.exit_code }}). Retry or skip?"
      on_reject: skip
  else:
    - id: next-thing
      command: speckit.next-thing

Engine

WorkflowEngine._execute_steps now consults the step config when
a step returns StepStatus.FAILED:

  • Gate aborts (output.aborted) always halt the run — operator
    decisions take precedence over the flag.
  • Otherwise, if continue_on_error: true, log a
    step_continue_on_error event and proceed to the next sibling.
  • Otherwise, behave as before: log step_failed, set
    RunStatus.FAILED, and return.

Exactly one event per failure-resolution path is logged so the
log timeline is unambiguous: either the run continued past the
failure or it halted.

Validation

_validate_steps rejects non-bool values for continue_on_error.
Coerced strings like "true" are not accepted so authoring
mistakes surface at validation time rather than silently
changing run semantics.

Default behaviour preserved

When continue_on_error is omitted, every code path is
byte-equivalent to before this change. Existing workflows see no
difference.

Verdict coverage (from the issue discussion)

Scenario How
Skip continue_on_error: true + if branches around the failure
Abort Omit the flag — today's default halts the run
Retry continue_on_error: true + gate → operator approves → resume re-runs from gate

Fully unattended retry-on-transient (e.g. retry a 429 at 3 AM
without operator attendance) is intentionally out of scope here.
The skip and abort verdicts work without a human; the
retry verdict still pauses for one at the gate. A future
loop/retry-count primitive or an auto-approving gate type could
close that gap on top of this mechanism without further engine
changes — happy to follow up on that in a separate issue if
useful.

Testing

  • Tested locally with uv run specify --help
  • Ran existing tests with uv sync && uv run pytest
    2967 passed, 35 skipped (was 2960 before; +7 new
    tests added in this PR).
  • Tested with a sample project: ran a 3-step workflow where
    the middle step exits non-zero. Without
    continue_on_error, run halts at the failing step (as
    before). With continue_on_error: true, the failing step
    records exit_code and the third step executes. A
    downstream if branching on
    {{ steps.flaky.output.exit_code != 0 }} routes into a
    recovery gate cleanly.

New test coverage

TestContinueOnError in tests/test_workflows.py:

Test What it locks
test_undeclared_failure_halts_run Default behaviour byte-equivalent — no flag → run halts on first non-zero exit.
test_declared_and_fired_continues_run Flag set + step fails → run continues, exit_code recorded.
test_declared_but_step_succeeded_is_noop Flag set + step succeeds → no behaviour change.
test_if_branch_routes_around_failure End-to-end recovery pattern from the issue discussion.
test_gate_abort_still_halts_with_continue_on_error Operator-driven gate abort always halts, even with the flag set.
test_validation_rejects_non_bool_continue_on_error "true" (string) fails validation.
test_validation_accepts_bool_continue_on_error true and false pass validation cleanly.

AI Disclosure

  • I did not use AI assistance for this contribution
  • I did use AI assistance (described below)

Used Claude Opus to draft the engine change, the test suite, the
docs section, and this PR body. The shape (continue_on_error

  • exit-code-as-API + branch on it via existing primitives) was
    proposed by @mnriem on the issue thread; this PR implements that
    proposal. Code, tests, and design decisions were human-reviewed
    before submission.

@doquanghuy doquanghuy requested a review from mnriem as a code owner May 21, 2026 14:34
@doquanghuy doquanghuy force-pushed the feat/continue-on-error branch 2 times, most recently from f34ab4c to da8ed4d Compare May 21, 2026 14:48
@mnriem mnriem requested a review from Copilot May 26, 2026 12:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment thread workflows/README.md Outdated
Comment on lines +224 to +228
By default, a non-zero exit code from any step halts the entire run.
Set `continue_on_error: true` on a step to record its result and
continue to the next sibling step instead. The exit code remains
available on `steps.<id>.output.exit_code` so downstream `if`,
`switch`, or `gate` steps can branch on it:
Comment thread tests/test_workflows.py Outdated
Comment on lines +2138 to +2140
assert not any(
"continue_on_error" in e for e in errors
), errors
Comment thread tests/test_workflows.py Outdated
Comment on lines +2061 to +2062
monkeypatch.setattr(gate_module.sys.stdin, "isatty", lambda: True)
monkeypatch.setattr(
@doquanghuy
Copy link
Copy Markdown
Contributor Author

doquanghuy commented May 27, 2026

@mnriem — addressed all three Copilot findings in e9a4871, and rebased the branch onto current main (clean rebase, no conflicts).

1. README wording. Rewrote the "Error Handling" intro in terms of StepStatus.FAILED halting by default, with non-zero shell / command exit as one common cause. Avoids implying only exit codes can halt a run (gate aborts and validation failures also do, just via different mechanisms).

2. Validation test contract tightened. test_validation_accepts_bool_continue_on_error now asserts errors == [] instead of "no error mentions continue_on_error". Unrelated validation regressions on the same minimal YAML can no longer slip past this test.

3. Gate stdin patching made runner-robust. In test_gate_abort_still_halts_with_continue_on_error, swapped sys.stdin itself for a stub _TTYStdin object instead of patching sys.stdin.isatty. Method-on-instance assignment is unreliable on real io.TextIOWrapper objects (e.g. under pytest with capture disabled), so replacing the whole stdin object is more robust across runners.

Full suite still passes (continue_on_error: 7/7), no regressions. Branch is MERGEABLE and ready for another look whenever you have a moment.

AI disclosure: drafted with Claude Opus, human-reviewed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread workflows/README.md Outdated
Comment on lines +225 to +226
run — most commonly a `shell` or `command` step exiting non-zero, but
also things like a step type that fails validation. Set
Comment thread tests/test_workflows.py Outdated
Comment on lines +2161 to +2162
# is declared, the engine records the result (exit_code, stderr, status)
# and continues to the next sibling step instead of halting the run.
doquanghuy added a commit to doquanghuy/spec-kit that referenced this pull request May 27, 2026
- README "Error Handling": drop the validation-failure example.
  Structural validation runs before the workflow run starts, so it
  never produces a `StepStatus.FAILED` mid-run and was misleading as
  an example for `continue_on_error`. Reword to "any runtime error
  raised during step execution" with a parenthetical clarifying the
  validation-time path.
- Tests block comment: fix the misleading "(exit_code, stderr,
  status)" wording. `status` is stored as `steps.<id>.status`, not
  nested under `steps.<id>.output`. Clarify that `output` carries the
  exit-code / stderr from the failure and that `status` is a sibling
  key so future readers writing expressions against run state aren't
  led astray.

Addresses Copilot review feedback on github#2663.
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — addressed both new Copilot findings in c827f4f.

1. README runtime-vs-validation framing. Dropped the "step type that fails validation" example from the "Error Handling" intro — structural validation runs at specify workflow run startup and exits before any run is created, so it never produces a StepStatus.FAILED mid-run and was misleading as an example for continue_on_error. Reworded to "any other runtime error raised during step execution" with a parenthetical clarifying that invalid workflows are rejected up-front.

2. Test-block comment about status location. Fixed the misleading "(exit_code, stderr, status)" wording in the section header above TestContinueOnError. status is stored as steps.<id>.status (sibling of output), not nested under steps.<id>.output. Clarified that output carries the exit_code / stderr from the failure and that status is a separate sibling key, so future readers writing expressions against the run state aren't led astray.

Continue_on_error suite still 7/7, no regressions.

AI disclosure: drafted with Claude Opus, human-reviewed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread src/specify_cli/workflows/engine.py Outdated
# steps can branch on it. Log a single, unambiguous
# event per failure resolution — either the run
# continued past it, or it halted.
if step_config.get("continue_on_error"):
doquanghuy added a commit to doquanghuy/spec-kit that referenced this pull request May 27, 2026
Engine was checking `continue_on_error` via truthiness
(`if step_config.get("continue_on_error"):`), which let truthy
non-bool values like the string `"true"` activate the flag at
runtime if validation was skipped. `execute()` does not auto-validate
(documented on the `load_definition` docstring), so a caller bypassing
`validate_workflow()` could see the contract silently violated.

Tighten the runtime check to identity comparison (`is True`) so only
a literal boolean enables the behaviour. Validation continues to
catch the authoring mistake at parse time; this is defense in depth
for the bypass-validation path. Add
`test_engine_ignores_truthy_non_bool_continue_on_error` locking the
contract — feeds `continue_on_error: "true"` (string) through
`execute()` directly and asserts the run halts with
`RunStatus.FAILED`.

Addresses Copilot review feedback on github#2663.
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — addressed the new Copilot finding in 838e7e4.

The gap. Copilot caught a real defense-in-depth hole: _validate_steps rejects non-bool continue_on_error values at validation time, but the engine itself was checking the flag via truthiness (if step_config.get("continue_on_error"):) at runtime. Since WorkflowEngine.execute() does NOT auto-validate the definition (the load_definition docstring explicitly says "not yet validated; call validate_workflow() or engine.validate() separately"), a caller that bypasses validation and feeds the engine a definition with continue_on_error: "true" (string) could silently activate the flag — the YAML string "true" is truthy in Python — which contradicts the documented "literal boolean only" contract.

The fix. Tightened the runtime check at engine.py:661 to identity comparison:

- if step_config.get("continue_on_error"):
+ if step_config.get("continue_on_error") is True:

Only a literal True now enables the behaviour. Validation continues to catch the authoring mistake at parse time; this is the belt-and-braces guard for the bypass-validation path. Inline comment expanded to explain why identity beats truthiness here, with a pointer to the load_definition docstring so future readers don't "simplify" it back to a truthy check.

New test. test_engine_ignores_truthy_non_bool_continue_on_error locks the contract. It bypasses validate_workflow() entirely, feeds continue_on_error: "true" (string) straight through engine.execute(), and asserts the run halts with RunStatus.FAILED and the sibling step does NOT run. Would have failed under the old truthy check.

Continue_on_error suite: 8/8 pass (was 7/7). Full suite: 3002 passed, 40 skipped, no regressions.

AI disclosure: drafted with Claude Opus, human-reviewed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread src/specify_cli/workflows/engine.py Outdated
# the behaviour, even if validation was skipped.
# Validation rejects non-bool values at parse time,
# but `execute()` does not auto-validate (see the
# `load_definition` docstring), so a caller passing
Comment thread tests/test_workflows.py Outdated
Comment on lines +2418 to +2419
behaviour. `execute()` does not auto-validate (the
`load_definition` docstring documents this), so the engine
Copy link
Copy Markdown
Collaborator

@mnriem mnriem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address Copilot feedback and resolve conflicts

Adds an optional `continue_on_error: bool` field on every step.
When set to `true` and the step fails, the engine records the
result (`exit_code`, `stderr` on `steps.<id>.output` plus `status`
as a sibling key on `steps.<id>`) and continues to the next sibling
step instead of halting the run. Downstream `if`, `switch`, or
`gate` steps can then branch on
`{{ steps.<id>.output.exit_code }}` to route the recovery path.

Engine details
--------------
`WorkflowEngine._execute_steps` now consults the step config when a
step returns `StepStatus.FAILED`:

- Gate aborts (`output.aborted`) always halt the run — operator
  decisions take precedence over the flag.
- Otherwise, if `continue_on_error` is the literal `True`, log a
  `step_continue_on_error` event and proceed to the next sibling.
  The runtime check uses identity comparison (`is True`) rather
  than truthiness, so truthy non-bool values like the string
  `"true"` cannot silently change run semantics even if a caller
  bypasses `validate_workflow()`.
- Otherwise, behave as before: log `step_failed`, set
  `RunStatus.FAILED`, and return.

Validation
----------
`_validate_steps` rejects non-bool values for `continue_on_error`.
Coerced strings like `"true"` are not accepted so authoring
mistakes surface at validation time rather than silently changing
run semantics.

Tests
-----
`TestContinueOnError` in `tests/test_workflows.py` (8 tests):
- `test_undeclared_failure_halts_run` — default halt behaviour.
- `test_declared_and_fired_continues_run` — flag + fail → continue.
- `test_declared_but_step_succeeded_is_noop` — flag + success → no-op.
- `test_if_branch_routes_around_failure` — end-to-end recovery.
- `test_gate_abort_still_halts_with_continue_on_error` — abort
  always halts.
- `test_validation_rejects_non_bool_continue_on_error` — `"true"`
  rejected at validation.
- `test_validation_accepts_bool_continue_on_error` — `true`/`false`
  pass cleanly.
- `test_engine_ignores_truthy_non_bool_continue_on_error` —
  defense-in-depth: engine ignores string `"true"` even when
  validation is bypassed.

Rebased onto current upstream/main (post github#2664 merge); the new
`TestContinueOnError` class sits immediately after upstream's
`TestContextRunId` so the two feature suites coexist cleanly.

Closes github#2591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@doquanghuy doquanghuy force-pushed the feat/continue-on-error branch from 838e7e4 to d0b9e00 Compare May 27, 2026 18:11
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — addressed both Copilot findings and resolved the conflicts in d0b9e00.

1. Wrong method name reference. Both the engine inline comment (engine.py:667) and the test docstring referenced load_definition, but the actual engine API is WorkflowEngine.load_workflow() / WorkflowEngine.execute() — there is no load_definition. Fixed both to reference WorkflowEngine.load_workflow and quote the exact docstring line ("not yet validated; call validate_workflow() or engine.validate() separately") so the pointer is self-checking.

2. Conflict resolution after PR #2664 landed. Upstream main advanced past my branch — notably #2664 merged in c6afe4c, which inserted a new TestContextRunId class at the same anchor point (just before # ===== State Persistence Tests =====) where my TestContinueOnError was sitting. Both classes are independent feature suites and need to coexist.

I rebased the branch onto current main (c6afe4c) and squashed my four iteration commits into a single feat(workflows): add continue_on_error step field commit on top — cleaner history, and it sidesteps the messy three-way conflict that came from re-applying each iteration commit through the same conflict region. New class ordering in tests/test_workflows.py:

TestContextRunId        (upstream, from #2664)
TestContinueOnError     (this PR, 8 tests)
TestRunState            (upstream, unchanged)

Full suite: 3013 passed (was 3009 pre-rebase; +4 from upstream's TestContextRunId + same 8 from TestContinueOnError), 40 skipped, no regressions. Branch is MERGEABLE again.

AI disclosure: drafted with Claude Opus, human-reviewed.

@mnriem mnriem requested a review from Copilot May 28, 2026 12:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread workflows/README.md
Comment on lines 285 to 289
Supported filters: `default`, `join`, `contains`, `map`.

### Runtime Context

`{{ context.* }}` exposes engine-managed runtime metadata for the
current run:

| Variable | Description |
|----------|-------------|
| `context.run_id` | The current workflow run id (the same value Spec Kit prints as `Run ID:` at the end of `workflow run`). Auto-generated runs are 8-character hex from `uuid4`; operator-supplied ids may be any alphanumeric string with hyphens or underscores. Empty string outside a run context. |

```yaml
# Stamp telemetry events with the run id for cross-system join.
- id: emit-event
type: shell
run: 'echo "{\"run_id\":\"{{ context.run_id }}\",\"event\":\"started\"}" >> events.jsonl'

# Per-run scratch directory.
- id: prep-scratch
type: shell
run: 'mkdir -p /tmp/run-{{ context.run_id }}'

# Pass run id into a command for artifact metadata.
- id: tag-artifact
command: speckit.specify
input:
args: "{{ context.run_id }}"
```

## Input Types

Workflow inputs are type-checked and coerced from CLI string values:
Comment thread workflows/README.md Outdated
then:
- id: review
type: gate
message: "Step failed (exit {{ steps.heavy-thing.output.exit_code }}). Retry or skip?"
@mnriem
Copy link
Copy Markdown
Collaborator

mnriem commented May 28, 2026

Please address Copilot feedback

Two Copilot findings on d0b9e00:

1. The `### Runtime Context` documentation for `{{ context.* }}` was
   lost during the rebase onto current main (the squash dropped the
   anchor where github#2664 had added it). Restored under `## Expressions`
   so users can find `context.run_id` semantics and examples.

2. The continue_on_error example gate had message "Retry or skip?"
   but used the default `options: [approve, reject]` with `on_reject:
   skip`, which implied an automatic retry path that gates do not
   provide. Reworded the message to match the actual approve/reject
   semantics and added an explicit note that retry requires either
   custom gate options + downstream branching or a wrapper loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — addressed both Copilot findings in 22c5184 (docs-only, no engine/test changes).

1. Runtime Context section restored. The ### Runtime Context block for {{ context.* }} (added upstream in #2664) was lost during my rebase onto current main — the squash dropped the anchor where it sat. Restored under ## Expressions so context.run_id semantics and the three usage examples are discoverable again.

2. Gate prompt clarified. Reworded the example's message: from "Retry or skip?" to "Approve to continue, or reject to skip the rest of this branch." — the default options: [approve, reject] never offered a retry path, and gates do not auto re-run the failed step. Added an explicit note below the example clarifying that retry requires either custom gate options with downstream branching, or a wrapper loop.

CI was green on the prior commit (11/11 passing). This change is README-only — no Python tests touched.

AI disclosure: drafted with Claude Opus, human-reviewed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread workflows/README.md Outdated
Comment on lines +224 to +232
By default, any step that ends in `StepStatus.FAILED` at runtime halts
the entire run — most commonly a `shell` or `command` step exiting
non-zero, but also any other runtime error raised during step
execution. (Invalid workflow definitions are rejected up-front by
`specify workflow run` before the run even starts, so structural
validation failures never reach this code path.) Set
`continue_on_error: true` on a step to record its result and continue
to the next sibling step instead. When the failure was a non-zero
exit, the exit code remains available on
Copilot finding on d0b9e00:

The README's "Error Handling" intro implied `continue_on_error` covers
"any other runtime error raised during step execution", but the engine
only consults the flag when a step returns `StepResult(status=FAILED, ...)`.
Exceptions raised out of `step_impl.execute()` propagate to
`WorkflowEngine.execute()`, where the catch-all logs `workflow_failed`
and re-raises — the step result is never recorded, and the flag is
never consulted.

Audited the whole PR diff for the same overclaim:

1. workflows/README.md — main fix. Reworded the Error Handling intro to
   "any step that returns StepResult(status=FAILED, ...)" and promoted
   the parenthetical structural-validation note into the Notes block.
   Added a new "Scope: returned failures only" note that names the
   exception path explicitly and tells step authors how to bring the
   flag into scope for exceptional code (catch internally and return
   FAILED with the failure encoded in `output`).

2. tests/test_workflows.py — section comment used "when an executable
   step fails", same ambiguity. Tightened to "when a step returns
   StepResult(status=FAILED, ...)" and added a sentence calling out
   that unhandled exceptions are out of scope.

3. src/specify_cli/workflows/engine.py — already correct ("any step
   that returns FAILED" in the validator comment; "lets the pipeline
   route around the failure" in the execute path). No change.

Engine semantics and test bodies are unchanged. Docs-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — addressed the new Copilot finding in b8982a7 (docs-only, no engine/test code changes).

The finding. Copilot caught a real overclaim in the README "Error Handling" intro: the text implied continue_on_error covers "any other runtime error raised during step execution", but the engine only consults the flag when a step returns StepResult(status=FAILED, ...). Exceptions raised out of step_impl.execute() propagate to WorkflowEngine.execute(), where the catch-all logs workflow_failed and re-raises — the step result is never recorded, and the flag is never consulted.

Audit (per "fix all similar issues"). Grepped the entire PR diff for the same pattern:

  1. workflows/README.md — main fix.

    • Reworded the Error Handling intro to "any step that returns StepResult(status=FAILED, ...)" (removed the "any other runtime error raised during step execution" clause).
    • Added a new "Scope: returned failures only" bullet in the Notes block that names the exception path explicitly: unhandled exceptions are caught by WorkflowEngine.execute(), logged as workflow_failed, and abort the run regardless of continue_on_error. Tells step authors how to bring the flag into scope for exceptional paths — catch the exception internally and return StepResult(status=FAILED, ...) with the failure encoded in output.
    • Promoted the parenthetical structural-validation note ("Invalid workflow definitions are rejected up-front...") out of the main paragraph into the Notes block for parallelism with the other "what the flag does NOT cover" notes.
  2. tests/test_workflows.py — the ===== continue_on_error Tests ===== section header read "when an executable step fails and continue_on_error: true is declared", same ambiguity (could be misread to include exception paths). Tightened to "when a step returns StepResult(status=FAILED, ...)" and added a sentence calling out that unhandled exceptions are out of scope.

  3. src/specify_cli/workflows/engine.py — audited the inline comments added in d0b9e00: the validator block already says "any step that returns FAILED" (line 235) and the execute path says "lets the pipeline route around the failure" (line 654). Both correctly use returned, not raised. No code-side change needed.

Engine semantics, validation, and test bodies are unchanged. CI: 161/161 tests/test_workflows.py pass locally.

Branch is clean against upstream/main. Ready when you are.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread workflows/README.md Outdated
Comment on lines +252 to +256
The gate's default `options: [approve, reject]` map directly to "continue
the run" vs. "skip the rest of this branch" — gates do not automatically
re-run the failed step. To express a retry path, either define custom
gate options and branch on the choice downstream, or wrap the failing
step in your own loop.
…MPLETED, not auto-skip

Copilot finding on b8982a7:

The README example's gate message said "reject to skip the rest of this
branch", and the explanatory paragraph claimed [approve, reject] map
to "continue" vs "skip the rest of this branch". The engine does not
implement automatic branch-skipping. `on_reject: skip` returns
`StepStatus.COMPLETED` (gate/__init__.py:65-66); the next sibling step
runs unconditionally unless the author wires a downstream `if` reading
`{{ steps.<gate-id>.output.choice }}`.

Two fixes:

1. Restructured the YAML example so it actually demonstrates the
   manual-branching pattern: added a `recover` if-step after the gate
   that conditions on `steps.review.output.choice == 'approve'`. Now
   the example shows the real workflow author's responsibility instead
   of implying the engine does it.

2. Replaced the trailing paragraph with three precise notes:
   - both gate options return COMPLETED; `on_reject: skip` controls
     abort behaviour only, not sibling-skipping
   - all three `on_reject` values enumerated with their actual engine
     semantics (FAILED+aborted / COMPLETED / PAUSED)
   - the original retry-loop guidance retained as the third bullet

Updated the gate message in the example to match — "reject to leave the
failure recorded and move on" instead of "reject to skip the rest of
this branch".

Audited the whole PR diff for the same overclaim: no other instance.
Engine semantics, validation, and test bodies are unchanged. Docs-only.

161/161 tests/test_workflows.py pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — addressed the new Copilot finding in $(git -C /tmp/spec-kit-pr-2663 rev-parse --short HEAD) (docs-only, no engine/test code changes).

The finding. Copilot caught a real overclaim in the example: the gate message said "reject to skip the rest of this branch" and the explanatory paragraph claimed [approve, reject] map to "continue" vs "skip the rest of this branch". The engine does not implement automatic branch-skipping. Per gate/__init__.py:65-66, on_reject: skip returns StepStatus.COMPLETED; the next sibling step in the then: list runs unconditionally unless the workflow author wires a downstream if reading {{ steps.<gate-id>.output.choice }}.

Two fixes:

  1. Restructured the YAML example to actually demonstrate the manual-branching pattern. Added a recover if-step after the gate that conditions on steps.review.output.choice == 'approve'. Now the example shows what the workflow author has to do, instead of implying the engine does it for them.

  2. Replaced the trailing paragraph with three precise notes:

    • both gate options return COMPLETED; on_reject: skip controls abort behaviour only, not sibling-skipping
    • all three on_reject values enumerated with their actual engine semantics (abortFAILED+output.aborted, skipCOMPLETED, retryPAUSED)
    • the original retry-loop guidance retained as the third bullet

Updated the example gate message to match — "reject to leave the failure recorded and move on" instead of the old overclaim.

Audit (per "fix all similar issues"). Grepped the entire PR diff for the same pattern. No other instance — tests don't carry similar phrasing, and engine.py comments accurately describe the gate semantics already.

Engine semantics, validation, and test bodies are unchanged. CI: 161/161 tests/test_workflows.py pass locally.

…ally branch

Audit follow-up to 393ac6b — three sites repeated the same minor
overclaim about gates being one of the "branch on it" step types
alongside `if` and `switch`:

1. workflows/README.md (the "downstream `if`, `switch`, or `gate`
   steps can branch on it" sentence introducing the example)
2. engine.py:236 (validator inline comment)
3. engine.py:657 (execute-path inline comment)

A `gate` step does not have a `condition` or `expression` field — it
only evaluates expressions for `message` and `show_file` (gate/__init__.py:29,36).
Programmatic branching happens in `if`/`switch`; a gate surfaces the
value to a human operator via message interpolation, and the operator's
choice is recorded in `output.choice` for a *subsequent* `if`/`switch`
to route on.

Reworded all three sites consistently: "a downstream `if` or `switch`
can branch on it (or a `gate` can surface it to the operator via
message interpolation)". The README example already demonstrates this
distinction — the gate carries `{{ }}` template variables in its
message and the `recover` if-step downstream is what actually branches
on the choice.

Engine semantics, validation, and test bodies are unchanged. Docs-only
on the README; comment-only on engine.py.

161/161 tests/test_workflows.py pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@doquanghuy
Copy link
Copy Markdown
Contributor Author

@mnriem — proactive audit follow-up in 03454dd (docs + inline-comment only).

After fixing the Copilot finding in 393ac6b I audited the rest of the PR diff for the same class of issue (docs claiming engine behaviour that the code doesn't actually implement). Found three sites with the same minor overclaim:

  1. workflows/README.md — intro sentence to the example said "downstream if, switch, or gate steps can branch on it".
  2. src/specify_cli/workflows/engine.py:236 — validator inline comment said the same.
  3. src/specify_cli/workflows/engine.py:657 — execute-path inline comment said the same.

A gate step doesn't have a condition or expression field. Looking at gate/__init__.py:29,36, the only expressions a gate evaluates are template interpolations in message and show_file for human display. Programmatic branching is if/switch only; a gate's role is to surface a value to the operator, and the operator's output.choice is what a subsequent if/switch then routes on.

Reworded all three sites consistently to: "a downstream if or switch can branch on it (or a gate can surface it to the operator via message interpolation)". The README example already demonstrates this distinction — the gate carries {{ }} template variables in its message, and the recover if-step downstream is what actually branches on the choice.

Engine semantics, validation, and test bodies are unchanged. CI: 161/161 tests/test_workflows.py pass locally.

Branch is clean against upstream/main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Step-level on_failure: hook for recovery gates

3 participants