ENT-11586: Move build host provisioning to testing-pr (fix-buildhost.sh)#2283

Open
nickanderson wants to merge 1 commit into cfengine:master from nickanderson:ENT-11586/move-provisioning-to-fix-buildhost
Member

@nickanderson nickanderson commented Jun 1, 2026

Right now most build-host provisioning happens in the EC2 plugin's Init Script. The trouble is that when something fails there, you don't see it in the build console and the instance can sit around provisioned but unusable, which is where the spawn storms come from.

This moves the heavy part (setup-cfengine-build-host.sh) into ci/fix-buildhost.sh, which testing-pr already sources on every build. Now provisioning runs inside the job, so the output lands in the build log and a failure actually fails the build instead of leaking an instance.

What changed:

  • ci/fix-buildhost.sh runs setup-cfengine-build-host.sh, but only once. It checks for a /etc/cfengine-build-host-provisioned marker so persistent hosts don't re-provision on every job, and so we don't double-provision during the rollout while the Init Script is still doing it too. FORCE_FIX_BUILDHOST=1 re-runs it, SKIP_FIX_BUILDHOST=1 skips it for hosts that are set up some other way. Exotic and non-Linux hosts are skipped. It runs under the existing set -e, so a failure stops the build.
  • ci/setup-cfengine-build-host.sh writes that marker once it finishes. The write is best effort: it won't fail the build if it doesn't work.
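
For readers following the thread, the marker/skip/force behavior described in the two bullets above can be sketched roughly as below. This is an illustrative sketch, not the contents of ci/fix-buildhost.sh: the function name `maybe_provision` and the `PROVISION_CMD` variable are hypothetical, the marker path is made overridable only so the sketch can run without root, and the marker write is folded into the same function even though the PR puts it in setup-cfengine-build-host.sh.

```shell
# Sketch only: maybe_provision and PROVISION_CMD are hypothetical names.

# Marker path from the PR; overridable here so the sketch runs unprivileged.
FIX_BUILDHOST_MARKER="${FIX_BUILDHOST_MARKER:-/etc/cfengine-build-host-provisioned}"
# Assumed stand-in for the command that performs provisioning.
PROVISION_CMD="${PROVISION_CMD:-./ci/setup-cfengine-build-host.sh}"

maybe_provision() {
    # SKIP_FIX_BUILDHOST=1: host is provisioned some other way, do nothing.
    if [ "${SKIP_FIX_BUILDHOST:-0}" = "1" ]; then
        echo "skipped"
        return 0
    fi
    # Marker present and no force requested: already provisioned, no-op.
    if [ -e "$FIX_BUILDHOST_MARKER" ] && [ "${FORCE_FIX_BUILDHOST:-0}" != "1" ]; then
        echo "already-provisioned"
        return 0
    fi
    # A provisioning failure is returned to the caller, so a job running
    # under set -e stops here.
    "$PROVISION_CMD" || return 1
    # Best-effort marker write: a failed write must not fail the build.
    touch "$FIX_BUILDHOST_MARKER" 2>/dev/null || true
    echo "provisioned"
}
```

Returning rather than exiting from the function keeps the sketch safe to source; whether the real script is structured this way is an assumption.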

Both are additive; nothing already on master changes behavior.

There's a companion change in NorthernTechHQ/infra that drops the setup-cfengine-build-host.sh call from the Init Script (default template and mingw) and wires fix-buildhost.sh into e2e-deployment-tests, which is the one job that runs straight on a PACKAGES_* host without sourcing it.

Merge order matters: this one goes first. While the Init Script is still provisioning, the marker makes the in-job call a no-op, so nothing gets provisioned twice. The infra change to stop provisioning in the Init Script follows after.

shellcheck and bash -n are clean on both files (the only shellcheck notes left are pre-existing ones in code this PR doesn't touch). I also checked that sourcing it can't kill the job shell and that the skip / already-provisioned / provision branches behave.
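
One way to check the "sourcing can't kill the job shell" property is sketched below. It is a hypothetical helper, not the procedure actually used for this PR: `survives_sourcing` is an invented name, and it assumes `bash` is on the PATH and that the file is given as a path containing a slash (otherwise `.` would search the PATH).

```shell
# Hypothetical check: does the shell that sources "$1" keep running afterwards?
# The file is sourced in a throwaway bash process without set -e, so only an
# explicit exit in the sourced file prevents "alive" from being printed.
survives_sourcing() {
    [ "$(bash -c '. "$1" >/dev/null 2>&1; echo alive' _ "$1")" = "alive" ]
}
```

A file whose skip path ends in `return 0` passes this check, while one that calls `exit 0` does not, because `exit` in a sourced file terminates the sourcing shell.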

With
https://github.com/NorthernTechHQ/infra/pull/1091

Contributor

@craigcomstock craigcomstock left a comment

worth a try

Comment thread ci/fix-buildhost.sh Outdated
# FORCE_FIX_BUILDHOST=1 re-provision even if the marker is present
# SKIP_FIX_BUILDHOST=1 skip provisioning entirely (e.g. hosts provisioned
# out-of-band by CFEngine policy)
FIX_BUILDHOST_MARKER=/etc/cfengine-build-host-provisioned
Contributor

I do not think we should implement this. Part of the advantage of running the build host setup every time is that changes can be made to the build host in a PR and then reverted to normal in, say, a nightly build.

Member Author

Good point, agreed. Dropped the marker/skip logic so setup runs on every build now, which keeps the PR-changes-then-revert behavior you describe. setup-cfengine-build-host.sh is back to its untouched master version; the only change left is the call in fix-buildhost.sh.
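
With the marker logic gone, the remaining behavior is that the setup step runs on every build and its failure fails the job. A rough way to see that effect is sketched below; `run_job` is a hypothetical helper that simulates a job sourcing a fix-buildhost-style file under `set -e`, not how testing-pr is actually defined.

```shell
# Hypothetical simulation of a job that sources a file under set -e.
# $1: path (containing a slash) of the file to source.
# Prints "build continued" only if every command in the sourced file succeeded;
# otherwise the simulated job exits nonzero before reaching the echo.
run_job() {
    bash -c 'set -e; . "$1"; echo "build continued"' _ "$1"
}
```

If the sourced file's setup call returns nonzero, `set -e` ends the simulated job at that point, which corresponds to the build failing rather than continuing on a half-provisioned host.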

Ticket: ENT-11586
Changelog: None
@nickanderson nickanderson force-pushed the ENT-11586/move-provisioning-to-fix-buildhost branch from 07cd1a9 to a319e85 Compare June 1, 2026 20:58
2 participants