[Bug]: docker_forward_env forwards empty-string secrets into sandbox, bypassing the ~/.hermes/.env disk fallback #35580

@SiTaggart

Bug Description

Summary

When a docker_forward_env key is present-but-empty ("") in the gateway's live process env, DockerEnvironment forwards the empty value into the container instead of falling back to the value on disk in ~/.hermes/.env. The disk fallback only triggers when the key is unset (None), not when it's an empty string. Result: a momentary blank in os.environ silently propagates a broken secret into every sandbox exec spawned during that window, even though the correct value is sitting in .env the whole time.

Affected code

tools/environments/docker.py, the forward-env build loop (~L815-824):

for key in sorted(forward_keys):
    value = os.getenv(key)
    if value is None:                 # only catches UNSET, not ""
        value = hermes_env.get(key)   # ~/.hermes/.env disk fallback
    if value is not None:             # forwards "" happily
        exec_env[key] = value

hermes_env (from _load_hermes_env_vars()) is exactly the durable safety net that should heal a transient blank. However, the check "" is None evaluates to False, so the fallback is skipped, and "" is not None evaluates to True, so the empty value is forwarded as -e KEY=.
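The None-versus-empty distinction can be checked in isolation. The following is a minimal sketch, not Hermes code: DEMO_FORWARD_KEY and classify() are hypothetical names used only for illustration, and the only real API involved is os.environ / os.getenv.

```python
import os

KEY = "DEMO_FORWARD_KEY"  # hypothetical key name, for illustration only


def classify(key):
    """Return how os.getenv sees the key: 'unset', 'empty', or 'set'."""
    value = os.getenv(key)
    if value is None:
        return "unset"
    if value == "":
        return "empty"
    return "set"


# Case 1: the key is absent from the process environment.
os.environ.pop(KEY, None)
state_unset = classify(KEY)

# Case 2: the key is present but blank.
os.environ[KEY] = ""
state_empty = classify(KEY)

# An "is None" guard does not fire for a blank value...
fallback_taken_with_is_none = os.getenv(KEY) is None
# ...whereas a truthiness check treats unset and blank the same way.
fallback_taken_with_not = not os.getenv(KEY)

os.environ.pop(KEY, None)  # leave the environment as we found it
print(state_unset, state_empty, fallback_taken_with_is_none, fallback_taken_with_not)
```

The two guards disagree only in the blank case, which is the case this report is about.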

Impact

Intermittent, near-invisible auth failures for any secret consumed inside the sandbox. In our deployment LINEAR_API_KEY is forwarded and read by sandboxed cron jobs. We observed two incidents ~9 days apart in which Linear API calls returned 401 ("Authentication required, not authenticated") and a debug probe printed LINEAR_API_KEY length: 0, while ~/.hermes/.env held the correct 48-char key the entire time and no gateway restart or .env rewrite occurred. Other forwarded secrets (Slack/Telegram tokens) never surfaced the issue because they're consumed gateway-side and never ride a docker exec -e into a container, so the empty-forward is only observable for secrets a sandboxed workload actually uses.
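For anyone trying to confirm the same symptom, a length-only probe along these lines can be run inside the sandbox. This is a hypothetical reconstruction, not the probe we actually used: describe_secret() is an illustrative name, and the point is only that it reports presence and length without ever printing the secret itself.

```python
import os


def describe_secret(name, env=None):
    """Report presence and length of a secret without revealing its value."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:
        return f"{name}: unset"
    return f"{name} length: {len(value)}"


# Inside the container this prints "length: 0" when a blank was forwarded.
print(describe_secret("LINEAR_API_KEY"))
```

Distinguishing "unset" from "length: 0" matters here, because only the second indicates that an empty -e KEY= was passed to the container.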

Steps to Reproduce

Standalone script that models the exact branch logic:

def build_args(forward_keys, live_env, disk_env):
    exec_env = {}
    hermes_env = disk_env if forward_keys else {}
    for key in sorted(forward_keys):
        value = live_env.get(key, None)      # os.getenv(key)
        if value is None:                    # only unset
            value = hermes_env.get(key)
        if value is not None:                # forwards ""
            exec_env[key] = value
    args = []
    for key in sorted(exec_env):
        args.extend(["-e", f"{key}={exec_env[key]}"])
    return args

forward = {"LINEAR_API_KEY"}
live = {"LINEAR_API_KEY": ""}                # transient empty in os.environ
disk = {"LINEAR_API_KEY": "x" * 48}          # ~/.hermes/.env still correct

flag = next(a for a in build_args(forward, live, disk) if a.startswith("LINEAR_API_KEY="))
print(flag, "-> length:", len(flag.split("=", 1)[1]))

Expected Behavior

LINEAR_API_KEY=xxxx... -> length: 48

Actual Behavior

LINEAR_API_KEY= -> length: 0 (disk fallback skipped)

Affected Component

Configuration (config.yaml, .env, hermes setup)

Messaging Platform (if gateway-related)

No response

Debug Report

N/A

Operating System

Ubuntu 24.04.4 LTS

Python Version

3.11.15

Hermes Version

v0.15.1 (2026.5.29)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

Treat empty-string the same as unset, and never forward a blank:

for key in sorted(forward_keys):
    value = os.getenv(key)
    if not value:                 # unset OR empty -> consult ~/.hermes/.env
        value = hermes_env.get(key)
    if value:                     # never forward a blank secret
        exec_env[key] = value

This makes the existing disk fallback actually cover the empty case, and is robust to whatever produces the transient blank (env mutation during concurrent spawns, partial reload, etc.). The behavior change is limited to keys that are legitimately empty by design, which shouldn't apply to forwarded secrets. If strict backward compatibility is preferred, the minimal change is to replace "if value is None:" with "if not value:" on the fallback line only, leaving the forward guard as-is.
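Both variants can be exercised with the same standalone model used in the reproduction. This is a sketch under the same assumptions as that script (plain dicts stand in for os.environ and ~/.hermes/.env); build_args_fixed and its drop_blank parameter are hypothetical names that exist only to switch between the two variants, and are not part of tools/environments/docker.py.

```python
def build_args_fixed(forward_keys, live_env, disk_env, drop_blank=True):
    """Model of the forward-env loop with the proposed fix applied.

    drop_blank=True  -> full fix: fall back on blank, never forward a blank.
    drop_blank=False -> minimal variant: fall back on blank, keep the
                        original "is not None" forward guard.
    """
    exec_env = {}
    hermes_env = disk_env if forward_keys else {}
    for key in sorted(forward_keys):
        value = live_env.get(key)            # stands in for os.getenv(key)
        if not value:                        # unset OR empty -> disk fallback
            value = hermes_env.get(key)
        keep = bool(value) if drop_blank else value is not None
        if keep:
            exec_env[key] = value
    args = []
    for key in sorted(exec_env):
        args.extend(["-e", f"{key}={exec_env[key]}"])
    return args


forward = {"LINEAR_API_KEY"}
live = {"LINEAR_API_KEY": ""}                # transient empty in the live env
disk = {"LINEAR_API_KEY": "x" * 48}          # value on disk still correct

for drop_blank in (True, False):
    flag = build_args_fixed(forward, live, disk, drop_blank=drop_blank)[1]
    print(drop_blank, "-> length:", len(flag.split("=", 1)[1]))
```

In this model both variants heal the reported case (length 48). They differ only when the disk value is itself blank: the full fix drops the key, while the minimal variant still forwards KEY=. One thing the model suggests is that the minimal variant is not fully behavior-preserving either: a key that is blank in the live env and absent on disk becomes None after the fallback and is no longer forwarded at all.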

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

    Labels

    P1 (High: major feature broken, no workaround)
    area/auth (Authentication, OAuth, credential pools)
    backend/docker (Docker container execution)
    type/bug (Something isn't working)
