Summary
When the gateway restarts and auto-resumes a previously interrupted session, the synthesized empty message carries the original source (including thread_id). If the Feishu message referenced by thread_id has been deleted/withdrawn during the downtime, _send_raw_message() uses that stale thread_id as receive_id, and the Feishu API returns [99992402] field validation failed. Both the primary send and the plain-text fallback fail with the same error, silently dropping the response.
This is a persistent, reproducible issue — not a transient API glitch. The error recurs on every gateway restart until the stale session is cleared.
Environment
| Item | Value |
| --- | --- |
| Hermes Agent | v0.15.1 (2026.5.29) |
| Python | 3.12.3 |
| OS | Linux 6.8.0-111-generic (Ubuntu, x86_64) |
| lark-oapi | 1.5.3 |
| Platform | Feishu (websocket mode) |
| Branch | main (commit 689ef5e23) |
| Upstream remote | https://github.com/NousResearch/hermes-agent.git |
Steps to Reproduce
- Start the gateway with Feishu connected via websocket.
- Send a message to the bot in a Feishu DM (creates a thread_id in the session source).
- While the agent is mid-turn processing, restart the gateway (hermes gateway restart).
- On startup, _schedule_resume_pending_sessions() detects the interrupted session and synthesizes a MessageEvent(text="", source=<original_source>).
- If the original message referenced by source.thread_id was deleted/withdrawn during the restart window, the response send fails.
Simplified reproduction: Delete or withdraw the Feishu message that originated the session, then restart the gateway.
Expected Behavior
The auto-resume mechanism should gracefully handle stale thread_id references. When 99992402 is returned, the adapter should fall back to sending the message at the chat level (without thread_id), similar to how it already handles 230011/231003 (reply target withdrawn/missing).
Actual Behavior
Both the primary send and the plain-text fallback fail with [99992402] field validation failed. The response is silently dropped — the user never sees it.
Logs
2026-05-30 22:39:45 gateway restart
2026-05-30 22:40:07 [Feishu] Connected in websocket mode (feishu)
2026-05-30 22:40:08 Scheduled auto-resume for 1 restart-interrupted session(s)
2026-05-30 22:40:08 inbound message: platform=feishu user=ou_xxx chat=oc_xxx msg='' ← synthetic auto-resume event
2026-05-30 22:40:21 response ready: platform=feishu chat=oc_xxx time=12.5s api_calls=1 response=270 chars
2026-05-30 22:40:21 [Feishu] Sending response (270 chars) to oc_xxx
2026-05-30 22:40:22 WARNING [Feishu] Send failed: [99992402] field validation failed — trying plain-text fallback
2026-05-30 22:40:22 ERROR [Feishu] Fallback send also failed: [99992402] field validation failed
This has been observed consistently across multiple gateway restarts since May 16. The same error pattern also affects cron job deliveries that carry a thread_id in their origin metadata pointing to a deleted message.
Root Cause Analysis
1. Auto-resume carries stale source with thread_id
_schedule_resume_pending_sessions() at gateway/run.py:3915 creates a MessageEvent with the original source object:
# gateway/run.py:3915
event = MessageEvent(
    text="",
    message_type=MessageType.TEXT,
    source=source,  # ← carries the ORIGINAL thread_id
    internal=True,
)
2. _send_raw_message uses thread_id as receive_id
gateway/platforms/feishu.py:4408-4416 — when metadata.thread_id is set, the method sends with receive_id_type="thread_id":
_thread_id = (metadata or {}).get("thread_id")
if _thread_id:
    body = self._build_create_message_body(
        receive_id=_thread_id,  # ← stale thread_id
        msg_type=msg_type,
        content=payload,
        ...
    )
    request = self._build_create_message_request("thread_id", body)
3. 99992402 is NOT in _FEISHU_REPLY_FALLBACK_CODES
The retry logic in _feishu_send_with_retry() (line 4568) handles codes 230011 and 231003 (reply target withdrawn/missing) but not 99992402 (field validation failed):
# gateway/platforms/feishu.py:231
_FEISHU_REPLY_FALLBACK_CODES = frozenset({230011, 231003})
So 99992402 is treated as a non-network, non-retryable error, and falls through to the plain-text fallback in base.py:3034 — which sends with the same stale metadata, producing the same error.
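The failure chain described above can be reproduced in isolation. The following is a minimal simulation, not the gateway's code: `fake_send`, `deliver`, and the thread ID `omt_deleted` are hypothetical names chosen for illustration. It only shows why a fallback that reuses the same metadata cannot succeed when the metadata itself is the problem.

```python
from typing import Dict, List

STALE_THREAD_ID = "omt_deleted"  # hypothetical ID of a withdrawn thread


def fake_send(text: str, metadata: Dict[str, str]) -> None:
    """Simulated send: rejects any request routed to the stale thread."""
    if metadata.get("thread_id") == STALE_THREAD_ID:
        raise RuntimeError("[99992402] field validation failed")


def deliver(text: str, metadata: Dict[str, str]) -> List[str]:
    """Primary send, then a plain-text fallback that reuses the same metadata."""
    log: List[str] = []
    for attempt in ("primary", "plain-text fallback"):
        try:
            fake_send(text, metadata)
            log.append(f"{attempt}: ok")
            break
        except RuntimeError as exc:
            # The fallback changes the payload format, not the routing,
            # so the stale thread_id is sent again on the next attempt.
            log.append(f"{attempt}: {exc}")
    return log


attempts = deliver("hello", {"chat_id": "oc_xxx", "thread_id": STALE_THREAD_ID})
```

Both entries in `attempts` carry the same error, which matches the WARNING/ERROR pair in the logs above.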
4. Impact scope
This is not limited to auto-resume. Any send path that carries a thread_id pointing to a deleted/withdrawn Feishu message will hit this — including cron job deliveries where the origin.thread_id is stale.
Suggested Fix
Option A: Add 99992402 to _FEISHU_REPLY_FALLBACK_CODES (minimal)
# gateway/platforms/feishu.py:231
_FEISHU_REPLY_FALLBACK_CODES = frozenset({230011, 231003, 99992402})
This would cause _feishu_send_with_retry to fall back from thread_id → chat_id routing when 99992402 is returned, which is the same fallback already used for withdrawn reply targets.
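As a rough illustration of the intended behavior under Option A, the sketch below models a thread-to-chat fallback driven by a set of error codes. It is a simplified stand-in, assuming the real retry helper consults the code set in a similar way; `FakeFeishuError`, `fake_send`, and `send_with_fallback` are hypothetical names and do not exist in the codebase.

```python
from typing import Optional

# Codes that trigger a retry at chat level; 99992402 is the proposed addition.
REPLY_FALLBACK_CODES = frozenset({230011, 231003, 99992402})


class FakeFeishuError(Exception):
    """Hypothetical stand-in for an API error carrying a Feishu error code."""

    def __init__(self, code: int, msg: str) -> None:
        super().__init__(f"[{code}] {msg}")
        self.code = code


def fake_send(receive_id_type: str, receive_id: str, text: str) -> str:
    """Simulated API call: every thread-level send hits a stale thread."""
    if receive_id_type == "thread_id":
        raise FakeFeishuError(99992402, "field validation failed")
    return f"sent to {receive_id_type}={receive_id}: {text}"


def send_with_fallback(chat_id: str, text: str, thread_id: Optional[str] = None) -> str:
    """Try thread routing first, then retry at chat level on a fallback code."""
    if thread_id:
        try:
            return fake_send("thread_id", thread_id, text)
        except FakeFeishuError as exc:
            if exc.code not in REPLY_FALLBACK_CODES:
                raise  # unrelated errors still surface to the caller
    return fake_send("chat_id", chat_id, text)


result = send_with_fallback("oc_xxx", "hello", thread_id="omt_stale")
```

With 99992402 in the set, the response is delivered to the chat instead of being dropped. One trade-off worth noting: 99992402 is a generic validation code, so a fallback keyed on it may also retry requests that failed validation for a reason other than a stale thread_id.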
Option B: Strip thread_id from auto-resume source (targeted)
# gateway/run.py — in _schedule_resume_pending_sessions()
import copy

safe_source = copy.copy(source)
safe_source.thread_id = None  # auto-resume doesn't need thread routing
event = MessageEvent(
    text="",
    message_type=MessageType.TEXT,
    source=safe_source,
    internal=True,
)
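The copy-and-strip approach of Option B can be checked on its own with a small stand-in type. This sketch assumes the source object is a mutable object with a thread_id attribute; `FakeSource` and `without_thread` are hypothetical names used only for illustration.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSource:
    """Hypothetical stand-in for the session source object."""

    platform: str
    chat_id: str
    thread_id: Optional[str] = None


def without_thread(source: FakeSource) -> FakeSource:
    """Return a shallow copy with thread routing removed; the input is untouched."""
    safe = copy.copy(source)
    safe.thread_id = None
    return safe


original = FakeSource(platform="feishu", chat_id="oc_xxx", thread_id="omt_stale")
resumed = without_thread(original)
```

The shallow copy keeps the stored session source intact, so only the synthesized resume event loses its thread routing. If the real source type turns out to be a frozen dataclass, attribute assignment would fail and `dataclasses.replace(source, thread_id=None)` would be the equivalent call.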
Recommendation
Option A is preferred because it handles all stale-thread_id paths (auto-resume, cron delivery, any future code path), not just auto-resume. It's also a one-line change with clear precedent in the existing fallback code. Option B could be added as a defense-in-depth layer on top.
Additional Context
- The error has been observed since at least May 16, 2026.
- Cron jobs with origin.thread_id pointing to deleted messages produce the same error (workaround: deliver: local).
- 53 occurrences of 99992402 in the gateway log from a single session.
- The Feishu error code 99992402 means "field validation failed"; in this context, the thread_id field value fails validation because the referenced message no longer exists.
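To check how often the error appears in a given log, a short script along these lines may help. It is a sketch that assumes log lines follow the format shown in the Logs section; `count_error_codes` is a hypothetical helper, and the sample lines are copied from that section.

```python
import re
from collections import Counter
from typing import Iterable

# Matches entries such as "[99992402] field validation failed".
ERROR_CODE = re.compile(r"\[(\d+)\] field validation failed")


def count_error_codes(lines: Iterable[str]) -> Counter:
    """Count field-validation failures per Feishu error code."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_CODE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "2026-05-30 22:40:07 [Feishu] Connected in websocket mode (feishu)",
    "2026-05-30 22:40:21 [Feishu] Sending response (270 chars) to oc_xxx",
    "2026-05-30 22:40:22 ERROR [Feishu] Fallback send also failed: [99992402] field validation failed",
]
counts = count_error_codes(sample)
```

To scan a real log, pass an open file object instead of the sample list; the function accepts any iterable of lines.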