Summary
claude-mem v10.6.2 is in a state where the worker accepts queued messages from hooks but fails every single message on the first attempt with zero retries. The queue has grown to ~29,400 failed rows; session_summaries and observations tables have been frozen since 2026-04-17 despite hooks firing on every session daily.
Evidence
sqlite3 ~/.claude-mem/claude-mem.db \
"SELECT status, message_type, COUNT(*), MAX(retry_count) AS max_retries
FROM pending_messages GROUP BY status, message_type"
failed | observation | 22490 | 0
failed | summarize | 6907 | 0
pending | summarize | 46 | 0
Date range of failures (oldest → newest):
2026-03-26 20:01:17 | failed | summarize | retry_count=0
2026-05-30 12:03:11 | failed | observation | retry_count=0
Failures span the entire install window, and no row in the queue shows a successful or retried attempt. retry_count=0 on ALL of them means that whatever exception the worker hits is outside the retry path, so the message goes straight to failed.
For comparison, the other tables are healthy:
user_prompts: 10,417 rows, current through today (every user prompt captured fine)
session_summaries: 307 rows, frozen at 2026-04-17T07:26 (when the rich-summary write last succeeded)
observations: 512 rows, frozen at 2026-04-17T07:21
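To pin down when the failures started relative to the 2026-04-17 freeze, a per-day histogram of failed rows may help. This is a minimal sketch, not part of the plugin: it assumes failed_at_epoch is stored in milliseconds (pass epoch_in_ms=False if it turns out to be seconds), and failures_per_day is a hypothetical helper name.

```python
import sqlite3
from datetime import datetime, timezone


def failures_per_day(conn, epoch_in_ms=True):
    """Return {YYYY-MM-DD: count} for rows in pending_messages with status='failed'.

    Assumption: failed_at_epoch holds a Unix timestamp, in milliseconds by
    default. Only columns named in this report (status, failed_at_epoch) are used.
    """
    divisor = 1000 if epoch_in_ms else 1
    rows = conn.execute(
        "SELECT failed_at_epoch FROM pending_messages "
        "WHERE status = 'failed' AND failed_at_epoch IS NOT NULL"
    ).fetchall()
    counts = {}
    for (epoch,) in rows:
        # Bucket each failure by its UTC calendar day.
        day = datetime.fromtimestamp(epoch / divisor, tz=timezone.utc).strftime("%Y-%m-%d")
        counts[day] = counts.get(day, 0) + 1
    return dict(sorted(counts.items()))
```

It is probably safest to run this against a copy of ~/.claude-mem/claude-mem.db, opened with sqlite3.connect(path), rather than the live file the worker is writing to.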
Restart fails
$ bun ~/.claude/plugins/.../scripts/worker-cli.js restart
Failed to restart: Process died during startup
The status command also misbehaves: it returns what looks like a Claude Code hook ACK, {"continue": true, "suppressOutput": true}, instead of worker status, which suggests worker-cli.js is being intercepted by the hook layer.
The worker process IS bound to port 37777 (lsof -i :37777 shows the bun process running), but every message it processes lands in failed. So it's not a port-binding issue — it's a per-message processing exception.
Schema observation
pending_messages schema has failed_at_epoch but no error-reason column — failures are recorded silently with no diagnostic detail. Adding a failure_reason TEXT (or error TEXT) column and writing the exception message there would massively help future debugging.
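As a rough illustration of the column suggested above, the sketch below adds failure_reason idempotently and records the exception text when a row is marked failed. It is written in Python only to keep it self-contained; the real worker is JavaScript, and the id primary-key column, the millisecond epoch unit, and both function names are assumptions, not the plugin's actual code.

```python
import sqlite3
import time
import traceback


def ensure_failure_reason_column(conn):
    """Add failure_reason TEXT to pending_messages if it is not already there."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(pending_messages)")}
    if "failure_reason" not in cols:
        conn.execute("ALTER TABLE pending_messages ADD COLUMN failure_reason TEXT")
        conn.commit()


def mark_failed(conn, message_id, exc):
    """Record the failure together with a one-line description of the exception.

    Assumptions: the table has an 'id' primary key and failed_at_epoch is in
    milliseconds.
    """
    reason = "".join(traceback.format_exception_only(type(exc), exc)).strip()
    conn.execute(
        "UPDATE pending_messages "
        "SET status = 'failed', failed_at_epoch = ?, failure_reason = ? "
        "WHERE id = ?",
        (int(time.time() * 1000), reason, message_id),
    )
    conn.commit()
```

With something like this in place, a query such as SELECT failure_reason, COUNT(*) FROM pending_messages GROUP BY failure_reason would show whether the ~29,400 failures share one root cause.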
What I tried
- worker-cli.js status: broken (returns hook ACK)
- worker-cli.js restart: fails with "Process died during startup"
- Direct bun .../worker-service.cjs: runs but disrupts the MCP server bound to port 37777 mid-execution (MCP search tools disconnected from the session)
- Inspecting logs in ~/.claude-mem/logs/: only INFO-level entries about hook firing; no ERROR/WARN from the failing message-processing path
Environment
- Plugin: claude-mem v10.6.2
- Claude Code: 2.1.158
- Node: v24.2.0
- Bun: 1.3.11
- macOS: 15.4.1
- Install: via thedotmack marketplace
- Active hooks per hooks.json: beforeSubmitPrompt, afterMCPExecution, afterShellExecution, afterFileEdit, stop
Suggested fixes
- Log the exception when a message transitions to failed. Either add a failure_reason column to pending_messages or write structured errors to the log file with the message ID. Currently failures are completely silent and impossible to diagnose without source-diving the worker.
- Implement retry. Every message having retry_count=0 while in failed status suggests the retry path isn't wired. Even a basic exponential backoff (max 3 retries) would surface transient failures vs persistent ones.
- Fix the restart path. "Process died during startup" with no further info is unhelpful — propagate the startup exception.
- Document a recovery procedure in the README — e.g. how to clear the queue, reset the worker, etc., for users in this exact state.
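The retry suggestion in the list above could look roughly like the following. This is a language-neutral sketch in Python, not the worker's code: handler, the returned status strings, the 1-second base delay, and the 60-second cap are all illustrative assumptions; only "max 3 retries" comes from the suggestion itself.

```python
import time

MAX_RETRIES = 3      # from the suggestion above
BASE_DELAY_S = 1.0   # assumed base delay
MAX_DELAY_S = 60.0   # assumed cap


def backoff_delay(retry_count, base=BASE_DELAY_S, cap=MAX_DELAY_S):
    """Delay before the next attempt: base * 2^retry_count, capped."""
    return min(cap, base * (2 ** retry_count))


def process_with_retry(handler, message, max_retries=MAX_RETRIES, sleep=time.sleep):
    """Call handler(message), retrying with exponential backoff on any exception.

    Returns (status, retry_count, failure_reason). A message is only reported
    as 'failed' after max_retries retries, and the last exception text is kept
    so it can be written to the log or to a failure_reason column.
    """
    retry_count = 0
    while True:
        try:
            handler(message)
            return ("processed", retry_count, None)
        except Exception as exc:
            if retry_count >= max_retries:
                return ("failed", retry_count, f"{type(exc).__name__}: {exc}")
            sleep(backoff_delay(retry_count))
            retry_count += 1
```

With this shape, a persistent bug would show up as rows with retry_count=3 and a populated reason, while transient failures would end up processed with a nonzero retry_count. Either outcome would be more informative than the current uniform retry_count=0.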
Workaround for now
Falling back to Claude Code's native auto-memory system (writes to ~/.claude/projects/<slug>/memory/*.md during sessions) which is independent of this plugin and unaffected.
Happy to provide more diagnostic detail if useful — DB dump, log files, anything else.