Testing

Automated tests

117 tests across three layers: unit, integration, and end-to-end. All pass on a Max plan workstation.

npm test                           # unit + integration (~7s)
RUN_LIVE_TESTS=1 npm run test:e2e  # live CLI + OpenClaw e2e (~30s)
RUN_LIVE_TESTS=1 npm test          # everything (~37s)

Layer	What it covers
Unit	scrubPrompt, unscrubResponse, buildUsage, buildMsg, resolveSessionKey, resolveAgentId, deriveTurnSessionKey, MCP config/bootstrap, model catalog, healthcheck
Integration	Mock CLI NDJSON scenarios, timeout, stderr capture, MCP agent identity, system prompt resume, concurrency, stale-resume recovery, workspaceDir routing, prompt extraction
E2E: OpenClaw	Plugin registration, provider exposure, agent smoke test
E2E: live CLI	Real Claude CLI response, scrubbed prompt acceptance
E2E: stream live	createClaudeCliStreamFn with real CLI + Max plan OAuth, session resume

Integration test breakdown

Test	What it validates
simple, streaming, assistant	Basic NDJSON event parsing and stream lifecycle
malformed	Malformed NDJSON lines skipped without crash
scrub, scrub-streaming	Detection token unscrubbing in result and delta paths
double-unscrub guard	endStream doesn’t re-unscrub already-processed streaming text
empty	Empty CLI output produces “(no response)” fallback
hang + timeout	Request timeout kills hung process, emits error (3s test)
stderr capture	CLI stderr included in error events for diagnostics
Concurrency (4 tests)	Parallel streams, session map integrity, same-key safety, no cross-contamination
MCP agent identity (3 tests)	`OPENCLAW_MCP_AGENT_ID` propagation; throws (rather than silently falling back) when agentId is unresolved
Stale `--resume` recovery	Error event surfaces the real claude error text; a bogus session_id from the error is not persisted; the cached id is dropped on resume failure
Auth-error surfacing	When `errors[]` is empty, the user-visible error uses the `data.result` text and `api_error_status` (the HTTP status code)
workspaceDir migration	Claude is spawned with `cwd = ctx.workspaceDir`; sessions persist under `<workspaceDir>/.glueclaw/sessions.json`
Prompt extraction (Telegram)	Trailing `Conversation info (untrusted metadata):` user-role messages are skipped so the actual user text wins
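The malformed and empty rows describe a tolerant NDJSON reader. A minimal sketch of that behavior, assuming a simplified event shape where the final text arrives in a `result` event with a string `result` field (the real CLI stream carries more event types and fields, and `collectResultText` is a hypothetical name, not the plugin's API):

```typescript
// Hypothetical event shape: only the fields this sketch reads.
type CliEvent = { type?: unknown; result?: unknown };

// Parse NDJSON output line by line. Blank, malformed, or non-object lines are
// skipped instead of throwing; the last `result` event's text is returned,
// with "(no response)" as the fallback when no text arrived at all.
function collectResultText(ndjson: string): string {
  let text = "";
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // malformed line: skip it and keep reading the stream
    }
    // JSON.parse also accepts `null`, numbers, and strings; ignore those.
    if (typeof parsed !== "object" || parsed === null) continue;

    const event = parsed as CliEvent;
    if (event.type === "result" && typeof event.result === "string") {
      text = event.result;
    }
  }
  return text === "" ? "(no response)" : text;
}
```

Skipping per line (rather than failing the whole stream) is what lets one corrupt line coexist with a valid result later in the same output.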
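The auth-error row can be read as a small fallback rule. A sketch under the assumption that the error payload exposes the three fields the table names (`errors[]`, `result`, `api_error_status`); the `ErrorResult` type, the string element type of `errors`, and the `describeCliError` helper are hypothetical:

```typescript
// Hypothetical shape of an error result; field names follow the table row.
interface ErrorResult {
  errors?: string[];
  result?: string;
  api_error_status?: number;
}

// Build the user-visible error text. Explicit errors[] entries win; when that
// array is empty or missing, fall back to the result text plus the HTTP code.
function describeCliError(data: ErrorResult): string {
  if (data.errors && data.errors.length > 0) {
    return data.errors.join("; ");
  }
  const text = data.result?.trim() || "unknown Claude CLI error";
  return data.api_error_status !== undefined
    ? `${text} (HTTP ${data.api_error_status})`
    : text;
}
```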
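The prompt-extraction row amounts to "take the newest real user message". A minimal sketch, assuming a flat `{ role, content }` message shape and that metadata wrappers begin with the exact prefix quoted in the table; `ChatMessage` and `extractPrompt` are hypothetical names:

```typescript
// Hypothetical message shape for this sketch.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const METADATA_PREFIX = "Conversation info (untrusted metadata):";

// Walk from the newest message backwards and return the first user message
// that is not a channel metadata wrapper, so the user's actual text wins.
function extractPrompt(messages: ChatMessage[]): string | undefined {
  for (const msg of [...messages].reverse()) {
    if (msg.role !== "user") continue;
    if (msg.content.trimStart().startsWith(METADATA_PREFIX)) continue;
    return msg.content;
  }
  return undefined;
}
```

Copying before `reverse()` keeps the caller's array order intact.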

What the tests prove

Auth path validated: createClaudeCliStreamFn deletes ANTHROPIC_API_KEY, forcing Max plan OAuth. Tested end-to-end on a real Max plan workstation.
Session resume works: Two sequential calls share the same session key, and the second call retrieves context from the first via --resume.
Stale-resume recovers automatically: When claude rejects a cached session id (e.g. because project storage was cleared), the next turn surfaces the real error and clears the cache; the turn after that succeeds against a fresh session. The failure is never masked by a silent “(no response)”.
Per-agent isolation: With ctx.workspaceDir, each agent’s claude project is anchored at a distinct cwd hash and its session-id cache lives at <workspaceDir>/.glueclaw/sessions.json. Cross-agent bleed is structurally impossible.
Channel inbound stays readable: Telegram-style Conversation info (untrusted metadata): wraps are filtered out of prompt extraction; the user’s actual text wins. Session-key derivation (which needs chat_id) keeps a separate filter that only skips Sender … blocks.
Concurrency safe: 3 parallel streams complete independently. Session file is valid JSON after concurrent writes. Same session key doesn’t crash under concurrent access.
Crash resilient: Hung CLI killed after timeout. SIGKILL fallback if SIGTERM ignored. Atomic session writes prevent corruption.
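The auth-path claim rests on one environment change. A minimal sketch, assuming the key is removed from a copy of the environment handed to the child process (the source says only that createClaudeCliStreamFn deletes ANTHROPIC_API_KEY, not whether it mutates the parent environment or a copy; `envWithoutApiKey` is a hypothetical helper):

```typescript
// Return a copy of the environment without ANTHROPIC_API_KEY, so a spawned
// claude process cannot fall back to API-key auth and must use the logged-in
// OAuth credentials. The input object is left untouched.
function envWithoutApiKey(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const copy = { ...env };
  delete copy.ANTHROPIC_API_KEY;
  return copy;
}
```

With Node's `child_process.spawn`, the result would be passed as the `env` option.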
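The stale-resume and per-agent isolation claims reduce to a few cache rules. A sketch assuming an in-memory map mirroring `<workspaceDir>/.glueclaw/sessions.json`; the `TurnOutcome` shape and both helper names are hypothetical, and the path join is POSIX-style for illustration only:

```typescript
// Hypothetical in-memory view of sessions.json: session key -> claude session id.
type SessionCache = Map<string, string>;

interface TurnOutcome {
  ok: boolean;        // did the claude turn succeed?
  sessionId?: string; // session_id reported by the CLI, if any
  resumed: boolean;   // was the turn started with --resume <cached id>?
}

// Persist a session id only from a successful turn. When a resumed turn
// fails, drop the cached id so the next turn starts a fresh session; the
// session_id carried by an error event is never stored.
function updateSessionCache(cache: SessionCache, key: string, outcome: TurnOutcome): void {
  if (outcome.ok) {
    if (outcome.sessionId) cache.set(key, outcome.sessionId);
    return;
  }
  if (outcome.resumed) cache.delete(key);
}

// Per-agent cache location: one sessions.json per workspaceDir.
function sessionFilePath(workspaceDir: string): string {
  return `${workspaceDir.replace(/\/+$/, "")}/.glueclaw/sessions.json`;
}
```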
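The source does not say how same-key safety is implemented. One common approach, shown here only as an illustration, is a per-key promise chain: work for the same key runs in order, while different keys run concurrently. `KeyedQueue` is a hypothetical class, not part of the plugin:

```typescript
// Serialize async work per key; different keys are not blocked by each other.
class KeyedQueue {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start after the previous task settles, whether it resolved or rejected.
    const next = prev.then(task, task);
    // Keep a tail that never rejects, so one failure doesn't poison the chain.
    const tail = next.catch(() => undefined);
    this.tails.set(key, tail);
    // Forget the key once this task is still the last one queued for it.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return next;
  }
}
```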
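The crash-resilience claim describes a two-step escalation. A sketch with the scheduler injected (in practice `setTimeout`) so the logic can be exercised without real timers; `Killable`, `armTimeout`, and the grace period are assumptions, and timer cancellation on normal exit is omitted for brevity (the `exited` checks make late timers harmless):

```typescript
// Minimal view of a child process: enough to send signals and observe exit.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): void;
  readonly exited: boolean;
}

type Schedule = (fn: () => void, ms: number) => void;

// After timeoutMs, report the timeout and send SIGTERM; if the process is
// still alive graceMs later, send SIGKILL.
function armTimeout(
  proc: Killable,
  timeoutMs: number,
  graceMs: number,
  schedule: Schedule,
  onTimeout: () => void,
): void {
  schedule(() => {
    if (proc.exited) return;
    onTimeout(); // e.g. emit the timeout error event
    proc.kill("SIGTERM");
    schedule(() => {
      if (!proc.exited) proc.kill("SIGKILL");
    }, graceMs);
  }, timeoutMs);
}
```

For a Node `ChildProcess`, `exited` could be derived from `exitCode !== null || signalCode !== null`; the `killed` flag only means a signal was delivered, not that the process is gone.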

Manual procedures

Smoke test

export GLUECLAW_KEY=local
openclaw agent --agent main --message "say pong" 2>&1 | tail -n 1
# Expected: pong

Multi-turn memory

In openclaw tui:

“remember the word: mango” — expect acknowledgment
“what word did I ask you to remember?” — expect “mango”

Detection check

# Scrubbed prompt — should pass
openclaw agent --agent main --message "say hi" 2>&1 | tail -n 1

# Raw OpenClaw trigger — should fail with 400
claude --append-system-prompt \
  "You are a personal assistant running inside OpenClaw." \
  -p "say hi" 2>&1

MCP bridge

In openclaw tui, ask “what MCP tools do you have access to from openclaw?” — expect a list including message, sessions_list, memory_search, web_search.

Multi-agent isolation

See the multi-agent guide for the full three-round bleed test. Short version: store a unique secret per agent, ask each agent to recall it, and ask each to self-identify via agents_list. Each agent must return its own secret and its own id. Builds before #36 had every non-main agent reporting as main.