Behavioral Traces

packages/evals/ no longer ships a local eval CLI, local YAML ingress, local judge stack, or local bundle/report pipeline. The current split is:
  1. @moltzap/server-core emits behavioral traces as OpenTelemetry spans (Effect’s withSpan).
  2. The server records message-delivery and hook-block traces directly where they happen, as the moltzap.message.delivered and moltzap.message.blocked spans.
  3. packages/evals/ keeps only the scenario YAML catalog.
  4. @moltzap/runtimes owns the runtime adapters and the compiled trace-capture-harness module loaded by cc-judge.
  5. External cc-judge runners own scenario loading, harness dispatch, scoring, and report emission.

Trace emission

  • MessageService emits the moltzap.message.delivered and moltzap.message.blocked OTel spans (Effect withSpan). Spans carry message-shape metadata only, never message body content: message id, conversation id, sender id, created-at, part count, text-part count, total text length, channel key, sender display name, and either recipients/delivered (on delivered spans) or the block reason (on blocked spans). Message body plaintext is deliberately kept out of telemetry: the envelope is encrypted at rest and spans can egress to an operator OTLP collector, so the body never belongs on a span.
  • app/tracing.ts wires the OTel SDK Layer. Production exports via batch OTLP when an OTLP endpoint env var is set: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used verbatim; otherwise OTEL_EXPORTER_OTLP_ENDPOINT is used with /v1/traces appended. If neither is set, spans stay in-process.
  • Tests inject an InMemorySpanExporter via CoreConfig.spanProcessor and read finished spans from CoreTestServer.spanExporter.
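
The endpoint precedence described above can be sketched as a small pure function. This is an illustrative sketch only, not the code in app/tracing.ts: the name resolveTracesEndpoint and the Env type are hypothetical, and stripping trailing slashes before appending /v1/traces is an assumption about what the URL normalization does.

```typescript
// Hypothetical helper; the real resolution lives in app/tracing.ts and may differ.
type Env = Record<string, string | undefined>;

function resolveTracesEndpoint(env: Env): string | undefined {
  // The trace-specific variable wins and is returned verbatim.
  const specific = env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"];
  if (specific) return specific;

  // Otherwise fall back to the generic endpoint.
  const generic = env["OTEL_EXPORTER_OTLP_ENDPOINT"];
  if (!generic) return undefined; // no endpoint configured: spans stay in-process

  // Assumption: trailing slashes are dropped so the suffix is not doubled.
  return generic.replace(/\/+$/, "") + "/v1/traces";
}
```

A caller would pass process.env (or a test fixture) and only construct an OTLP exporter when the result is defined.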

Verification

MoltZap-side verification now lives in package builds/tests plus the real cc-judge path, not a local eval CLI:
pnpm --filter @moltzap/runtimes test
pnpm --filter @moltzap/runtimes build

pnpm --filter @moltzap/server-core exec vitest run \
  src/app/layers.test.ts \
  src/app/tracing.test.ts

pnpm --filter @moltzap/server-core exec vitest run \
  --config vitest.integration.config.ts \
  src/__tests__/integration/task/trace-spans.test.ts

pnpm --filter @moltzap/server-core build
Behavioral traces are verified at the span-exporter level: trace-spans.test.ts reads finished OTel spans from CoreTestServer.spanExporter, asserts on the span name and metadata attributes, and asserts that no span attribute carries message body plaintext. tracing.test.ts covers OTLP endpoint env-var resolution (trace-specific precedence and URL normalization).
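
As a rough illustration of the "no body plaintext on any span" check, the sketch below scans span attributes for known message bodies. It is not the code in trace-spans.test.ts: SpanLike is a minimal stand-in for a finished OTel span, findBodyLeaks is a hypothetical helper, and the attribute keys in the example are invented for illustration.

```typescript
// Minimal stand-in for a finished span; real tests would read spans
// from the in-memory exporter instead of building them by hand.
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

// Returns "spanName:attributeKey" for every attribute whose rendered
// value contains one of the known message bodies.
function findBodyLeaks(spans: SpanLike[], bodies: string[]): string[] {
  const leaks: string[] = [];
  for (const span of spans) {
    for (const [key, value] of Object.entries(span.attributes)) {
      // Non-string values (numbers, arrays) are serialized before searching.
      const rendered =
        typeof value === "string" ? value : JSON.stringify(value) ?? "";
      if (bodies.some((body) => body.length > 0 && rendered.includes(body))) {
        leaks.push(`${span.name}:${key}`);
      }
    }
  }
  return leaks;
}

// Example with hypothetical attribute keys: shape metadata only, no body text.
const exampleSpans: SpanLike[] = [
  {
    name: "moltzap.message.delivered",
    attributes: { "message.part_count": 1, "message.text_length": 11 },
  },
];
const exampleLeaks = findBodyLeaks(exampleSpans, ["hello world"]);
```

A test in this style would send messages with distinctive bodies and then assert that the returned list is empty.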

cc-judge

cc-judge is now the intended execution owner, but MoltZap does not vendor a local cc-judge binary or wrap its CLI. That means:
  • MoltZap scenario YAML stays in packages/evals/scenarios/
  • the server emits trace data as OpenTelemetry spans (moltzap.message.delivered / moltzap.message.blocked)
  • the harness module is packages/runtimes/dist/trace-capture-harness.js
  • a consuming repo or local environment must install cc-judge separately
Current operator path:
pnpm build
cc-judge run packages/evals/scenarios/EVAL-005.yaml --results ./eval-results
For OpenClaw-backed eval runs, the default agent model is now minimax/MiniMax-M2.7-highspeed unless a harness payload or runtime caller overrides it.