Signal types and scopes
A signal produces one value per run, typed as:- boolean — “did the agent fabricate an API?”
- number — “how many tokens did the run use?”
- category — “which install method did the agent choose?” (requires
category_enums)
run— one observation per run (the default)message— an observation per matching message, then folded into a single per-run value (sum,count,average,min,max, orhistogram). Message-scoped signals require atarget_role(assistant,user, ortool) and afold.
Extraction methods
| Method | How it works | Constraints |
|---|---|---|
pattern | Regex over message content | message scope only; boolean (via patterns) or number (via needle) |
stats | Built-in run metrics: duration, token_in, token_out, token_total, cost, tool_calls, turns, steps, status, termination_reason | run scope only |
llm | A judge model evaluates content against your prompt | boolean or category; requires model and prompt |
pattern and llm signals also declare a source — which part of the transcript to read: codeText, assistantText, userText, thinkingText, toolCalls, toolResults, finalAnswer, or any.
Aggregates
Aggregates fold per-run signal values into iteration-level metrics:count, count_where, rate, sum, avg, min, max, median, mode, count_by_category, avg_by_category, and distribution. Each aggregate references a signal by id and gets a display label. rate returns a fraction from 0 to 1 (truthy runs ÷ total runs), not a percentage.
Example config
llm extraction, model must be one of the supported judge models: claude-haiku-4-5 (the default), claude-sonnet-4, claude-sonnet-4-5, claude-sonnet-4-6, gpt-4o-mini, gpt-4o, gpt-4.1-mini, gpt-4.1, or gpt-4.1-nano. Unsupported model ids pass validation but fail at extraction time.
Working with signal configs from the CLI
Validate locally before uploading — validation runs entirely client-side and needs no authentication:0 if the config is valid and 1 with specific errors if not, so it works as a CI check or pre-commit hook.
Attach the config when creating the experiment, or update it later: