Coding Agent Optimization lets you run real coding agents — Claude Code, Codex, and OpenCode — against your product inside isolated sandboxes, score each attempt against goals you define, and turn the results into concrete fixes for your docs, APIs, and onboarding. If AI agents struggle to install your SDK, authenticate, or follow your docs, they recommend something else. This product shows you exactly where that friction happens and what to change.

How it works

  1. Define a task — what the agent should accomplish with your product (e.g. “install the SDK and send a first event”), plus the goals that define success
  2. Define environments — which agent harness and model attempt the task, and what sandbox it runs in
  3. Run an experiment — execute every task across every environment in parallel, in fresh sandboxes
  4. Review the results — per-run scores, full transcripts, extracted signals, and failure clusters with root causes and recommended fixes
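Step 3 means the number of runs in an iteration is the number of tasks multiplied by the number of environments. As a rough illustration of that run matrix (not of how the platform actually schedules runs), the sketch below enumerates every pair. The IDs and the `list_runs` helper are hypothetical and exist only for this example.

```shell
# Hypothetical IDs for illustration only; real IDs come from the tpc CLI.
tasks="task_abc task_def"
envs="env_123 env_456"

# Print one line per (task, environment) pair, i.e. one line per run.
# $1 and $2 are space-separated ID lists, left unquoted on purpose so
# the shell splits them into individual words.
list_runs() {
  for t in $1; do
    for e in $2; do
      printf '%s x %s\n' "$t" "$e"
    done
  done
}

list_runs "$tasks" "$envs"
```

With two tasks and two environments this prints four lines, one per run. Each of those runs would get its own fresh sandbox.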

How the pieces fit together

Experiment
├── Tasks         (what to do + how success is measured)
├── Environments  (which agent/model attempts it)
└── Iterations    (numbered batches of runs)
    └── Runs      (task × environment, one sandbox each)
        ├── score, pass/fail, transcript, cost
        └── signal values → aggregated per iteration
Each concept has its own page under Core concepts in the sidebar, and a matching command group in the tpc CLI:
Concept            Command group        Common commands
Tasks              tpc sim task         create --file task.json, list, get, update, run
Environments       tpc sim env          create, list, update, task attach/detach, secret set
Runs & iterations  tpc sim run          list, get, logs, actions
Experiments        tpc sim experiment   create, run, run status --watch, results
Signals            tpc sim experiment   validate-signal-config, signals

What you get from each run

  • A pass/fail verdict and a 0–100 score against your goals
  • The complete agent transcript: every message, tool call, and thinking step
  • An archive of the sandbox after the run, so you can inspect exactly what the agent built
  • Token usage, cost, and duration
  • Signal values — custom metrics you define in YAML, extracted from the run automatically

Try it from the CLI

Everything below can be driven end-to-end with the tpc CLI under tpc sim:
# One-time setup
curl -fsSL https://cli.promptingco.com/install.sh | bash
tpc auth login
tpc product switch my-product

# 1. Create an environment — the agent configuration under test
tpc sim env create --name "Claude Code + Sonnet" \
  --agent-config '{"harness":"claude","provider":"anthropic","model":"claude-sonnet-4-6"}'

# 2. Create a task and attach it to that environment
#    (run `tpc sim spec` for the full simulation.json contract)
tpc sim create --file simulation.json

# 3. Group them into an experiment, run it, and read the results
tpc sim experiment create --name "Quickstart friction" --task-ids task_abc --env-ids env_123
tpc sim experiment run exp_789
tpc sim experiment run status exp_789 --watch
tpc sim experiment results exp_789
Every command accepts --format json for scripting. The guides below include the CLI workflow for each abstraction.
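For scripting, the three experiment commands above can be wrapped in a small helper that saves the results as JSON. This is a minimal sketch: the `run_and_save` function name is hypothetical, it uses only the commands and the `--format json` flag shown above, and it assumes that `run status --watch` returns once the experiment has finished, which this page does not confirm.

```shell
# run_and_save EXPERIMENT_ID OUTPUT_FILE
# Starts the experiment, watches its status, then writes the results
# as JSON to OUTPUT_FILE. Stops at the first command that fails.
# Assumption: `run status --watch` blocks until the experiment is done.
run_and_save() {
  tpc sim experiment run "$1" &&
    tpc sim experiment run status "$1" --watch &&
    tpc sim experiment results "$1" --format json > "$2"
}
```

A call might look like `run_and_save exp_789 results.json`. The shape of the resulting JSON is not described here, so inspect the file before building anything on top of it.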

Where to start

  • Tasks — defining what agents should do and how success is measured
  • Environments — configuring agents and sandboxes
  • Runs & iterations — what happens during an attempt and what it records
  • Experiments — running iterations and reading results
  • Signals — extracting custom metrics from runs