Anatomy of a task
| Part | Purpose |
|---|---|
| Instruction | The prompt the agent receives — what to build, install, or accomplish |
| Goals | Success criteria, each with weighted scoring and a passing threshold |
| Init files | Files placed in the sandbox before the run: zip uploads or git repositories |
| Init commands | Shell commands run during sandbox setup (installing dependencies, seeding data) |
| Secrets | Named credentials exposed to the run as environment variables |
Writing instructions
Write instructions the way a real user would brief an agent, not the way you wish they would. The goal is to measure your product’s agent experience, so avoid embedding hints the average user wouldn’t provide:
- Good: “Set up error tracking for this Express app using Acme.”
- Too helpful: “Install `@acme/sdk@2.1`, then call `acme.init()` with the DSN from the dashboard.”
Goals and criteria
Each goal contains one or more criteria. A criterion has an evaluation type, a weight, and a max score. Goals are scored by `weighted_average`, `binary`, or `percentage`, against a passing threshold from 0–100.
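As a rough illustration of how weights, max scores, and the passing threshold fit together, the sketch below computes a `weighted_average` goal score. The normalization (each criterion's score divided by its max score, then weighted) and the names `Criterion`, `goal_score`, and `goal_passes` are assumptions made for this example, not the platform's documented formula; `binary` and `percentage` scoring are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical shape of a scored criterion (field names are assumptions)."""
    name: str
    weight: float
    max_score: float
    score: float  # score awarded by the evaluation, between 0 and max_score


def goal_score(criteria: list[Criterion]) -> float:
    """Weighted average of normalized criterion scores, on a 0-100 scale.

    Assumed formula: sum(weight * score / max_score) / sum(weight) * 100.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    weighted = sum(c.weight * (c.score / c.max_score) for c in criteria)
    return 100.0 * weighted / total_weight


def goal_passes(criteria: list[Criterion], threshold: float) -> bool:
    """A goal passes when its score meets the passing threshold (0-100)."""
    return goal_score(criteria) >= threshold


criteria = [
    Criterion("sdk_installed", weight=2, max_score=1, score=1),
    Criterion("followed_docs", weight=1, max_score=10, score=7),
]
print(round(goal_score(criteria), 1))       # 90.0
print(goal_passes(criteria, threshold=80))  # True
```

Under this assumed formula, a heavily weighted criterion dominates the result, so a goal can pass even when a low-weight criterion scores poorly.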
Criteria are evaluated by one of:
| Evaluation | What it checks |
|---|---|
| `comparison` | A run metric against a threshold (duration, tokens, cost) |
| `file_exists` | A file is present in the sandbox after the run |
| `file_content_match` | File contents match a pattern |
| `json_schema` | Output validates against a JSON Schema |
| `bash_command` | A shell command exits 0 in the post-run sandbox |
| `python_script` | A custom Python check passes |
| `script_judge` | Your own verification script (exit 0 = pass) |
| `llm_judge` | An LLM judges the transcript against a rubric |
Prefer deterministic checks (`file_exists`, `bash_command`, `script_judge`) where possible: they’re cheaper and more reproducible. Reserve `llm_judge` for genuinely subjective criteria like “did the agent follow the documented approach?”
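A deterministic check can be as small as the sketch below, written in the style of a `script_judge` script that exits 0 on success. The file it inspects (`package.json`), the dependency name (`@acme/sdk`, borrowed from the instruction example earlier), and the convention of passing the project directory as the first argument are all illustrative assumptions; how the platform actually invokes the script is not specified here.

```python
import json
from pathlib import Path


def has_dependency(project_dir: Path, package: str) -> bool:
    """Return True if package.json in project_dir lists `package` as a dependency."""
    manifest = project_dir / "package.json"
    if not manifest.is_file():
        return False
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    deps = data.get("dependencies") or {}
    return package in deps


def main(argv: list[str]) -> int:
    # Assumption: the project directory is the first argument,
    # defaulting to the current working directory.
    project_dir = Path(argv[1]) if len(argv) > 1 else Path.cwd()
    ok = has_dependency(project_dir, "@acme/sdk")
    print("PASS" if ok else "FAIL: @acme/sdk not found in package.json")
    return 0 if ok else 1  # exit code 0 means the criterion passes


# Entry point when saved as a standalone script:
#     import sys
#     sys.exit(main(sys.argv))
```

Returning the exit code from `main` rather than calling `sys.exit` inline keeps the check importable, so it can be unit-tested before it is attached to a task.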
Sandbox setup
Init files and commands prepare the sandbox before the agent starts:
- Zip upload: extract an archive into the sandbox (default target: the agent’s home directory)
- Git clone: clone a repository at a specific ref; private repos authenticate via a secret reference
- Init commands: run shell commands in order, each with a working directory and timeout
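To make the init-command behavior concrete, here is a minimal local sketch of running shell commands in order, each with its own working directory and timeout. The `InitCommand` fields, the default timeout, and the choice to stop at the first failure or timeout are assumptions for illustration; the platform's actual setup runner may behave differently.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class InitCommand:
    """Hypothetical shape of one init command (field names are assumptions)."""
    command: str
    working_dir: str = "."
    timeout_seconds: float = 60.0


def run_init_commands(commands: list[InitCommand]) -> list[int]:
    """Run commands in order and return the exit codes of those that ran.

    Assumption: setup stops at the first non-zero exit or timeout.
    """
    exit_codes: list[int] = []
    for cmd in commands:
        try:
            result = subprocess.run(
                cmd.command,
                shell=True,                   # commands are shell strings
                cwd=cmd.working_dir,          # per-command working directory
                timeout=cmd.timeout_seconds,  # per-command timeout
            )
        except subprocess.TimeoutExpired:
            exit_codes.append(-1)  # sentinel for "timed out" in this sketch
            break
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break
    return exit_codes
```

Running a task's init commands through a loop like this on a local checkout is a cheap way to catch a missing dependency or a wrong working directory before a sandbox run.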
Secrets
Secrets are encrypted credentials scoped to your organization (or to a single environment). Reference them by name in a task and they’re injected as environment variables during the run, so API keys never appear in the task definition or transcripts. Secrets are managed on environments; see Environments for the CLI workflow.
Working with tasks from the CLI
Tasks are created from a JSON file. Run `tpc sim spec` to print the full contract; a minimal `task.json` looks like:
`category` is one of `coding`, `research`, `documentation`, or `analysis`. Don’t include a `product` field; the CLI injects your active product (set with `tpc product switch`).
`tpc sim create` with a `simulation.json`: