What a fixture is
A fixture is a directory that contains at least two files: PROMPT.md and EVAL.ts. Everything else in the directory becomes the agent’s starting workspace.
niceeval discovers a fixture by its PROMPT.md, so you don’t need to write a .eval.ts wrapper or register the fixture anywhere. Nested directories work fine: fixtures/api/auth/ is discovered and gets the ID fixtures/api/auth.
PROMPT.md
PROMPT.md is the task description sent to the coding agent. Write it exactly as you’d write a prompt: be specific about what the agent should produce, which files to touch, and any constraints to respect.
The agent receives PROMPT.md as its initial message, so keep it self-contained; the task should be unambiguous without extra context.
EVAL.ts
EVAL.ts contains your validation logic written in Vitest style. Each test() block becomes a gate assertion in the eval result: if the test fails, the eval fails.
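As a mental model of the gate rule: each test() body is a check that throws when an expectation fails, and the eval passes only when no check throws. The sketch below expresses that rule with plain functions and Node’s built-in assert instead of Vitest. Gate and runGates are hypothetical names, and this is not how niceeval is implemented. Like Vitest, it runs every check rather than stopping at the first failure, so the result lists every failed gate.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical model: a gate is a named check that fails by throwing,
// much like a test() body whose expect() call fails.
interface Gate {
  name: string;
  check: () => void;
}

interface GateResult {
  name: string;
  passed: boolean;
  error?: string;
}

// Run every gate (no short-circuit), then pass only if all gates passed.
function runGates(gates: Gate[]): { passed: boolean; results: GateResult[] } {
  const results = gates.map((gate): GateResult => {
    try {
      gate.check();
      return { name: gate.name, passed: true };
    } catch (err) {
      return { name: gate.name, passed: false, error: String(err) };
    }
  });
  return { passed: results.every((r) => r.passed), results };
}

// Usage: one passing gate and one failing gate make the whole eval fail.
const source = "export function Button() {}\n// TODO: add styles\n";
const outcome = runGates([
  { name: "exports Button", check: () => assert.match(source, /export function Button/) },
  { name: "no TODOs left", check: () => assert.doesNotMatch(source, /TODO/) },
]);
console.log(outcome.passed); // false
```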
Workspace files
Every file in the fixture directory other than EVAL.ts is part of the agent’s visible workspace. The agent can read, edit, and delete these files freely. Common things to include:
- A package.json with "type": "module" and any project dependencies
- Starter source files the agent should build on or refactor
- A tsconfig.json if the project uses TypeScript
- Any configuration files the agent might need (eslint.config.js, .prettierrc, etc.)
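The workspace rule can be sketched as a recursive walk that lists every file in the fixture except EVAL.ts. workspaceFiles is a hypothetical helper for checking what the agent will see, not part of niceeval; in this sketch only a top-level EVAL.ts is excluded.

```typescript
import { readdirSync, mkdtempSync, mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

// Hypothetical helper: list the files the agent would see in a fixture,
// following the rule that every file except EVAL.ts is part of the workspace.
function workspaceFiles(fixtureDir: string): string[] {
  const files: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        // Use forward slashes so paths look the same on every platform.
        files.push(relative(fixtureDir, full).split(sep).join("/"));
      }
    }
  };
  walk(fixtureDir);
  // Only the top-level EVAL.ts is held back; it is uploaded after the agent runs.
  return files.filter((f) => f !== "EVAL.ts").sort();
}

// Usage: build a tiny fixture in a temp directory and list its workspace.
const dir = mkdtempSync(join(tmpdir(), "fixture-"));
mkdirSync(join(dir, "src"));
for (const f of ["PROMPT.md", "EVAL.ts", "package.json", "src/Button.tsx"]) {
  writeFileSync(join(dir, f), "");
}
console.log(workspaceFiles(dir)); // [ 'PROMPT.md', 'package.json', 'src/Button.tsx' ]
```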
package.json must include "type": "module" for niceeval’s module loading to work correctly inside the sandbox.
Auto-discovery
niceeval scans your eval directory for any subdirectory that contains a PROMPT.md file. There is no registration step. The fixture’s ID is derived from its path relative to the eval root, the same way .eval.ts IDs are derived:
| Fixture path | Eval ID |
|---|---|
| evals/fixtures/button/ | fixtures/button |
| evals/fixtures/api/auth/ | fixtures/api/auth |
Use this ID to target a fixture from the command line, for example npx niceeval exp local fixtures/button.
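The discovery rule and the ID mapping in the table can be sketched together: walk the eval root, treat every subdirectory that contains PROMPT.md as a fixture, and use its path relative to the root, with forward slashes, as the ID. findFixtures is a hypothetical helper that mirrors the documented behavior; it is not niceeval’s scanner.

```typescript
import { existsSync, readdirSync, mkdtempSync, mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

// Hypothetical scanner: every subdirectory of evalRoot that contains PROMPT.md
// is a fixture, and its ID is its path relative to evalRoot.
function findFixtures(evalRoot: string): string[] {
  const ids: string[] = [];
  const walk = (dir: string): void => {
    if (dir !== evalRoot && existsSync(join(dir, "PROMPT.md"))) {
      // Normalize separators so IDs read like "fixtures/api/auth" on every OS.
      ids.push(relative(evalRoot, dir).split(sep).join("/"));
    }
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      if (entry.isDirectory()) walk(join(dir, entry.name));
    }
  };
  walk(evalRoot);
  return ids.sort();
}

// Usage: recreate the layout from the table, with a temp dir standing in for evals/.
const root = mkdtempSync(join(tmpdir(), "evals-"));
for (const fixture of ["fixtures/button", "fixtures/api/auth"]) {
  mkdirSync(join(root, fixture), { recursive: true });
  writeFileSync(join(root, fixture, "PROMPT.md"), "Describe the task here.\n");
}
console.log(findFixtures(root)); // [ 'fixtures/api/auth', 'fixtures/button' ]
```

Note that fixtures/api/ itself is not a fixture, because it has no PROMPT.md of its own.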
Running a fixture
Run with a sandbox backend
Select the coding agent in your experiment file, and use --sandbox only when you need to override the isolation backend. niceeval will start a fresh Docker container, upload the workspace files (excluding EVAL.ts), run the agent, upload EVAL.ts, execute the Vitest tests, collect the diff, and tear down the container.
Asserting agent behavior with o11y
Beyond asserting file contents, you can assert what the agent did: which shell commands it ran, which tools it called, and how it navigated the task. After the agent finishes, niceeval injects an observability summary into the sandbox at __niceeval__/results.json, which your EVAL.ts can read.
The o11y object includes fields such as shellCommands (each command’s text and exit code), tool calls, and subagent invocations. This lets you gate on how the agent achieved its result, not only on what it produced.
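Inside a test() block you might read the summary with ordinary Node APIs. The helpers below are a sketch under explicit assumptions: that results.json holds the o11y object at its top level, and that each shellCommands entry exposes the command text as command and the exit code as exitCode. Those field names are guesses, so check a real results.json before relying on them. loadResults and ranSuccessfully are hypothetical names.

```typescript
import { readFileSync, mkdtempSync, mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// ASSUMED shape of __niceeval__/results.json; only shellCommands is modeled.
interface ShellCommand {
  command: string; // assumed field name for the command text
  exitCode: number; // assumed field name for the exit code
}
interface Results {
  o11y: { shellCommands: ShellCommand[] };
}

// Read the observability summary relative to a workspace root (hypothetical helper).
function loadResults(workspaceRoot: string): Results {
  const raw = readFileSync(join(workspaceRoot, "__niceeval__", "results.json"), "utf8");
  return JSON.parse(raw) as Results;
}

// True if some command matching `pattern` ran and exited with status 0.
function ranSuccessfully(results: Results, pattern: RegExp): boolean {
  return results.o11y.shellCommands.some(
    (c) => pattern.test(c.command) && c.exitCode === 0,
  );
}

// Usage with a fake summary; in EVAL.ts you would wrap these checks in test()/expect().
const ws = mkdtempSync(join(tmpdir(), "ws-"));
mkdirSync(join(ws, "__niceeval__"));
writeFileSync(
  join(ws, "__niceeval__", "results.json"),
  JSON.stringify({
    o11y: {
      shellCommands: [
        { command: "npm install", exitCode: 0 },
        { command: "npx vitest run", exitCode: 1 },
      ],
    },
  }),
);
const results = loadResults(ws);
console.log(ranSuccessfully(results, /npm install/)); // true
console.log(ranSuccessfully(results, /vitest/)); // false
```

Checking the exit code as well as the command text distinguishes an agent that ran the tests and got them passing from one that merely ran them.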
The defineAgentEval alternative
If you prefer to define fixture-style evals in code (for example, to share assertion logic across multiple tasks or to control the execution flow programmatically), use defineAgentEval.
Fixture vs defineAgentEval — when to use which
| | Fixture (directory) | defineAgentEval |
|---|---|---|
| Discovery | Automatic | Requires a .eval.ts file |
| Validation | Vitest tests in EVAL.ts | Programmatic assertions in test(t) |
| Best for | Large suites, multi-language projects | Fine-grained control, shared assertion logic |
| Assertion style | Vitest expect | niceeval t.* methods |
Workspace assertions in defineAgentEval
When you use defineAgentEval, the t context exposes workspace-level assertions you can call directly:
t.sandbox.diff is a queryable object: t.sandbox.diff.get(path) returns the post-change contents of a file, t.sandbox.diff.isEmpty() checks for no changes, and t.sandbox.diff.matches(re) / t.notInDiff(re) test the full diff text against a regular expression.