Recommended directory structure
PROMPT.md is the task description the agent reads. EVAL.ts is the verification script niceeval runs after the agent finishes. Plugin installation, configuration, or tokens can live in the fixture files, in a sandbox lifecycle hook, or in your agent adapter’s setup().
Write the task
PROMPT.md
To confirm that the agent actually used the plugin, assert it in EVAL.ts or inspect the o11y summary for tool calls or shell commands.
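Checking for tool calls or shell commands can be sketched as a simple scan over recorded events. The AgentEvent shape, the findPluginUsage helper, and the my-plugin marker below are all hypothetical, introduced only for illustration; the real o11y summary format may differ.

```typescript
// Hypothetical event shape (an assumption, not niceeval's actual format).
interface AgentEvent {
  kind: "tool_call" | "shell";
  // Tool name for tool calls, full command line for shell commands.
  detail: string;
}

// Returns every event whose detail mentions the plugin marker.
function findPluginUsage(events: AgentEvent[], marker: string): AgentEvent[] {
  return events.filter((e) => e.detail.indexOf(marker) !== -1);
}

// Placeholder events standing in for one agent run.
const events: AgentEvent[] = [
  { kind: "shell", detail: "npm install" },
  { kind: "tool_call", detail: "my-plugin.generate_component" },
  { kind: "shell", detail: "npx my-plugin scaffold button" },
];

const hits = findPluginUsage(events, "my-plugin");
console.log(hits.length > 0 ? "plugin was invoked" : "plugin was never invoked");
```

A substring match is deliberately loose: it catches both tool calls and shell invocations, at the cost of false positives if the marker also appears in unrelated commands.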
Write the verification
EVAL.ts
__niceeval__/results.json:
Run
The fixtures/plugin/create-button argument is only an eval ID prefix filter.
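To illustrate what a prefix filter implies, here is a minimal sketch that assumes a plain string-prefix match. The selectEvals helper and every eval ID other than fixtures/plugin/create-button are hypothetical.

```typescript
// Hypothetical eval IDs; real IDs depend on your fixture directories.
const evalIds: string[] = [
  "fixtures/plugin/create-button",
  "fixtures/plugin/create-button-with-icon",
  "fixtures/plugin/create-form",
  "fixtures/core/rename-file",
];

// Assumed semantics: keep every ID that starts with the given prefix.
function selectEvals(ids: string[], prefix: string): string[] {
  return ids.filter((id) => id.indexOf(prefix) === 0);
}

const selected = selectEvals(evalIds, "fixtures/plugin/create-button");
console.log(selected);
```

Under this assumption the argument selects any eval whose ID begins with the prefix, so a sibling such as create-button-with-icon would also run, and a shorter prefix such as fixtures/plugin would select the whole group.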
Compare plugin impact
Model “plugin on” vs “plugin off” as two agents or two experiment cells, then check:
- Whether pass@N improves.
- Whether average latency and token usage are acceptable.
- Whether the agent actually invoked the plugin in failing transcripts.
- Whether the diff only touched task-relevant files.
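The first two checks can be sketched as a small summary over per-attempt records. The Attempt shape, the summarize helper, and the numbers below are placeholders invented for illustration, not niceeval's results format or real measurements. The sketch also assumes pass@N means the fraction of evals with at least one passing attempt out of N, which may not match how niceeval computes it.

```typescript
// Hypothetical per-attempt record (an assumption, not the real schema).
interface Attempt {
  evalId: string;
  passed: boolean;
  latencyMs: number;
  tokens: number;
}

interface Summary {
  passAtN: number; // fraction of evals with at least one passing attempt
  avgLatencyMs: number;
  avgTokens: number;
}

function summarize(attempts: Attempt[]): Summary {
  if (attempts.length === 0) {
    return { passAtN: 0, avgLatencyMs: 0, avgTokens: 0 };
  }
  const passedByEval: Record<string, boolean> = {};
  let latency = 0;
  let tokens = 0;
  for (const a of attempts) {
    // An eval counts as solved once any of its attempts passes.
    passedByEval[a.evalId] = passedByEval[a.evalId] || a.passed;
    latency += a.latencyMs;
    tokens += a.tokens;
  }
  const ids = Object.keys(passedByEval);
  const solved = ids.filter((id) => passedByEval[id]).length;
  return {
    passAtN: solved / ids.length,
    avgLatencyMs: latency / attempts.length,
    avgTokens: tokens / attempts.length,
  };
}

// Placeholder data: two evals, two attempts each, per experiment cell.
const pluginOff: Attempt[] = [
  { evalId: "create-button", passed: false, latencyMs: 40000, tokens: 9000 },
  { evalId: "create-button", passed: false, latencyMs: 44000, tokens: 11000 },
  { evalId: "create-form", passed: true, latencyMs: 30000, tokens: 8000 },
  { evalId: "create-form", passed: false, latencyMs: 34000, tokens: 8000 },
];
const pluginOn: Attempt[] = [
  { evalId: "create-button", passed: true, latencyMs: 50000, tokens: 12000 },
  { evalId: "create-button", passed: false, latencyMs: 54000, tokens: 14000 },
  { evalId: "create-form", passed: true, latencyMs: 36000, tokens: 10000 },
  { evalId: "create-form", passed: true, latencyMs: 40000, tokens: 12000 },
];

const off = summarize(pluginOff);
const on = summarize(pluginOn);
console.log("pass@N delta:", on.passAtN - off.passAtN);
console.log("latency delta (ms):", on.avgLatencyMs - off.avgLatencyMs);
console.log("token delta:", on.avgTokens - off.avgTokens);
```

Reporting deltas side by side makes the trade-off explicit: a pass@N gain is only worth keeping if the extra latency and tokens stay within your budget.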
Next steps
- Fixtures: full reference for the fixture directory layout and EVAL.ts.
- Sandbox Agent: built-in claude-code, codex, and custom sandbox agents.
- Viewing Results: inspect transcripts, diffs, and the event stream.