Export an array from a .eval.ts file to fan out into many evals. Use loadYaml or loadJson to drive evals from external datasets with stable IDs.
Sometimes one eval scenario isn’t enough — you want to verify that your agent handles dozens of inputs correctly, with each case getting its own pass/fail outcome in the report. niceeval’s dataset fan-out lets you express this without creating dozens of individual files. When you export an array from a .eval.ts file, niceeval treats each element as a separate eval, assigns it a stable ID, and runs them all with the same concurrency and reporting as any other eval suite.
A normal .eval.ts file exports a single defineEval. A dataset file exports an array of defineEval calls. niceeval detects the array and creates one eval per element:
```typescript
// evals/sql.eval.ts
import { defineEval } from "niceeval";
import { equals } from "niceeval/expect";

export default [
  defineEval({
    description: "Count users",
    async test(t) {
      await t.send("Query the total number of rows in the users table");
      t.check(t.reply, equals("SELECT COUNT(*) FROM users;"));
    },
  }),
  defineEval({
    description: "Recent orders",
    async test(t) {
      await t.send("Query the 10 most recent orders");
      t.check(t.reply, equals("SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;"));
    },
  }),
];
```
The file evals/sql.eval.ts becomes the file-level ID sql. Each element gets a zero-padded index appended: sql/0000, sql/0001, and so on.
Zero-padding makes lexicographic sort order match numeric order, so IDs sort predictably and stay the same across runs as long as the array order is unchanged. Reordering elements in the array changes their IDs. If stable IDs across edits matter, use a dataset file instead and keep rows append-only.
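To make the ID scheme concrete, here is a minimal sketch of how index-based IDs behave. `fanOutId` and `assignIds` are hypothetical helpers written for illustration; they mirror the scheme described above and are not niceeval's actual implementation.

```typescript
// Hypothetical helper: builds "<fileId>/<zero-padded index>" as described above.
function fanOutId(fileId: string, index: number): string {
  return `${fileId}/${String(index).padStart(4, "0")}`;
}

// Padded IDs: a plain string sort keeps numeric order.
const ids = Array.from({ length: 12 }, (_, i) => fanOutId("sql", i));
const paddedSorted = [...ids].sort();

// Unpadded IDs for comparison: "sql/10" sorts before "sql/2".
const unpadded = Array.from({ length: 12 }, (_, i) => `sql/${i}`);
const unpaddedSorted = [...unpadded].sort();

// Hypothetical helper: maps each case description to the ID its position gives it.
function assignIds(fileId: string, descriptions: string[]): Map<string, string> {
  return new Map(
    descriptions.map((d, i): [string, string] => [d, fanOutId(fileId, i)]),
  );
}

const original = assignIds("sql", ["Count users", "Recent orders"]);
const prepended = assignIds("sql", ["New case", "Count users", "Recent orders"]);
const appended = assignIds("sql", ["Count users", "Recent orders", "New case"]);

console.log(unpaddedSorted.slice(0, 4));   // [ 'sql/0', 'sql/1', 'sql/10', 'sql/11' ]
console.log(original.get("Count users"));  // sql/0000
console.log(prepended.get("Count users")); // sql/0001 (shifted by the new first row)
console.log(appended.get("Count users"));  // sql/0000 (unchanged)
```

Under this scheme, only appending leaves existing IDs untouched, which is why append-only rows matter when IDs need to survive edits.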
Writing every test case inline gets unwieldy fast. Use loadYaml or loadJson from niceeval/loaders to read cases from external files, then map them into defineEval calls:
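The mapping step can be sketched on its own, without niceeval installed. In this sketch, `defineEval`, `TestContext`, and `EvalDef` are local stand-ins (assumptions made so the snippet runs by itself), and `doc` is an inline object standing in for what loadYaml would return. In a real eval file, `defineEval` comes from niceeval and the array is the file's default export.

```typescript
// Stand-ins so the sketch is self-contained; these are NOT niceeval's real types.
type TestContext = { send(prompt: string): Promise<void>; reply: string };
type EvalDef = { description: string; test(t: TestContext): Promise<void> };
const defineEval = (def: EvalDef): EvalDef => def;

// Inline object standing in for the parsed dataset document.
const doc: { cases: unknown } = {
  cases: [
    {
      task: "Count users",
      prompt: "Query the total number of rows in the users table",
      sql: "SELECT COUNT(*) FROM users;",
    },
    {
      task: "Recent orders",
      prompt: "Query the 10 most recent orders",
      sql: "SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;",
    },
  ],
};

// Cast the cases array to the shape this eval needs.
const rows = doc.cases as { task: string; prompt: string; sql: string }[];

// One eval definition per row; a real .eval.ts file would `export default` this array.
const evals = rows.map((row) =>
  defineEval({
    description: row.task,
    async test(t) {
      await t.send(row.prompt);
      // Stand-in for an exact-match check on the reply.
      if (t.reply !== row.sql) {
        throw new Error(`expected ${row.sql}, got ${t.reply}`);
      }
    },
  }),
);

console.log(evals.map((e) => e.description)); // [ 'Count users', 'Recent orders' ]
```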
The loadYaml call returns the parsed document as a plain object. Cast the cases array to the shape you need, then map normally. loadJson works identically but reads a .json file:
```typescript
import { loadJson } from "niceeval/loaders";

const cases = await loadJson("evals/data/sql-cases.json");
```
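A type cast on the loaded document is not checked at runtime, so a malformed row would only surface when its eval runs. One way to catch this earlier is to validate rows before mapping them. The sketch below assumes the row shape used in this page's SQL examples. `SqlCase`, `isSqlCase`, and `parseCases` are hypothetical names, not part of niceeval.

```typescript
// Hypothetical row shape, matching the fields used in the SQL examples.
interface SqlCase {
  task: string;
  prompt: string;
  sql: string;
  mustInclude?: string;
}

// Type guard: true only when every required field is a string
// and the optional field, if present, is a string too.
function isSqlCase(value: unknown): value is SqlCase {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.task === "string" &&
    typeof row.prompt === "string" &&
    typeof row.sql === "string" &&
    (row.mustInclude === undefined || typeof row.mustInclude === "string")
  );
}

// Hypothetical helper: fails at load time and names the offending row index.
function parseCases(doc: unknown): SqlCase[] {
  const cases = (doc as { cases?: unknown } | null)?.cases;
  if (!Array.isArray(cases)) {
    throw new Error("dataset must have a top-level `cases` array");
  }
  return (cases as unknown[]).map((row, i) => {
    if (!isSqlCase(row)) {
      throw new Error(`cases[${i}] is missing a required string field`);
    }
    return row;
  });
}

// Usage: pass the object returned by the loader instead of casting it directly.
const validRows = parseCases({
  cases: [
    {
      task: "Count users",
      prompt: "Query the total number of rows in the users table",
      sql: "SELECT COUNT(*) FROM users;",
    },
  ],
});
console.log(validRows.length); // 1
```

Failing while the file loads keeps a broken row from showing up later as a confusing assertion failure in one numbered case.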
The path you pass to loadYaml / loadJson is resolved relative to the project root (where niceeval.config.ts lives), not relative to the eval file. Use paths like evals/data/my-dataset.yaml.
Because each eval in a fan-out set shares the same file-level ID prefix, you can run them all with a single CLI argument:
```bash
# Run all evals from evals/sql.eval.ts
npx niceeval exp local sql

# Run only the first case
npx niceeval exp local sql/0000

# Run cases 0–2
npx niceeval exp local sql/0000 sql/0001 sql/0002
```
This makes it easy to re-run just the failing cases after a fix, or to run a subset during development without touching the dataset file.
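The selection behaviour shown in these commands can be modelled as a prefix match on the eval ID. This is only a sketch of that idea: `matchesSelector` and `selectEvals` are hypothetical, and niceeval's real matching rules may differ. In particular, whether `sql` also matches an unrelated file such as `sqlite` is an assumption here; this sketch assumes it does not.

```typescript
// Hypothetical matcher: a selector matches an ID exactly,
// or matches every ID under it as a "/"-separated prefix.
function matchesSelector(evalId: string, selector: string): boolean {
  return evalId === selector || evalId.startsWith(`${selector}/`);
}

// Keep the IDs matched by at least one selector, in their original order.
function selectEvals(allIds: string[], selectors: string[]): string[] {
  return allIds.filter((id) => selectors.some((s) => matchesSelector(id, s)));
}

const allIds = ["sql/0000", "sql/0001", "sql/0002", "sqlite/0000"];

console.log(selectEvals(allIds, ["sql"]));
// [ 'sql/0000', 'sql/0001', 'sql/0002' ]
console.log(selectEvals(allIds, ["sql/0000"]));
// [ 'sql/0000' ]
console.log(selectEvals(allIds, ["sql/0000", "sql/0002"]));
// [ 'sql/0000', 'sql/0002' ]
```

To re-run failures after a fix, the failing IDs from the report can be passed as arguments in the same way as the multi-ID command above.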
Use dataset fan-out when:

- You have many cases that follow the same eval structure (same prompt template, same assertions)
- Your test cases come from an external source (a spreadsheet, a database export, a curated YAML file)
- You want non-engineers to be able to add cases without touching TypeScript
- The number of cases is likely to grow over time

Use individual eval files when:

- Each case has meaningfully different assertion logic
- Cases require different agent configurations or timeouts
- You want each case to have a human-meaningful ID (e.g., billing/refund instead of billing/0003)
- You have only a handful of cases and inline code is clearer
Store all dataset files under evals/data/ by convention. This keeps them out of the eval discovery scan (which only looks for .eval.ts files and PROMPT.md directories) and signals their purpose at a glance.
Here is the full pattern for a YAML-driven dataset eval with multiple assertion types:
```typescript
// evals/sql.eval.ts
import { defineEval } from "niceeval";
import { loadYaml } from "niceeval/loaders";
import { equals, includes } from "niceeval/expect";

const doc = await loadYaml("evals/data/sql-cases.yaml");
const rows = doc.cases as {
  task: string;
  prompt: string;
  sql: string;
  mustInclude?: string;
}[];

export default rows.map((row) =>
  defineEval({
    description: row.task,
    async test(t) {
      await t.send(row.prompt);

      // Gate: the run must complete without error
      t.succeeded();

      // Gate: the reply must exactly match the expected SQL
      t.check(t.reply, equals(row.sql));

      // Optional: if the case specifies a required keyword, check for it
      if (row.mustInclude) {
        t.check(t.reply, includes(row.mustInclude));
      }

      // Soft: judge whether the reply is well-formed SQL
      t.judge.autoevals.closedQA("Is this valid, syntactically correct SQL?").atLeast(0.8);
    },
  }),
);
```
Running this suite produces output like:
```
✓ sql/0000 Count users (312ms)
✓ sql/0001 Recent orders (289ms)
✗ sql/0002 Active users this month (401ms)
  - gate: equals [FAILED]
    Expected: SELECT * FROM users WHERE last_sign_in >= date_trunc('month', now());
    Received: SELECT * FROM users WHERE created_at >= date_trunc('month', now());
```