# niceeval CLI: commands, flags, and exit codes reference

> Complete niceeval CLI reference: exp, init, list, clean, and view commands. Covers experiment selection, eval filtering, sandbox, concurrency, budget, and JUnit CI output.

The niceeval CLI is your single entry point for discovering, running, and inspecting evals. Eval execution is experiment-first: **`exp` selects the checked-in run configuration**, and any positional arguments after the experiment select evals by ID prefix. Agent, model, and feature flags live in `experiments/`, not in ad-hoc CLI arguments.

## Commands

* `exp`: Run a named experiment group or config. Optional trailing positionals filter eval IDs.
* `init`: Scaffold `evals/`, `niceeval.config.ts`, and example eval files.
* `list`: Discover and print all evals without running them.
* `clean`: Delete `.niceeval/` historical run artifacts.
* `view`: Open the result viewer to inspect artifacts from the last run.

***

## Output language

niceeval localizes CLI and runtime messages without adding a config key. Set an environment variable when you want deterministic output:

```bash
NICEEVAL_LANG=en npx niceeval list
NICEEVAL_LANG=zh-CN npx niceeval list
```

Locale detection order is `NICEEVAL_LANG`, `NICEEVAL_LOCALE`, `LC_ALL`, `LC_MESSAGES`, then `LANG`. Values starting with `zh` use `zh-CN`; other languages use `en`. If none are set, niceeval defaults to `zh-CN`. This affects terminal and runtime text only; machine result fields and LLM judge prompts are not translated.

***

## `npx niceeval exp [group|config] [id-prefix...]`

The `exp` command runs configs from `experiments/`. The first positional argument selects an experiment group or a single config; any remaining positionals narrow the run to evals whose IDs match those prefixes.
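As a rough illustration of the ID-prefix narrowing described above, the following TypeScript sketch models the selection as a plain string-prefix match. The helper `filterByPrefix` is hypothetical and not part of niceeval; it also assumes that multiple prefixes are combined with OR and that an empty prefix list keeps every eval, neither of which this page states.

```typescript
// Hypothetical helper, not a niceeval API: models ID-prefix selection.
// Assumption: an eval is selected when its ID starts with ANY given prefix.
function filterByPrefix(evalIds: string[], prefixes: string[]): string[] {
  // Assumption: with no prefixes, every discovered eval stays selected.
  if (prefixes.length === 0) return evalIds;
  return evalIds.filter((id) => prefixes.some((prefix) => id.startsWith(prefix)));
}

// Example IDs taken from the `npx niceeval list` output shown on this page.
const discovered = ["classify", "weather/brooklyn", "weather/manhattan", "fixtures/button"];

// Selects weather/brooklyn and weather/manhattan.
console.log(filterByPrefix(discovered, ["weather"]));
```

Use `--dry` to confirm what the real CLI selects for a given prefix; this sketch is only a mental model.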
```bash
# Run every experiment under experiments/
npx niceeval exp

# Run one experiment group
npx niceeval exp compare-models

# Run only evals whose ID starts with "weather" inside that group
npx niceeval exp compare-models weather
```

Eval filters only appear after the experiment selector. Bare `npx niceeval weather` does not run; use `npx niceeval exp local weather` or `npx niceeval exp compare weather`.

### Flags

#### Experiment & sandbox

Choosing the agent on the command line is not supported for experiment runs. Add or copy an experiment file and set the agent there so the run configuration is reviewable and reproducible.

`--sandbox`: Temporarily choose the sandbox backend for the selected experiment. Accepted values: `docker`, `vercel`, `auto`.

* `docker`: spin up a local Docker container (no cloud credentials needed).
* `vercel`: use Vercel Sandbox (requires `VERCEL_TOKEN` or `VERCEL_OIDC_TOKEN`).
* `auto`: detect automatically; use `vercel` when a Vercel token is present, otherwise fall back to `docker`.

```bash
npx niceeval exp local --sandbox docker
npx niceeval exp local --sandbox auto
```

#### Run control

`--watch`: Re-run matching evals whenever a source file changes. Useful during authoring to get instant feedback as you edit.

```bash
npx niceeval exp local weather --watch
```

`--dry`: Discover and print the evals that would be run, but do not execute them. Good for verifying your ID prefix and tag filters before a long run.

```bash
npx niceeval exp compare --dry
npx niceeval exp compare fixtures --dry
```

`--force`: Bypass the result cache. Normally, evals whose fixture content and config have not changed since their last passing run are skipped automatically. Use `--force` to re-run everything regardless of cached results.

```bash
npx niceeval exp compare --force
```

`--runs`: Number of times to run each matched eval. Use this when you care about pass rate rather than a single pass/fail outcome. Results are grouped and reported as `pass@N`.
```bash
# Run each eval 5 times and report pass@5
npx niceeval exp compare fixtures/button --runs 5
```

`--no-early-exit`: Disable early-exit. By default, once one attempt for a given eval passes, niceeval cancels the remaining attempts for that eval. Pass this flag to run all `--runs` attempts regardless, which gives you the full pass rate distribution.

```bash
npx niceeval exp compare fixtures/button --runs 10 --no-early-exit
```

`--tag`: Filter evals by tag. Only evals that declare the given tag in their definition are included in the run.

```bash
npx niceeval exp compare --tag smoke
```

#### Performance & limits

`--max-concurrency`: Override the `maxConcurrency` value set in `niceeval.config.ts`. Controls how many eval attempts run in parallel.

```bash
npx niceeval exp compare --max-concurrency 4
```

`--timeout`: Per-eval timeout in milliseconds. If an eval takes longer than this value, it is force-cancelled and marked as `failed` with `error: timeout`. Overrides `timeoutMs` from config.

```bash
# 10-minute timeout per eval
npx niceeval exp compare --timeout 600000
```

`--budget`: Cost limit in USD for the entire run. niceeval tracks cumulative estimated spend across all attempts. Once the budget is exceeded, no new attempts are dispatched; in-flight attempts complete normally. Overrides `budget` from config.

```bash
npx niceeval exp compare --budget 2.50
```

#### CI & reporting

`--strict`: Treat `passed` outcomes whose LLM-as-judge score falls below the configured threshold as failures. Without `--strict`, only evals whose outcome is explicitly `failed` produce a non-zero exit code. With `--strict`, passed-but-below-threshold evals also cause a non-zero exit. Use this flag in CI to enforce quality thresholds.

```bash
npx niceeval exp ci --strict --junit .niceeval/junit.xml
```

`--junit`: Write a JUnit XML report to the specified path. Compatible with GitHub Actions, CircleCI, and any CI system that ingests JUnit reports.
```bash
npx niceeval exp ci --junit .niceeval/junit.xml
```

***

## `npx niceeval init`

Scaffolds the minimum files needed to start writing evals in the current repository. Running `init` creates:

```
your-project/
├─ niceeval.config.ts
└─ evals/
   ├─ hello.eval.ts          # example: conversational eval
   └─ fixtures/
      └─ button/             # example: sandbox coding-agent eval
         ├─ PROMPT.md
         ├─ EVAL.ts
         └─ package.json
```

```bash
npm install -D niceeval
npx niceeval init
```

After running `init`, add an experiment under `experiments/` that selects the agent, model, runs, and sandbox for your first run.

***

## `npx niceeval list`

Discovers all evals (and, if an `experiments/` directory exists, all experiment files) and prints their IDs. Does not run anything. Use this to verify discovery before a run.

```bash
npx niceeval list
# Output:
# classify
# weather/brooklyn
# weather/manhattan
# fixtures/button
```

Use `npx niceeval exp --dry` to preview how a specific experiment will filter and expand evals.

***

## `npx niceeval exp <group>`

Runs an experiment group. niceeval scans the `experiments/` directory for TypeScript files whose default export is a `defineExperiment` call. The `<group>` argument is the folder name; all experiment files inside that folder are treated as one comparable group.

```bash
# Run all experiments in experiments/compare/
npx niceeval exp compare
```

Experiments expand into a matrix of `agent × model × eval × runs` attempts. Results are reported as pass rates grouped by `(agent, model, eval)`, rather than single pass/fail outcomes.

***

## `npx niceeval view`

Opens the result viewer in your browser. The viewer reads artifacts from the most recent output directory under `.niceeval/` and lets you inspect per-eval results, transcripts, file diffs, and event streams.

```bash
npx niceeval view
```

***

## Exit codes

niceeval's exit code lets CI systems gate on eval results without parsing output.
| Condition                                                                      | Exit code |
| ------------------------------------------------------------------------------ | --------- |
| All evals passed, including below-threshold passes when `--strict` is not set  | `0`       |
| Any eval failed                                                                | non-zero  |
| Any eval passed below threshold with `--strict`                                | non-zero  |

***

## Common patterns

```bash
# Watch a specific eval while you develop
npx niceeval exp local fixtures/button --sandbox docker --watch

# Dry-run to check which evals will run
npx niceeval exp local --dry

# Force a re-run ignoring the cache
npx niceeval exp local fixtures/button --force
```

```yaml
# .github/workflows/evals.yml
- name: Run evals
  run: npx niceeval exp ci --strict --junit .niceeval/junit.xml
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

```bash
# Run each eval 10 times; stop early once one attempt passes
npx niceeval exp local fixtures/button --runs 10 --sandbox docker

# Run all 10 attempts regardless (full distribution)
npx niceeval exp local fixtures/button --runs 10 --no-early-exit
```

```bash
# Run the comparison experiment but cap spend at $5
npx niceeval exp compare --budget 5.00
```
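When wiring these patterns into CI, it can help to keep the rules from the Exit codes section in mind. The TypeScript sketch below models them under stated assumptions: the `Outcome` labels and the `exitCodeFor` function are hypothetical names made up for illustration, and the concrete value `1` stands in for "non-zero", since this page does not specify the exact code.

```typescript
// Hypothetical model of the exit-code rules; these names are not niceeval APIs.
type Outcome = "passed" | "passed-below-threshold" | "failed";

function exitCodeFor(outcomes: Outcome[], strict: boolean): number {
  const anyFailed = outcomes.some((o) => o === "failed");
  const anyBelowThreshold = outcomes.some((o) => o === "passed-below-threshold");
  // An explicit failure always gates; below-threshold passes gate only with --strict.
  // Assumption: 1 represents the unspecified non-zero exit code.
  return anyFailed || (strict && anyBelowThreshold) ? 1 : 0;
}

// The same results yield different exit codes depending on --strict.
const results: Outcome[] = ["passed", "passed-below-threshold"];
console.log(exitCodeFor(results, false), exitCodeFor(results, true));
```

In a pipeline, this is why the `--strict` variant of the CI command is the one that enforces judge thresholds, while the non-strict variant only blocks on explicit failures.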