niceeval CLI: command, flag, and exit code reference

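The niceeval CLI is the entry point for discovering, running, and viewing evals. Running evals follows an experiment-first model: exp selects a run configuration that is checked into the repository, and only the positional arguments after the experiment filter evals by ID prefix. Agent, model, and feature flags are defined in experiments/, not passed as ad hoc CLI arguments.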

Commands

npx niceeval exp [group|config] [id-prefix...]

Runs a named experiment group or config. Optional trailing positional arguments filter evals by ID prefix.

npx niceeval init

Generates evals/, niceeval.config.ts, and example files.

npx niceeval list

Discovers and prints all evals without running them.

npx niceeval clean

Deletes historical run artifacts from .niceeval/.

npx niceeval view

Opens the results viewer on the artifacts of the most recent run.

Output language

niceeval's CLI and runtime messages are localized, with no configuration needed in niceeval.config.ts. To pin the output language, set an environment variable:

NICEEVAL_LANG=en npx niceeval list
NICEEVAL_LANG=zh-CN npx niceeval list

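The detection order is NICEEVAL_LANG, NICEEVAL_LOCALE, LC_ALL, LC_MESSAGES, LANG. Values beginning with zh select zh-CN, and any other language selects en; if none of these is set, the default is zh-CN. This affects only terminal and runtime messages: it does not change the machine-readable fields in the result JSON, and it does not translate LLM judge prompts.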
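The detection order above can be sketched as a small TypeScript function. This is an illustrative reimplementation of the documented rules, not niceeval's source: the name resolveLocale is hypothetical, the first variable that is set is assumed to decide, and treating an empty variable as unset is also an assumption.

```typescript
// Hypothetical sketch of the documented locale detection; not niceeval's actual code.
type Locale = "zh-CN" | "en";

// Environment variables checked in the documented order.
const LOCALE_VARS = [
  "NICEEVAL_LANG",
  "NICEEVAL_LOCALE",
  "LC_ALL",
  "LC_MESSAGES",
  "LANG",
] as const;

function resolveLocale(env: Record<string, string | undefined>): Locale {
  for (const name of LOCALE_VARS) {
    const value = env[name];
    // Assumption: an empty string is treated the same as an unset variable.
    if (value === undefined || value === "") continue;
    // Assumption: the first set variable decides. Values starting with "zh" map to zh-CN;
    // any other language maps to en.
    return value.startsWith("zh") ? "zh-CN" : "en";
  }
  // Nothing set: the documented default is zh-CN.
  return "zh-CN";
}
```

In a Node entry point this would be called as resolveLocale(process.env).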

`npx niceeval exp [group|config] [id-prefix...]`

# Run every experiment under experiments/
npx niceeval exp

# Run one experiment group
npx niceeval exp compare-models

# Within that group, run only evals whose ID starts with weather
npx niceeval exp compare-models weather

Eval filter arguments are accepted only after the experiment selection. A bare npx niceeval weather does not run any evals; use npx niceeval exp local weather or npx niceeval exp compare weather instead.
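To make the experiment-first rule concrete, here is a minimal TypeScript sketch of how positional arguments split into an experiment selection and eval ID prefixes. The names parseExp and filterByPrefix are hypothetical, flags are assumed to have been stripped beforehand, and OR-combining several prefixes is an assumption suggested by the [id-prefix...] syntax rather than documented behavior.

```typescript
// Hypothetical sketch of the experiment-first positional rules; not niceeval's parser.
// Input: positional arguments after "npx niceeval", with flags already removed.
interface ExpSelection {
  experiment?: string; // group or config name; undefined means everything under experiments/
  idPrefixes: string[]; // eval ID prefixes; empty means no filtering
}

function parseExp(positionals: string[]): ExpSelection {
  const [command, experiment, ...idPrefixes] = positionals;
  if (command !== "exp") {
    // A bare "npx niceeval weather" lands here: eval filters only follow an experiment.
    throw new Error(`expected "exp [experiment] [id-prefix...]", got "${positionals.join(" ")}"`);
  }
  return { experiment, idPrefixes };
}

// Keeps eval IDs that start with any prefix (assumption: multiple prefixes are OR-ed).
function filterByPrefix(evalIds: string[], idPrefixes: string[]): string[] {
  if (idPrefixes.length === 0) return evalIds;
  return evalIds.filter((id) => idPrefixes.some((prefix) => id.startsWith(prefix)));
}
```

Under these assumptions, the first argument after exp is always read as the experiment, which is why a prefix can never be passed without naming one.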

Common flags

--agent

string

Not supported for experiment runs. To switch agents, add or copy a config file under experiments/.

--sandbox

string

Overrides the sandbox backend of the current experiment for this run only: docker, vercel, or auto.

npx niceeval exp local --sandbox docker

--model

string

Not supported for experiment runs. To switch models, add or copy an experiment file and change its model.

--max-concurrency

number

Sets how many evals run concurrently.

--runs

number

How many times to run each eval; commonly used for pass@N.

--early-exit

boolean

Once one attempt of an eval passes, skips that eval's remaining attempts.

--timeout-ms

number

Timeout for a single eval, in milliseconds.

--budget

number

Budget cap for the entire run.

--strict

boolean

Recommended in CI. Makes failures map more explicitly onto the exit code.
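The interaction between --runs and --early-exit can be sketched in TypeScript. This assumes the usual pass@N reading, where an eval counts as passed when at least one of its N attempts passes. runAttempts and its synchronous attempt callback are hypothetical simplifications: real attempts would be asynchronous and subject to --timeout-ms and --max-concurrency, which this sketch does not model.

```typescript
// Hypothetical sketch of --runs with and without --early-exit; not niceeval's runner.
interface AttemptSummary {
  passed: boolean; // pass@N: true if at least one attempt passed
  attemptsRun: number; // how many attempts actually executed
  passes: number; // how many of those attempts passed
}

function runAttempts(
  runs: number,
  earlyExit: boolean,
  attempt: (index: number) => boolean, // stands in for one eval attempt
): AttemptSummary {
  let attemptsRun = 0;
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    attemptsRun++;
    if (attempt(i)) {
      passes++;
      // With early exit, the first passing attempt ends this eval's remaining attempts.
      if (earlyExit) break;
    }
  }
  return { passed: passes > 0, attemptsRun, passes };
}
```

Early exit leaves the pass@N verdict unchanged and only saves the attempts after the first pass; the trade-off is that the pass count no longer estimates a per-attempt pass rate.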

`list`

npx niceeval list

Use it to check that eval discovery, eval IDs, and config loading work correctly.

`exp`

npx niceeval exp compare-models
npx niceeval exp compare-models weather-tool

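Runs a named experiment, using a matrix to compare agents, models, or flags. Positional arguments after the experiment name (the second argument onward) are eval ID prefix filters.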
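A comparison matrix is typically expanded as the Cartesian product of its dimensions. The sketch below shows that expansion in TypeScript, assuming an experiment lists agents, models, and flag sets as arrays; the Matrix shape and expandMatrix are hypothetical and do not describe how files in experiments/ are actually written.

```typescript
// Hypothetical sketch of expanding a comparison matrix into individual run configurations.
// The field names (agents, models, flags) are illustrative, not niceeval's experiment schema.
interface Matrix {
  agents: string[];
  models: string[];
  flags: Record<string, boolean>[]; // each entry is one feature-flag combination
}

interface RunConfig {
  agent: string;
  model: string;
  flags: Record<string, boolean>;
}

function expandMatrix(matrix: Matrix): RunConfig[] {
  const runs: RunConfig[] = [];
  // Cartesian product: every agent x every model x every flag set.
  for (const agent of matrix.agents) {
    for (const model of matrix.models) {
      for (const flags of matrix.flags) {
        runs.push({ agent, model, flags });
      }
    }
  }
  return runs;
}
```

An empty dimension yields no runs at all, so a matrix without flag variations would list a single empty flag set ([{}]) rather than an empty array.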

`view`

npx niceeval view

Opens the local results viewer. By default it shows the most recent run in .niceeval/.

Exit codes

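The basic rule for CI: if any gate fails or the run itself fails, the command should exit with a non-zero code; if everything passes, it exits with 0. Use --strict when you need strict behavior.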
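The basic rule can be expressed as a small TypeScript function. This is a sketch of the documented rule only: the EvalResult shape, the status names, and the choice of 1 as the non-zero code are assumptions, and --strict is not modeled because its exact semantics are not specified here.

```typescript
// Hypothetical sketch of the basic CI exit-code rule; not niceeval's implementation.
// --strict is intentionally not modeled.
type EvalStatus = "passed" | "failed" | "error";

interface EvalResult {
  id: string;
  status: EvalStatus; // "failed": a gate failed; "error": the run itself failed
}

function exitCodeFor(results: EvalResult[]): number {
  // Any failed gate or run error yields a non-zero code (assumed to be 1); otherwise 0.
  const allPassed = results.every((r) => r.status === "passed");
  return allPassed ? 0 : 1;
}
```

In a Node entry point, setting process.exitCode = exitCodeFor(results) lets pending output flush before the process exits.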