Viewing niceeval results and debugging agent behavior

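After every run, niceeval writes structured results to .niceeval/<timestamp>/. The console gives immediate feedback, and the results viewer is for digging into why an eval failed.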

Console output

Discovered 3 evals

  ✓ classify (12ms)
  ✓ weather/brooklyn (456ms)
  ✗ fixtures/button (38s)
    - gate: EVAL.ts › Button accepts label / onClick [FAILED]

Results:  2 passed, 1 failed, 0 passed, 0 skipped

Each line shows the eval ID, the outcome, and the duration. Failed evals also show the assertion type and the error message.

`.niceeval/<timestamp>/`

Typical layout:

.niceeval/
└─ 2025-01-15T14-23-00/
   ├─ summary.json
   ├─ weather/
   │  └─ brooklyn/
   │     ├─ result.json
   │     ├─ events.jsonl
   │     ├─ transcript.jsonl
   │     └─ diff.json
   └─ fixtures/
      └─ button/
         ├─ result.json
         ├─ events.jsonl
         ├─ transcript.jsonl
         ├─ diff.json
         └─ test-output.txt
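
Because each run directory is named after its timestamp, the newest run can be found by sorting the directory names. The following is a minimal sketch, assuming every run directory follows the `2025-01-15T14-23-00` pattern shown in the tree. `latestRun` is a hypothetical helper, not part of niceeval.

```typescript
// Hypothetical helper, not a niceeval API.
// Assumes run directories are named like 2025-01-15T14-23-00, so plain
// string ordering matches chronological ordering.
const RUN_NAME = /^\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}$/;

function latestRun(names: string[]): string | undefined {
  // Ignore anything in the listing that is not a run directory, then sort.
  const runs = names.filter((name) => RUN_NAME.test(name)).sort();
  return runs.length > 0 ? runs[runs.length - 1] : undefined;
}
```

In a Node.js script the listing could come from `fs.readdirSync(".niceeval")`.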

`niceeval view`

npx niceeval view

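This opens a local results viewer that shows the most recent run by default. You can browse evals, read transcripts and diffs, and inspect the event stream and assertion results. No data is uploaded to external services.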

Running npx niceeval view right after a failure opens the artifacts from the run that just finished.

Artifacts

`summary.json`

Summary of the whole run: run ID, pass / fail counts, duration, cost, and the status of each eval.

`events.jsonl`

The canonical event stream and the low-level source of truth for tool calls, messages, commands, and errors.
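
A `.jsonl` file holds one JSON value per line, so it can be parsed line by line. The following is a minimal sketch for summarizing an event stream. The `type` field used in the usage note is an assumption, since the event schema is not described here, and both helpers are hypothetical.

```typescript
// Parse JSONL text: one JSON value per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

// Count events by the string value of one field.
// Events that lack the field (or hold a non-string value) are grouped under "(missing)".
function countByField(events: unknown[], field: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    const value =
      typeof event === "object" && event !== null
        ? (event as Record<string, unknown>)[field]
        : undefined;
    const key = typeof value === "string" ? value : "(missing)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

For example, `countByField(parseJsonl(text), "type")` would give a quick overview of an events.jsonl file, provided the events carry a `type` field.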

`transcript.jsonl`

A human-readable transcript of the conversation or agent session.

`diff.json`

The file diff of the changes the agent made in a sandbox eval.

`test-output.txt`

Output from EVAL.ts or the project's test script.

Outcome meanings

passed
failed
passed
skipped

A passed outcome means every gate passed.

Debugging tips

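Start with result.json to find the failing assertion.
Then read transcript.jsonl to follow the agent's decision process.
When a coding agent fails, check diff.json and test-output.txt.
For tool-call problems, check events.jsonl.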
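
The reading order above can be sketched as a small helper that lists the artifact paths to open for one eval. This is an illustration only: `triageOrder` and its options are hypothetical names, and it assumes the eval ID maps directly to a subdirectory of the run directory, as in the layout shown earlier.

```typescript
// Hypothetical helper, not a niceeval API.
interface TriageOptions {
  codingAgent?: boolean; // the failing eval ran a coding agent
  toolCallIssue?: boolean; // the failure looks related to tool calls
}

function triageOrder(runDir: string, evalId: string, opts: TriageOptions = {}): string[] {
  // result.json and transcript.jsonl are always read first, in that order.
  const files = ["result.json", "transcript.jsonl"];
  if (opts.codingAgent) files.push("diff.json", "test-output.txt");
  if (opts.toolCallIssue) files.push("events.jsonl");
  // Assumes an eval ID such as "fixtures/button" is also its relative directory.
  return files.map((file) => `${runDir}/${evalId}/${file}`);
}
```

For the failing eval in the console sample, this could be called as `triageOrder(".niceeval/2025-01-15T14-23-00", "fixtures/button", { codingAgent: true })`.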