> ## Documentation Index > Fetch the complete documentation index at: https://niceeval.com/docs/llms.txt > Use this file to discover all available pages before exploring further. # niceeval scoring: 断言、judge 和 outcome > niceeval 的五种评分机制：值断言、作用域断言、LLM-as-judge、test-as-scoring 和效率检查，以及 gate / soft 规则。 niceeval 的评分目标是把 agent 行为变成可解释、可比较的结果。你可以混合使用精确断言、事件流断言、LLM judge、沙箱测试和成本约束。 ## 五种评分机制 `t.check(value, matcher)`，适合文本、JSON 和结构化输出。 `t.succeeded()`、`t.calledTool()`、`t.fileChanged()` 等，运行结束后统一评估。用 judge 模型评估事实性、总结质量或 rubric 分数。在 sandbox 里运行 `EVAL.ts`、Vitest 或项目脚本。限制 token、成本和其他 usage 指标。 ## gate 与 soft gate 是硬门槛。任何 gate 失败都会让 eval 的 outcome 变成 `failed`。 soft 用于评分和排序。它不会直接让 eval 失败，但会影响最终分数。 ## Outcome 规则所有 gate 通过，且没有运行失败。任一 gate 失败、超时、adapter 失败或测试失败。运行完成，但结果主要由 soft 分数表达。 eval 被跳过。 ## 1. 值断言 `niceeval/expect` 提供可组合 matcher： ```ts theme={null} import { includes, equals, matches, similarity } from "niceeval/expect"; t.check(t.reply, includes("confirmed")); t.check(turn.data, equals({ intent: "refund" })); t.check(t.reply, matches(/order #\d+/i)); t.check(t.reply, similarity("The answer should mention Paris").atLeast(0.8)); ``` ## 2. 作用域断言作用域断言记录“整次运行”应该满足的事实： ```ts theme={null} t.succeeded(); t.calledTool("get_weather", { count: 1 }); t.notCalledTool("delete_user"); t.toolOrder(["search", "summarize"]); t.fileChanged("src/components/Button.tsx"); t.check(await t.sandbox.runCommand("npm", ["test"], { cwd: "/workspace" }), commandSucceeded()); ``` 作用域断言通常在 `test()` 结束后统一评估。不要把它当成立即返回布尔值的函数。 ## 3. LLM-as-judge Judge 用于无法精确匹配的语义质量： ```ts theme={null} t.judge.autoevals.closedQA("回答是否准确说明了退款政策?", { on: t.reply }).atLeast(0.7); t.judge.autoevals.summarizes(sourceText, { on: t.reply }).atLeast(0.8); t.judge.autoevals.closedQA("礼貌、具体、没有编造事实", { on: t.reply }).atLeast(0.75); ``` judge 模型可以在 `defineConfig`、单个 eval 或单次 judge 调用上配置。 ## 4. Test-as-scoring Sandbox fixture 通过 `EVAL.ts` 验证 agent 改出的代码： ```ts theme={null} import { test, expect } from "vitest"; import { existsSync } from "node:fs"; test("Button exists", () => { expect(existsSync("src/components/Button.tsx")).toBe(true); }); ``` 测试输出会进入 `.niceeval//.../test-output.txt`。 ## 5. 效率和成本 ```ts theme={null} t.maxTokens(50_000); t.maxCost(0.25); ``` 这些断言适合防止 agent 通过过度调用模型或工具“堆”出结果。 ## 相关阅读 * [Evals](/zh/concepts/evals) — 断言如何进入生命周期和 outcome。 * [Agents & Adapters](/zh/concepts/agents-adapters) — 标准事件流由哪里产生。 * [评分指南](/zh/guides/scoring-guide) — 更完整的写法和取舍。