The niceeval/expect module provides a small set of composable matchers that you pass to t.check() and t.require(). Each matcher is a function that returns an Assertion: a typed scoring function of shape (value: unknown) => number | Promise<number> that carries a default severity (gate or soft). The severity determines how a failure affects the eval outcome. You can override it on any matcher by chaining .gate() or .soft(); on matchers that produce partial scores, .atLeast(n) sets the minimum passing score.
import { includes, equals, matches, similarity, satisfies } from "niceeval/expect";

How matchers are used

You pass matchers as the second argument to t.check() or t.require():
// t.check — records the result; execution continues regardless
t.check(t.reply, includes("confirmed"));
t.check(turn.data, equals({ intent: "refund" }));

// t.require — throws immediately if the assertion fails; use for preconditions
t.require(turn.status, equals("completed"));
The difference between check and require:
| Method | On failure | Use for |
| --- | --- | --- |
| t.check | Records result, execution continues | Most assertions |
| t.require | Throws immediately, aborts the test | Preconditions where continuing is meaningless |
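
To make the distinction concrete, here is a minimal sketch of a recorder with the same two behaviours. MiniContext, ScoreFn, and isConfirmed are hypothetical names invented for this illustration; they are not part of niceeval, and the real runner also handles async scores and severity.

```typescript
// Hypothetical stand-in for the test context; not niceeval's implementation.
type ScoreFn = (value: unknown) => number;

class MiniContext {
  readonly results: { score: number; passed: boolean }[] = [];

  // check: record the outcome and keep going.
  check(value: unknown, assertion: ScoreFn): boolean {
    const score = assertion(value);
    const passed = score >= 1;
    this.results.push({ score, passed });
    return passed;
  }

  // require: record the outcome, then abort by throwing on failure.
  require(value: unknown, assertion: ScoreFn): void {
    if (!this.check(value, assertion)) {
      throw new Error("required assertion failed");
    }
  }
}

const isConfirmed: ScoreFn = (v) => (String(v).includes("confirmed") ? 1 : 0);

const ctx = new MiniContext();
ctx.check("order pending", isConfirmed); // recorded as failed, no throw

let aborted = false;
try {
  ctx.require("order pending", isConfirmed); // recorded as failed, then throws
} catch {
  aborted = true;
}
```

In this sketch both calls leave an entry in results, but only require stops the code that follows it.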

Matchers

includes

includes(substring: string | RegExp): Assertion
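Asserts that value (coerced to a string) contains substring. If you pass a RegExp instead, the assertion passes when the pattern matches anywhere in the string. String arguments are compared case-sensitively; for a RegExp, case sensitivity follows the pattern's flags. Default severity: gate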
t.check(t.reply, includes("Paris"));
t.check(t.reply, includes(/order #\d+/i));
Chain .soft() if you want a miss to be recorded rather than hard-fail the eval:
t.check(t.reply, includes("recommended").soft());
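
The string and RegExp cases can be sketched as follows. includesScore is a hypothetical helper written for this page, not niceeval's source; it only illustrates the coercion and the binary score described above.

```typescript
// Hypothetical sketch of substring / pattern scoring; not niceeval's source.
function includesScore(value: unknown, needle: string | RegExp): number {
  const text = String(value); // coerce the value to a string first
  if (typeof needle === "string") {
    return text.includes(needle) ? 1 : 0; // case-sensitive comparison
  }
  // A RegExp with the g or y flag keeps state in lastIndex between calls,
  // so reset it to keep repeated checks independent of each other.
  needle.lastIndex = 0;
  return needle.test(text) ? 1 : 0;
}

const patternHit = includesScore("Your order #123 shipped", /order #\d+/i);
const caseMiss = includesScore("The capital is paris", "Paris");
const coerced = includesScore(42, "4"); // 42 is compared as "42"
```

If you reuse one RegExp object with the g flag across several checks, the lastIndex reset is what keeps the second check from starting mid-string.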

equals

equals(expected: unknown): Assertion
Asserts deep structural equality between value and expected. Works on primitives, plain objects, arrays, and nested structures. The comparison is recursive: the order of object keys is ignored, while the order of array elements matters. (This differs from comparing JSON.stringify output, which depends on key order.) Default severity: gate
t.check(turn.data, equals({ intent: "refund" }));
t.check(turn.data, equals(["a", "b", "c"]));
For partial matching (asserting a subset of keys), use satisfies with a custom predicate instead of equals.
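
One way to write such a predicate is sketched below. isSubset and hasSubset are hypothetical helpers, not niceeval exports; the sketch assumes plain JSON-like data and treats arrays as order-sensitive and same-length.

```typescript
// Hypothetical helper: true when every key in `expected` exists in `value`
// with a matching value. Extra keys in `value` are ignored.
function isSubset(expected: unknown, value: unknown): boolean {
  if (Array.isArray(expected)) {
    if (!Array.isArray(value) || value.length !== expected.length) return false;
    const items: unknown[] = value;
    return expected.every((item, i) => isSubset(item, items[i]));
  }
  if (typeof expected === "object" && expected !== null) {
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return false;
    }
    const actual = value as Record<string, unknown>;
    return Object.entries(expected).every(
      ([key, want]) => key in actual && isSubset(want, actual[key]),
    );
  }
  return Object.is(expected, value); // primitives
}

// Curried form, so the result can be handed to satisfies() as a predicate.
const hasSubset = (expected: object) => (value: unknown): boolean =>
  isSubset(expected, value);

const refundWithExtras = hasSubset({ intent: "refund" })({ intent: "refund", orderId: 7 });
const wrongIntent = hasSubset({ intent: "refund" })({ intent: "ship" });
```

With a helper like this you could write t.check(turn.data, satisfies(hasSubset({ intent: "refund" }), "intent is refund")).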

matches

matches(schema: StandardSchema): Assertion
Validates value against a Standard Schema compatible schema. Zod, Valibot, ArkType, and other Standard Schema-compliant libraries work out of the box. Default severity: gate
import { z } from "zod";

t.check(
  turn.data,
  matches(z.object({ intent: z.enum(["refund", "ship", "track"]) })),
);
Returns score 1 if validation passes, 0 if it fails. When the schema produces a parse error, the error message is attached to the recorded assertion for easy debugging.
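
The sketch below shows how a schema's result could be turned into a score and a message. It relies on the Standard Schema v1 convention that a schema exposes a "~standard".validate function returning either a value or a list of issues. SchemaLike, schemaScore, and intentSchema are hypothetical names; how niceeval performs this step internally is an assumption.

```typescript
// Minimal local copy of the Standard Schema v1 result shape.
type Issue = { readonly message: string };
type Result =
  | { readonly value: unknown; readonly issues?: undefined }
  | { readonly issues: ReadonlyArray<Issue> };

interface SchemaLike {
  readonly "~standard": {
    readonly validate: (value: unknown) => Result | Promise<Result>;
  };
}

// Hypothetical scorer: 1 on success, 0 plus the joined issue messages on failure.
async function schemaScore(
  schema: SchemaLike,
  value: unknown,
): Promise<{ score: number; message?: string }> {
  const result = await schema["~standard"].validate(value); // may be sync or async
  if (result.issues) {
    return { score: 0, message: result.issues.map((i) => i.message).join("; ") };
  }
  return { score: 1 };
}

// Hand-written schema so the sketch runs without a validation library.
const intentSchema: SchemaLike = {
  "~standard": {
    validate: (value) => {
      const intent = (value as { intent?: unknown } | null)?.intent;
      return intent === "refund" || intent === "ship" || intent === "track"
        ? { value }
        : { issues: [{ message: "intent must be refund, ship, or track" }] };
    },
  },
};
```

Calling schemaScore(intentSchema, { intent: "cancel" }) resolves to a score of 0 with the issue message attached, which mirrors the debugging behaviour described above.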

similarity

similarity(expected: string, opts?: SimilarityOpts): Assertion
Scores value (coerced to a string) against expected using normalized Levenshtein distance, returning a score between 0 (completely different) and 1 (identical). Useful when you want to reward approximate correctness rather than enforce an exact match. Default severity: soft
t.check(t.reply, similarity("The capital of France is Paris.").atLeast(0.8));
Chain .gate() if a minimum similarity must be met for the eval to pass:
t.check(t.reply, similarity(expectedSql).gate().atLeast(0.95));
opts fields:
atLeast (number): Minimum score (0–1) required to pass. Defaults to the threshold set by chaining .atLeast(n) on the returned assertion; if neither is specified the assertion always records the raw score as a soft metric.
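
For intuition about the scores you will see, here is a minimal sketch of a normalized Levenshtein similarity. It assumes the edit distance is divided by the length of the longer string; the exact normalization niceeval uses is not stated here, so treat levenshtein and similarityScore as illustrative helpers only.

```typescript
// Classic dynamic-programming edit distance, keeping one row at a time.
// Lengths are measured in UTF-16 code units.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 - distance / length of the longer string.
function similarityScore(value: unknown, expected: string): number {
  const actual = String(value);
  const longest = Math.max(actual.length, expected.length);
  if (longest === 0) return 1; // two empty strings are identical
  return 1 - levenshtein(actual, expected) / longest;
}

const kitten = similarityScore("kitten", "sitting"); // distance 3 over length 7
```

Under this normalization "kitten" against "sitting" scores 1 - 3/7, roughly 0.57, so a threshold such as .atLeast(0.8) tolerates only small edits relative to the string length.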

satisfies

satisfies(predicate: (value: unknown) => boolean, label?: string): Assertion
Asserts value satisfies an arbitrary predicate function. The optional label string appears in reports and logs to make the assertion’s intent readable. Default severity: gate
t.check(turn.data, satisfies((d) => (d as any).total > 0, "total is positive"));
t.check(t.usage.outputTokens, satisfies((n) => (n as number) < 10_000, "output not verbose"));
The predicate receives the raw value without type coercion — cast as needed inside the function. Return true for pass, false for fail.

Chaining severity

Every matcher returns an Assertion object that exposes .gate() and .soft() methods, letting you override the default severity inline:
// Downgrade an includes assertion from gate to soft:
t.check(t.reply, includes("optional detail").soft());

// Upgrade a similarity assertion from soft to gate:
t.check(t.reply, similarity(expected).gate());
Severity controls how a failure affects the eval outcome:
| Severity | Failure effect | Strict mode (--strict) |
| --- | --- | --- |
| gate | Eval is failed | Same |
| soft | Eval is passed (not failed) | Eval is failed |
Use gate for correctness requirements and soft for quality metrics you want to track but not block on by default.
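
The table can be read as a single rule, sketched below. Recorded and evalPassed are hypothetical names for this illustration; the sketch assumes each assertion has already been reduced to a passed or failed result.

```typescript
type Severity = "gate" | "soft";
type Recorded = { name: string; severity: Severity; passed: boolean };

// An eval passes when every assertion either passed, or failed as a soft
// assertion while strict mode is off.
function evalPassed(results: Recorded[], strict = false): boolean {
  return results.every((r) => r.passed || (r.severity === "soft" && !strict));
}

const softMiss: Recorded[] = [
  { name: "includes", severity: "gate", passed: true },
  { name: "similarity", severity: "soft", passed: false },
];
const gateMiss: Recorded[] = [
  { name: "equals", severity: "gate", passed: false },
];

const lenient = evalPassed(softMiss); // soft failure is tolerated
const strictRun = evalPassed(softMiss, true); // soft failure now fails the eval
const gated = evalPassed(gateMiss); // gate failure always fails the eval
```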

The Assertion type

An Assertion is a scoring function with attached metadata:
type Assertion = {
  (value: unknown): number | Promise<number>;
  name: string;
  severity: "gate" | "soft";
  gate(): Assertion;
  soft(): Assertion;
  atLeast(threshold: number): Assertion;  // available on similarity
};
Scores are normalized to [0, 1]:
  • 1 — fully passing
  • 0 — fully failing
  • Values between 0 and 1 — partial credit (primarily used by similarity and judge assertions)
For gate assertions, any score below 1 is treated as a failure unless a threshold has been set via .atLeast(n), in which case the score must reach that threshold. For soft assertions, scores are recorded and compared against the configured threshold (set via .atLeast(n)).
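
If the idea of a function that also carries properties is unfamiliar, the sketch below shows one way such a value can be built in TypeScript. SketchAssertion, build, and the threshold field are hypothetical and only illustrate the shape; that chaining returns a new object rather than mutating the original is also an assumption.

```typescript
type Severity = "gate" | "soft";
type ScoreFn = (value: unknown) => number | Promise<number>;

interface SketchAssertion {
  (value: unknown): number | Promise<number>;
  readonly name: string;
  readonly severity: Severity;
  readonly threshold: number | undefined; // hypothetical field
  gate(): SketchAssertion;
  soft(): SketchAssertion;
  atLeast(threshold: number): SketchAssertion;
}

function build(
  name: string,
  severity: Severity,
  score: ScoreFn,
  threshold?: number,
): SketchAssertion {
  const fn = (value: unknown) => score(value);
  // A function's own `name` is read-only, so define it instead of assigning it.
  Object.defineProperty(fn, "name", { value: name });
  return Object.assign(fn, {
    severity,
    threshold,
    // Each chained call returns a fresh assertion with one field changed.
    gate: () => build(name, "gate", score, threshold),
    soft: () => build(name, "soft", score, threshold),
    atLeast: (n: number) => build(name, severity, score, n),
  });
}

const nonEmpty = build("nonEmpty", "gate", (v) => (String(v).length > 0 ? 1 : 0));
const relaxed = nonEmpty.soft().atLeast(0.5);
```

Here nonEmpty stays a gate assertion with no threshold, while relaxed is a soft copy with a threshold of 0.5; both remain callable as plain scoring functions.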

Custom matchers with makeAssertion

When the built-in matchers don’t cover your use case, create a custom assertion with makeAssertion. Your scoring function receives the raw value and must return a number (synchronously or as a Promise).
// Assumes the Assertion type is exported from the same module as makeAssertion.
import { makeAssertion, type Assertion } from "niceeval/expect";

function jsonValid(): Assertion {
  return makeAssertion({
    name: "jsonValid",
    severity: "gate",
    score: (value) => {
      try {
        JSON.parse(String(value));
        return 1;
      } catch {
        return 0;
      }
    },
  });
}

t.check(t.reply, jsonValid());
makeAssertion options:
name (string, required): A short identifier shown in reports and logs when this assertion fails. Use a descriptive name that makes the failure reason self-evident.
severity ("gate" | "soft", required): The default severity for this assertion. Callers can override it by chaining .gate() or .soft(), just like built-in matchers.
score ((value: unknown) => number | Promise<number>, required): The scoring function. Receives the raw value and must return a number in [0, 1]. Async scoring functions (e.g. calling an external API) are fully supported.
score: async (value) => {
  const res = await externalGrader.grade(String(value));
  return res.score / 100;
},

Async custom matchers

Custom matchers support async scoring natively. The runner awaits all assertions before computing the final outcome.
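// embedText and cosineSimilarity are placeholders for your own embedding
// client and vector helper; they are not provided by niceeval.
function semanticallySimilar(reference: string): Assertion {
  return makeAssertion({
    name: "semanticallySimilar",
    severity: "soft",
    score: async (value) => {
      // The two embeddings are independent, so request them concurrently.
      const [actual, expected] = await Promise.all([
        embedText(String(value)),
        embedText(reference),
      ]);
      // Cosine similarity ranges over [-1, 1]; clamp it to the required [0, 1].
      return Math.min(1, Math.max(0, cosineSimilarity(actual, expected)));
    },
  });
}

t.check(t.reply, semanticallySimilar("A greedy algorithm for interval scheduling.").atLeast(0.75));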
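
The example above calls a cosineSimilarity helper without defining it. A minimal sketch of one is shown here, together with a toScore helper that maps the result into the [0, 1] range a scoring function must return. Both are hypothetical; treating zero vectors as dissimilar and clamping negative values to 0 are design choices made for this sketch.

```typescript
// Cosine of the angle between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // The cosine is undefined for a zero vector; report no similarity instead.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Scores must lie in [0, 1], so clamp away negative cosines.
const toScore = (cosine: number): number => Math.min(1, Math.max(0, cosine));

const sameDirection = toScore(cosineSimilarity([1, 0], [1, 0]));
const orthogonal = toScore(cosineSimilarity([1, 0], [0, 1]));
const opposite = toScore(cosineSimilarity([1, 0], [-1, 0]));
```

Clamping treats opposite vectors the same as unrelated ones. If you would rather preserve that distinction, rescaling with (cosine + 1) / 2 is an alternative, at the cost of shifting what a threshold such as 0.75 means.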

Quick reference

| Matcher | Default severity | Score type | Best for |
| --- | --- | --- | --- |
| includes(str \| RegExp) | gate | binary (0 or 1) | Keyword / pattern presence |
| equals(expected) | gate | binary (0 or 1) | Exact value or structural match |
| matches(schema) | gate | binary (0 or 1) | Schema / type validation |
| similarity(expected) | soft | continuous (0–1) | Near-match text, approximate answers |
| satisfies(predicate, label?) | gate | binary (0 or 1) | Arbitrary logic |
| makeAssertion({ name, severity, score }) | configurable | continuous (0–1) | Any custom requirement |