niceeval/expect matchers and custom assertion reference

The niceeval/expect module provides a small set of composable matchers that you pass to t.check() and t.require(). Each matcher is a function that returns an Assertion: a typed scoring function of shape (value: unknown) => number | Promise<number> that carries a default severity (gate or soft). The severity determines how a failure affects the eval outcome, and you can override it on any matcher by chaining, for example .gate() or .atLeast(0.7).

import { includes, equals, matches, similarity, satisfies } from "niceeval/expect";

How matchers are used

You pass matchers as the second argument to t.check() or t.require():

// t.check — records the result; execution continues regardless
t.check(t.reply, includes("confirmed"));
t.check(turn.data, equals({ intent: "refund" }));

// t.require — throws immediately if the assertion fails; use for preconditions
t.require(turn.status, equals("completed"));

The difference between check and require:

Method	On failure	Use for
`t.check`	Records result, execution continues	Most assertions
`t.require`	Throws immediately, aborts the test	Preconditions where continuing is meaningless
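The contrast in the table can be sketched with a toy harness. Everything below (MiniTest, the Score type, the rule that a score below 1 is a failure) is a hypothetical stand-in written for illustration, not niceeval's implementation.

```typescript
// Toy stand-in for a test context; not niceeval's real implementation.
type Score = (value: unknown) => number;

class MiniTest {
  readonly results: { label: string; score: number }[] = [];

  // check: record the score and keep going, whatever the result.
  check(value: unknown, assertion: Score, label = "check"): void {
    this.results.push({ label, score: assertion(value) });
  }

  // require: record the score, then throw on failure to abort the test.
  require(value: unknown, assertion: Score, label = "require"): void {
    const score = assertion(value);
    this.results.push({ label, score });
    if (score < 1) throw new Error(`${label} failed`);
  }
}

const isCompleted: Score = (v) => (v === "completed" ? 1 : 0);

const t = new MiniTest();
t.check("pending", isCompleted); // recorded as a failure, execution continues

let aborted = false;
try {
  t.require("pending", isCompleted); // throws here
  t.check("completed", isCompleted); // never reached
} catch {
  aborted = true;
}
```

In this sketch the failed check and the failed require are both recorded, but only the require stops the statements after it from running.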

Matchers

includes

includes(substring: string | RegExp): Assertion

Asserts that value (coerced to a string) contains substring, or matches it when a RegExp is provided. String matching is case-sensitive; pass a RegExp with the i flag for case-insensitive matching. Default severity: gate

t.check(t.reply, includes("Paris"));
t.check(t.reply, includes(/order #\d+/i));

Chain .atLeast(0.7) if you want this to score rather than hard-fail:

t.check(t.reply, includes("recommended").atLeast(0.7));
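A minimal sketch of the scoring rule described in this section, assuming the value is stringified with String() and the score is binary. includesScore is a hypothetical name, not a niceeval export.

```typescript
// Hypothetical scoring rule for an includes-style matcher.
function includesScore(value: unknown, needle: string | RegExp): number {
  const text = String(value);
  if (typeof needle === "string") {
    return text.includes(needle) ? 1 : 0; // case-sensitive substring check
  }
  // A RegExp with the g or y flag keeps state in lastIndex between calls;
  // reset it so repeated scoring gives the same answer.
  needle.lastIndex = 0;
  return needle.test(text) ? 1 : 0;
}
```

Resetting lastIndex is a design choice for this sketch: it lets one RegExp instance be reused across many replies without surprising misses.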

equals

equals(expected: unknown): Assertion

Asserts deep structural equality between value and expected. Works on primitives, plain objects, arrays, and nested structures. The comparison is recursive: object key order is ignored, while array order is significant. Default severity: gate

t.check(turn.data, equals({ intent: "refund" }));
t.check(turn.data, equals(["a", "b", "c"]));
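To make the ordering rules concrete, here is a small sketch of a deep comparison that ignores object key order and respects array order. deepEqual is a hypothetical helper covering only primitives, plain objects, and arrays; it is not the code equals uses internally.

```typescript
// Sketch of deep structural equality for primitives, plain objects, and arrays.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    // Arrays: same length and pairwise-equal elements, in order.
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    const right: unknown[] = b;
    return a.every((item, i) => deepEqual(item, right[i]));
  }
  // Objects: same key set, equal values per key; key order does not matter.
  const left = a as Record<string, unknown>;
  const rightObj = b as Record<string, unknown>;
  const keys = Object.keys(left);
  return (
    keys.length === Object.keys(rightObj).length &&
    keys.every(
      (k) => Object.prototype.hasOwnProperty.call(rightObj, k) && deepEqual(left[k], rightObj[k]),
    )
  );
}
```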

For partial matching (asserting a subset of keys), use satisfies with a custom predicate instead of equals.
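One way to write such a predicate is sketched below. isSubset and containsSubset are hypothetical helpers, not niceeval exports; the sketch assumes arrays should match element by element and that extra keys on the actual value are allowed.

```typescript
// True when every key in `expected` exists in `actual` with a matching value.
// Nested objects are checked recursively with the same rule.
function isSubset(expected: unknown, actual: unknown): boolean {
  if (expected === actual) return true;
  if (
    typeof expected !== "object" || typeof actual !== "object" ||
    expected === null || actual === null
  ) {
    return false;
  }
  if (Array.isArray(expected)) {
    // Arrays must have the same length; elements use the same subset rule.
    if (!Array.isArray(actual) || expected.length !== actual.length) return false;
    const actualItems: unknown[] = actual;
    return expected.every((item, i) => isSubset(item, actualItems[i]));
  }
  if (Array.isArray(actual)) return false;
  const exp = expected as Record<string, unknown>;
  const act = actual as Record<string, unknown>;
  return Object.keys(exp).every((k) => k in act && isSubset(exp[k], act[k]));
}

// Builds a predicate in the (value: unknown) => boolean shape satisfies expects.
const containsSubset = (expected: object) => (value: unknown): boolean =>
  isSubset(expected, value);

// Usage with the matcher from this page:
//   t.check(turn.data, satisfies(containsSubset({ intent: "refund" }), "intent is refund"));
```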

matches

matches(schema: StandardSchema): Assertion

Validates value against a Standard Schema compatible schema. Zod, Valibot, ArkType, and other Standard Schema-compliant libraries work out of the box. Default severity: gate

import { z } from "zod";

t.check(
  turn.data,
  matches(z.object({ intent: z.enum(["refund", "ship", "track"]) })),
);

Returns score 1 if validation passes, 0 if it fails. When the schema produces a parse error, the error message is attached to the recorded assertion for easy debugging.
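The sketch below shows how a Standard Schema validator can be turned into a binary score. The interface is a locally declared subset of the Standard Schema v1 shape (the "~standard" property with a validate function). scoreWithSchema, resultToScore, and the hand-rolled intentSchema are hypothetical names for illustration; this is not niceeval's internal code.

```typescript
// Minimal local subset of the Standard Schema v1 interface.
interface SchemaIssue { readonly message: string }
type SchemaResult<T> =
  | { readonly value: T; readonly issues?: undefined }
  | { readonly issues: ReadonlyArray<SchemaIssue> };
interface StandardSchemaLike<T = unknown> {
  readonly "~standard": {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (value: unknown) => SchemaResult<T> | Promise<SchemaResult<T>>;
  };
}

// Assumed rule: score 1 when there are no issues, otherwise 0 plus the messages.
function resultToScore(result: SchemaResult<unknown>): { score: number; errors: string[] } {
  const errors = (result.issues ?? []).map((issue) => issue.message);
  return { score: errors.length === 0 ? 1 : 0, errors };
}

// validate may be sync or async, so await covers both cases.
async function scoreWithSchema(schema: StandardSchemaLike, value: unknown) {
  return resultToScore(await schema["~standard"].validate(value));
}

// Hand-rolled schema standing in for what a library such as Zod would provide.
const intentSchema: StandardSchemaLike<{ intent: string }> = {
  "~standard": {
    version: 1,
    vendor: "example",
    validate: (value) => {
      const intent = (value as { intent?: unknown } | null)?.intent;
      return typeof intent === "string" && ["refund", "ship", "track"].includes(intent)
        ? { value: { intent } }
        : { issues: [{ message: "intent must be refund, ship, or track" }] };
    },
  },
};
```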

similarity

similarity(expected: string, opts?: SimilarityOpts): Assertion

Scores value (coerced to a string) against expected using normalized Levenshtein distance, returning a score between 0 (completely different) and 1 (identical). Useful when you want to reward approximate correctness rather than enforce an exact match. Default severity: soft

t.check(t.reply, similarity("The capital of France is Paris.").atLeast(0.8));

Chain .gate() if a minimum similarity must be met for the eval to pass:

t.check(t.reply, similarity(expectedSql).gate().atLeast(0.95));
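For intuition about the scores, here is a sketch of normalized Levenshtein similarity. It assumes the common normalization 1 - distance / max(length) and compares UTF-16 code units; the exact normalization niceeval applies is not stated on this page, and levenshtein and similarityScore are hypothetical names.

```typescript
// Classic dynamic-programming edit distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the processed prefix of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 for identical strings, 0 when every position differs.
function similarityScore(value: unknown, expected: string): number {
  const text = String(value);
  const longest = Math.max(text.length, expected.length);
  return longest === 0 ? 1 : 1 - levenshtein(text, expected) / longest;
}
```

Under this normalization, "kitten" versus "sitting" has distance 3 over a maximum length of 7, giving a score of 4/7, which would fail a threshold of 0.8.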

opts fields:

atLeast: number

Minimum score (0–1) required to pass. Defaults to the threshold set by chaining .atLeast(n) on the returned assertion; if neither is specified, the assertion always records the raw score as a soft metric.

satisfies

satisfies(predicate: (value: unknown) => boolean, label?: string): Assertion

Asserts value satisfies an arbitrary predicate function. The optional label string appears in reports and logs to make the assertion’s intent readable. Default severity: gate

t.check(turn.data, satisfies((d) => (d as any).total > 0, "total is positive"));
t.check(t.usage.outputTokens, satisfies((n) => (n as number) < 10_000, "output not verbose"));

The predicate receives the raw value without type coercion — cast as needed inside the function. Return true for pass, false for fail.
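Because the value arrives as unknown, a cast such as (d as any).total throws a TypeError when the value is null or undefined. A type guard avoids that by returning false instead. The sketch below uses hypothetical helper names (hasNumericTotal, totalIsPositive); only satisfies comes from this page.

```typescript
// Type guard: narrows unknown to an object with a numeric `total`.
function hasNumericTotal(value: unknown): value is { total: number } {
  return (
    typeof value === "object" &&
    value !== null &&
    "total" in value &&
    typeof (value as { total: unknown }).total === "number"
  );
}

// Predicate in the (value: unknown) => boolean shape; never throws on odd input.
const totalIsPositive = (value: unknown): boolean =>
  hasNumericTotal(value) && value.total > 0;

// Usage with the matcher from this page:
//   t.check(turn.data, satisfies(totalIsPositive, "total is positive"));
```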

Chaining severity

Every matcher returns an Assertion object whose chainable methods (.gate(), .soft(), and .atLeast(n), listed under The Assertion type) let you override the default severity inline:

// Downgrade an includes assertion from gate to soft:
t.check(t.reply, includes("optional detail").atLeast(0.7));

// Upgrade a similarity assertion from soft to gate:
t.check(t.reply, similarity(expected).gate());

Severity controls how a failure affects the eval outcome:

Severity	Failure effect	Strict mode (`--strict`)
`gate`	Eval is `failed`	Same
`soft`	Eval is `passed` (not `failed`)	Eval is `failed`

Use gate for correctness requirements and soft for quality metrics you want to track but not block on by default.
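The severity table can be read as a small decision rule. The sketch below is one way to express it, assuming each recorded assertion carries a severity and a pass/fail flag; AssertionRecord and evalOutcome are hypothetical names, not part of niceeval.

```typescript
type Severity = "gate" | "soft";

interface AssertionRecord {
  name: string;
  severity: Severity;
  passed: boolean;
}

// A failed gate always fails the eval; a failed soft assertion only does so in strict mode.
function evalOutcome(records: AssertionRecord[], strict = false): "passed" | "failed" {
  const blocking = records.some(
    (r) => !r.passed && (r.severity === "gate" || strict),
  );
  return blocking ? "failed" : "passed";
}

const records: AssertionRecord[] = [
  { name: "includes", severity: "gate", passed: true },
  { name: "similarity", severity: "soft", passed: false },
];
const normalRun = evalOutcome(records); // soft failure does not block
const strictRun = evalOutcome(records, true); // same records, strict mode
```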

The Assertion type

An Assertion is a scoring function with attached metadata:

type Assertion = {
  (value: unknown): number | Promise<number>;
  name: string;
  severity: "gate" | "soft";
  gate(): Assertion;
  soft(): Assertion;
  atLeast(threshold: number): Assertion;  // sets the minimum passing score (0–1)
};

Scores are normalized to [0, 1]:

1 — fully passing
0 — fully failing
Values between 0 and 1 — partial credit (primarily used by similarity and judge assertions)

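For gate assertions, any score below 1 is treated as a failure unless a threshold is set with .atLeast(n), as in the similarity(expectedSql).gate().atLeast(0.95) example. For soft assertions, scores are recorded and compared against the configured threshold (set via .atLeast(n)).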
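The toy model below illustrates how a callable scoring function can carry metadata and return modified copies when chained. It is a sketch under stated assumptions, not niceeval's implementation: ToyAssertion, toyAssertion, and passes are hypothetical, and the pass rule (compare with the threshold when one is set, otherwise require a perfect score for gate and always pass for soft) is inferred from the text on this page.

```typescript
type Severity = "gate" | "soft";

interface ToyAssertion {
  (value: unknown): number;
  readonly label: string;
  readonly severity: Severity;
  readonly threshold?: number;
  gate(): ToyAssertion;
  soft(): ToyAssertion;
  atLeast(threshold: number): ToyAssertion;
}

// `label` is used instead of `name` because a function's own `name` property
// is read-only, so Object.assign would throw when trying to set it.
function toyAssertion(
  label: string,
  severity: Severity,
  score: (value: unknown) => number,
  threshold?: number,
): ToyAssertion {
  const fn = (value: unknown): number => score(value);
  // Each chained call returns a fresh copy; the original is left untouched.
  return Object.assign(fn, {
    label,
    severity,
    threshold,
    gate: () => toyAssertion(label, "gate", score, threshold),
    soft: () => toyAssertion(label, "soft", score, threshold),
    atLeast: (n: number) => toyAssertion(label, severity, score, n),
  });
}

// Assumed pass rule for this sketch.
function passes(assertion: ToyAssertion, score: number): boolean {
  if (assertion.threshold !== undefined) return score >= assertion.threshold;
  return assertion.severity === "gate" ? score >= 1 : true;
}

const half = toyAssertion("half", "soft", () => 0.5);
```

Returning copies rather than mutating keeps a shared assertion safe to reuse with different severities in different checks.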

Custom matchers with makeAssertion

When the built-in matchers don’t cover your use case, create a custom assertion with makeAssertion. Your scoring function receives the raw value and must return a number (synchronously or as a Promise).

import { makeAssertion, type Assertion } from "niceeval/expect";

function jsonValid(): Assertion {
  return makeAssertion({
    name: "jsonValid",
    severity: "gate",
    score: (value) => {
      try {
        JSON.parse(String(value));
        return 1;
      } catch {
        return 0;
      }
    },
  });
}

t.check(t.reply, jsonValid());

makeAssertion fields:

name: string (required)

A short identifier shown in reports and logs when this assertion fails. Use a descriptive name that makes the failure reason self-evident.

severity: "gate" | "soft" (required)

The default severity for this assertion. Can be overridden by callers via .gate() and .atLeast(0.7) chaining, just like built-in matchers.

score: (value: unknown) => number | Promise<number> (required)

The scoring function. Receives the raw value and must return a number in [0, 1]. Async scoring functions (e.g. calling an external API) are fully supported.

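score: async (value) => {
  // externalGrader stands in for your own grading client; it is not part of niceeval.
  const res = await externalGrader.grade(String(value));
  // Assumes the grader reports 0–100; rescale to the required [0, 1] range.
  return res.score / 100;
},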
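Since the scoring function must return a number in [0, 1], it can help to normalize whatever an external grader returns before handing it back. The helper below is a hypothetical sketch (toUnitScore is not a niceeval export); mapping non-finite input to 0 is a design choice for this example, not documented behavior.

```typescript
// Rescale a raw grade to [0, 1], clamping out-of-range values.
function toUnitScore(raw: number, max = 100): number {
  // Treat NaN, Infinity, and a non-positive scale as a failing score.
  if (!Number.isFinite(raw) || max <= 0) return 0;
  return Math.min(1, Math.max(0, raw / max));
}

// Inside a custom score function this might be used as:
//   return toUnitScore(res.score);
```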

Async custom matchers

Custom matchers support async scoring natively. The runner awaits all assertions before computing the final outcome.

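// embedText and cosineSimilarity are helpers you supply; they are not part of niceeval.
function semanticallySimilar(reference: string): Assertion {
  return makeAssertion({
    name: "semanticallySimilar",
    severity: "soft",
    score: async (value) => {
      // The two embedding calls are independent, so run them concurrently.
      const [actual, expected] = await Promise.all([
        embedText(String(value)),
        embedText(reference),
      ]);
      // Cosine similarity can be negative; clamp to the required [0, 1] range.
      return Math.min(1, Math.max(0, cosineSimilarity(actual, expected)));
    },
  });
}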

t.check(t.reply, semanticallySimilar("A greedy algorithm for interval scheduling.").atLeast(0.75));
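The example above leaves cosineSimilarity undefined. A minimal sketch of such a helper follows, assuming embeddings are plain number arrays of equal length. Raw cosine similarity lies in [-1, 1], so clampScore (also a hypothetical helper) maps it into the [0, 1] range that scores require.

```typescript
// Cosine of the angle between two equal-length vectors; result is in [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Map a raw similarity into the [0, 1] score range.
function clampScore(x: number): number {
  return Math.min(1, Math.max(0, x));
}
```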

Quick reference

Matcher	Default severity	Score type	Best for
`includes(str | RegExp)`	gate	binary (0 or 1)	Keyword / pattern presence
`equals(expected)`	gate	binary (0 or 1)	Exact value or structural match
`matches(schema)`	gate	binary (0 or 1)	Schema / type validation
`similarity(expected)`	soft	continuous (0–1)	Near-match text, approximate answers
`satisfies(predicate, label?)`	gate	binary (0 or 1)	Arbitrary logic
`makeAssertion({ name, severity, score })`	configurable	continuous (0–1)	Any custom requirement