# niceeval

## Docs

- [niceeval agents and adapters: connect to any AI system](https://niceeval.com/docs/concepts/agents-adapters.md): Learn how niceeval connects to any AI through named agent adapters. Remote agents wrap HTTP or in-process calls; sandbox agents run CLI tools in isolation.
- [Evals in niceeval: lifecycle, outcomes, and eval files](https://niceeval.com/docs/concepts/evals.md): An eval is a single test case: a description, an agent reference, and a test function. Learn how evals are discovered, scheduled, passed, and reported.
- [niceeval architecture: evals, agents, and sandboxes](https://niceeval.com/docs/concepts/overview.md): Understand how niceeval's core, agent adapters, and sandbox backends fit together to evaluate any AI agent with a unified TypeScript API.
- [niceeval scoring: assertions, judge calls, and outcomes](https://niceeval.com/docs/concepts/scoring.md): niceeval scores evals with five mechanisms: value assertions, scoped assertions, LLM-as-judge, test-as-scoring, and efficiency checks. Also covers gate vs. soft severity.
- [Eval Your AI Agent Application](https://niceeval.com/docs/example/ai-agent-application.md)
- [Eval Your Claude Code / Codex Plugin](https://niceeval.com/docs/example/claude-code-codex-plugin.md)
- [Eval Your Skill on Claude Code / Codex](https://niceeval.com/docs/example/claude-code-codex-skill.md)
- [Authoring evals: single-turn and multi-turn patterns](https://niceeval.com/docs/guides/authoring.md): Learn how to write niceeval evals using defineEval. Covers single-turn assertions, multi-turn conversations, dataset fan-out, and sandbox fixtures.
- [Run niceeval evals in GitHub Actions and CI pipelines](https://niceeval.com/docs/guides/ci-integration.md): Integrate niceeval into GitHub Actions or any CI system. niceeval exits non-zero on failure, reports JUnit XML, and caches results to speed up repeat runs.
- [Dataset fan-out: run evals across many test cases](https://niceeval.com/docs/guides/dataset-fanout.md): Export an array from a .eval.ts file to fan out into many evals. Use loadYaml or loadJson to drive evals from external datasets with stable IDs.
- [Experiments: compare agents and models with run matrices](https://niceeval.com/docs/guides/experiments.md): Use niceeval experiments to run the same evals across multiple agents, models, and feature flags. Measure pass rates, cost, and latency side by side.
- [Sandbox fixtures: evaluate coding agents with tasks](https://niceeval.com/docs/guides/fixtures.md): A fixture is a directory with PROMPT.md and EVAL.ts that niceeval uses to run a coding agent in isolation and validate its output with Vitest tests.
- [Connect a remote or in-process agent to niceeval](https://niceeval.com/docs/guides/remote-agent.md): Use defineAgent to wrap any function or HTTP service as a niceeval agent. Map responses to the standard event stream and register the agent by name.
- [How the niceeval runner schedules and executes evals](https://niceeval.com/docs/guides/runner.md): The niceeval runner discovers evals, schedules them with bounded concurrency, caches results, retries flaky infrastructure, and enforces budget guardrails.
- [Sandbox agents: evaluate Claude Code, Codex, and bub](https://niceeval.com/docs/guides/sandbox-agent.md): Use niceeval's built-in agents (claude-code, codex, bub) or write a custom adapter to run a coding-agent CLI in an isolated Docker or cloud sandbox.
- [Sandbox backends: Docker, Vercel, and third-party](https://niceeval.com/docs/guides/sandbox-backends.md): niceeval runs coding agents in Docker or Vercel sandboxes. Learn how to select a backend, configure root access, and improve performance with warm pools.
- [Scoring guide: assertions, judge, and cost limits](https://niceeval.com/docs/guides/scoring-guide.md): Use niceeval's five scoring mechanisms — value assertions, scoped assertions, LLM-as-judge, test-as-scoring, and efficiency checks — to grade any eval.
- [Viewing niceeval results and debugging agent behavior](https://niceeval.com/docs/guides/viewing-results.md): niceeval stores structured artifacts in .niceeval/ after every run. Use npx niceeval view to explore transcripts, diffs, event streams, and pass rates.
- [Install niceeval: setup, scaffolding, and configuration](https://niceeval.com/docs/installation.md): Install niceeval as a dev dependency, scaffold evals/ and niceeval.config.ts with npx niceeval init, and configure your agent credentials.
- [niceeval: TypeScript eval framework for AI agents and LLMs](https://niceeval.com/docs/introduction.md): niceeval is a TypeScript eval library for AI agents. Evaluate coding agents, HTTP services, and in-process functions with one unified API.
- [niceeval quickstart: run your first eval in 10 minutes](https://niceeval.com/docs/quickstart.md): Install niceeval, scaffold your project, and run your first three evals — function, conversational, and coding-agent — in under 10 minutes.
- [niceeval CLI: commands, flags, and exit codes reference](https://niceeval.com/docs/reference/cli.md): Complete niceeval CLI reference: exp, init, list, clean, and view commands. Covers experiment selection, eval filtering, sandbox, concurrency, budget, and JUnit CI output.
- [niceeval.config.ts](https://niceeval.com/docs/reference/configuration.md): Project configuration reference for niceeval.
- [defineAgent and defineSandboxAgent: adapter reference](https://niceeval.com/docs/reference/define-agent.md): Reference for defineAgent and defineSandboxAgent. Covers AgentContext, Sandbox interface, StreamEvent types, and shared sandbox helpers.
- [defineConfig: configure project defaults](https://niceeval.com/docs/reference/define-config.md): Reference for project-wide niceeval defaults: judge, reporters, concurrency, timeout, and sandbox backend.
- [defineEval: declare, configure, and run evals in niceeval](https://niceeval.com/docs/reference/define-eval.md): Complete reference for defineEval and defineAgentEval. Covers all options, the test context t, Turn return value, and dataset array exports.
- [niceeval/expect matchers and custom assertion reference](https://niceeval.com/docs/reference/expect.md): Reference for niceeval/expect: includes, equals, matches, similarity, satisfies. Chain .gate() or .atLeast(0.7), and build custom matchers with makeAssertion.