# Connect a remote or in-process agent to niceeval

> Use defineAgent to wrap any function or HTTP service as a niceeval agent. Map responses to the standard event stream and register the agent by name.

Every system under test in niceeval is an **agent** — a named object that receives a prompt, drives your code or service, and returns a structured result. You connect your own logic by writing an *adapter* with `defineAgent`. The adapter owns all the details of how to call your system: in-process function references, HTTP endpoints, authentication headers, and message formats are entirely private to the adapter. Experiments reference agents directly rather than passing URLs on the CLI, because there is no universal protocol that every agent speaks.

## When to use `defineAgent`

<CardGroup cols={2}>
  <Card title="In-process function" icon="bolt">
    Call your own function directly inside `send`. Zero network overhead — the fastest possible eval loop, ideal for unit-level semantic tests in CI.
  </Card>

  <Card title="Remote HTTP service" icon="globe">
    Issue a `fetch` inside `send` using whatever protocol your service speaks. The URL, auth, and request shape are your business; niceeval never sees them.
  </Card>
</CardGroup>

## The `defineAgent` shape

`defineAgent` accepts a plain object with three fields. The `send` function is the only place you need to write any logic.

```ts theme={null}
import { defineAgent } from "niceeval/adapter";

// Signature sketch: types are shown where you would supply values.
defineAgent({
  name: string;                          // used for reports and grouping
  capabilities: AgentCapabilities;       // declares what this agent can do
  async send(input: TurnInput, ctx: AgentContext): Promise<Turn>;
});
```

### `AgentCapabilities`

Declaring capabilities lets niceeval shape the `t` context that eval authors receive. If a capability is absent, the corresponding assertions are not available at the type level — you get a compile-time error rather than a runtime surprise.

```ts theme={null}
interface AgentCapabilities {
  conversation?: boolean;       // supports multiple send() calls per eval (t.reply, t.newSession)
  toolObservability?: boolean;  // produces action.* events  →  t.calledTool, t.toolOrder, etc.
  sandbox?: boolean;            // sandbox agents only — do not set it on a remote agent
}
```

<Note>
  The `sandbox` capability — which enables `t.sandbox.diff`, `t.fileChanged`, and related assertions — is only meaningful for **sandbox agents** that run in an isolated filesystem. Remote and in-process agents should declare only `conversation` and `toolObservability`.
</Note>

### `AgentContext`

The runner passes `ctx` into every `send` call. Use `ctx.signal` to respect cancellation, `ctx.model` to forward the experiment's model tier to your agent, and `ctx.flags` to read feature flags defined by the experiment.

```ts theme={null}
interface AgentContext {
  readonly signal: AbortSignal;
  readonly model?: string;                             // set by the experiment; when omitted, the agent uses its default
  readonly flags: Readonly<Record<string, unknown>>;  // experiment feature flags, forwarded to agent
  readonly sandbox?: Sandbox;                         // sandbox agents only
  readonly session: { id?: string; readonly isNew: boolean };
  log(msg: string): void;
}
```
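
As a rough illustration of forwarding `ctx.model` and `ctx.flags`, the sketch below builds a request payload from the prompt text and the context. The `ContextSlice` type is a local copy of two `AgentContext` fields so the block compiles on its own; the `ChatRequest` shape and the `verbose` flag name are hypothetical and stand in for whatever your service accepts.

```typescript
// Local slice of AgentContext from this page, so the sketch compiles on its own.
interface ContextSlice {
  readonly model?: string;
  readonly flags: Readonly<Record<string, unknown>>;
}

// Hypothetical request body; your service defines its own shape.
interface ChatRequest {
  message: string;
  model?: string;
  verbose: boolean;
}

// Build the payload sent to the service from the prompt text and the runner context.
function buildChatRequest(text: string, ctx: ContextSlice): ChatRequest {
  const request: ChatRequest = {
    message: text,
    // Flags are typed as unknown, so narrow before use. "verbose" is a made-up flag name.
    verbose: ctx.flags["verbose"] === true,
  };
  // Forward a model only when the experiment set one; otherwise the service keeps its default.
  if (ctx.model !== undefined) {
    request.model = ctx.model;
  }
  return request;
}
```

Inside `send`, the result would typically be passed to `JSON.stringify` as the `fetch` body, with `ctx.signal` supplied as the request's `signal`.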

### `Turn` — what `send` returns

```ts theme={null}
interface Turn {
  readonly events: StreamEvent[];                        // ★ the normalized event stream
  readonly data?: unknown;                               // structured output for outputEquals / outputMatches
  readonly status: "completed" | "failed" | "waiting";  // "waiting" means parked at a HITL prompt
  readonly usage?: Usage;                                // optional token usage
}
```

The `events` array is the heart of every `Turn`. All assertions — `t.calledTool`, `t.messageIncludes`, `t.eventOrder`, and the rest — derive from this single stream. Populating it correctly is the only real job of a remote adapter.
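
One way to keep `status` consistent with the events you return is to derive it in a small helper. The sketch below assumes that a stream containing an `input.requested` event should be reported as `"waiting"`, which follows from the field comments above but is not spelled out as a rule on this page. `MiniEvent`, `MiniTurn`, and `toTurn` are hypothetical local names, and `InputRequest` is left as `unknown` because its shape is not defined here.

```typescript
// Minimal local copies of the types on this page, so the sketch compiles on its own.
type MiniEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "input.requested"; request: unknown }
  | { type: "error"; message: string };

interface MiniTurn {
  readonly events: MiniEvent[];
  readonly data?: unknown;
  readonly status: "completed" | "failed" | "waiting";
}

// Hypothetical helper: choose a status that matches the events being returned.
function toTurn(
  events: MiniEvent[],
  opts: { failed?: boolean; data?: unknown } = {},
): MiniTurn {
  // Assumption: an input.requested event means the agent is parked at a HITL prompt.
  const parked = events.some((e) => e.type === "input.requested");
  const status = opts.failed ? "failed" : parked ? "waiting" : "completed";
  return { events, data: opts.data, status };
}
```

The caller decides when a turn has failed; the helper only makes sure a parked turn is never reported as completed.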

***

## In-process adapter example

Use an in-process adapter when your agent is a TypeScript function you can import directly. There is no network round-trip, and you get full type safety.

```ts theme={null}
// agents/classify.ts
import { defineAgent } from "niceeval/adapter";
import { classifyIntent } from "../src/agent.js";

export default defineAgent({
  name: "classify",
  capabilities: {},
  async send(input, ctx) {
    const result = await classifyIntent(input.text, { signal: ctx.signal });

    return {
      events: [
        { type: "message", role: "assistant", text: result.label },
      ],
      data: result,          // available as turn.data in outputEquals / outputMatches
      status: "completed",
    };
  },
});
```

<Note>
  You don't need to declare `conversation` or `toolObservability` if your function doesn't support them. Omitting a capability simply means the corresponding `t.*` methods won't appear in eval authors' type signatures.
</Note>

***

## Remote HTTP adapter example

When your agent lives behind an HTTP endpoint, `send` is just a `fetch`. The URL comes from an environment variable so you can point the same adapter at local or production without changing any code.

```ts theme={null}
// agents/weather-bot.ts
import { defineAgent } from "niceeval/adapter";

export default defineAgent({
  name: "weather-bot",
  capabilities: { conversation: true, toolObservability: true },
  async send(input, ctx) {
    const r = await fetch(`${process.env.AGENT_URL}/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: input.text }),
      signal: ctx.signal,
    });
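    if (!r.ok) {
      // Report HTTP failures instead of parsing an error page as a reply
      return {
        events: [{ type: "error", message: `HTTP ${r.status} from agent` }],
        status: "failed",
      };
    }
    const body = await r.json();

    return {
      events: toStreamEvents(body),   // map your response shape → StreamEvent[]
      data: body.output,
      status: "completed",
    };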
  },
});
```
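
The adapter above declares `conversation: true` but sends only the message text, so a stateful service would also need some conversation identifier. The sketch below shows one possible approach using `ctx.session`. It rests on two assumptions that this page does not confirm: that the service accepts and returns a `sessionId` field (a hypothetical wire format), and that the adapter may write the remote id to `ctx.session.id`, which is suggested only by `id` being writable while `isNew` is `readonly`.

```typescript
// Local copy of the session field from AgentContext.
interface SessionRef {
  id?: string;
  readonly isNew: boolean;
}

// Hypothetical wire format: the service accepts and returns a sessionId.
interface ChatPayload {
  message: string;
  sessionId?: string;
}
interface ChatResponse {
  reply: string;
  sessionId?: string;
}

// Build the request body for one turn of the conversation.
function buildPayload(text: string, session: SessionRef): ChatPayload {
  const payload: ChatPayload = { message: text };
  // A new session starts a fresh conversation, so no id is sent.
  if (!session.isNew && session.id !== undefined) {
    payload.sessionId = session.id;
  }
  return payload;
}

// Assumption: storing the remote id on the session makes it available to later turns.
function rememberSession(session: SessionRef, response: ChatResponse): void {
  if (response.sessionId !== undefined) {
    session.id = response.sessionId;
  }
}
```

In `send`, `buildPayload(input.text, ctx.session)` would replace the inline body, and `rememberSession(ctx.session, body)` would run after the response is parsed.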

***

## `toStreamEvents` — mapping your response to the standard stream

`toStreamEvents` is a small mapping function you write. Its job is to translate whatever your service returns into the standard `StreamEvent[]` vocabulary that niceeval understands. Here is what the standard types look like:

```ts theme={null}
type StreamEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "action.called"; callId: string; name: string; input: JsonValue }
  | { type: "action.result"; callId: string; output?: JsonValue;
      status: "completed" | "failed" | "rejected" }
  | { type: "subagent.called"; callId: string; name: string; remoteUrl?: string }
  | { type: "subagent.completed"; callId: string; output?: JsonValue;
      status: "completed" | "failed" }
  | { type: "input.requested"; request: InputRequest }  // agent paused at a HITL prompt
  | { type: "thinking"; text: string }
  | { type: "error"; message: string };
```

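A minimal mapper for a service that returns `{ reply: string, tools?: ToolCall[] }`, where each tool call carries `id`, `name`, `input`, and `result`, might look like this. It reports every call as `completed`, so extend it if your service can report tool failures: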

```ts theme={null}
function toStreamEvents(body: { reply: string; tools?: any[] }): StreamEvent[] {
  const events: StreamEvent[] = [];

  for (const call of body.tools ?? []) {
    events.push({ type: "action.called", callId: call.id, name: call.name, input: call.input });
    events.push({ type: "action.result", callId: call.id, output: call.result, status: "completed" });
  }

  events.push({ type: "message", role: "assistant", text: body.reply });
  return events;
}
```
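
If your service can report that a tool call failed, the mapper can carry that through as `status: "failed"`. The variant below is a sketch under assumptions: the `ToolCall` interface, including its `error` field, is hypothetical, and `JsonValue` and the `StreamEvent` subset are redeclared locally so the block compiles without niceeval installed.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Subset of the StreamEvent union shown above.
type StreamEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "action.called"; callId: string; name: string; input: JsonValue }
  | { type: "action.result"; callId: string; output?: JsonValue;
      status: "completed" | "failed" | "rejected" };

// Hypothetical shape of one tool call in the service response.
interface ToolCall {
  id: string;
  name: string;
  input: JsonValue;
  result?: JsonValue;
  error?: string; // present only when the tool call failed
}

function toStreamEvents(body: { reply: string; tools?: ToolCall[] }): StreamEvent[] {
  const events: StreamEvent[] = [];

  for (const call of body.tools ?? []) {
    events.push({ type: "action.called", callId: call.id, name: call.name, input: call.input });
    if (call.error !== undefined) {
      // Failed call: report the error text as the output.
      events.push({ type: "action.result", callId: call.id, output: call.error, status: "failed" });
    } else {
      // Successful call: a missing result becomes null.
      events.push({ type: "action.result", callId: call.id, output: call.result ?? null, status: "completed" });
    }
  }

  events.push({ type: "message", role: "assistant", text: body.reply });
  return events;
}
```

Typing the response instead of using `any[]` also lets the compiler catch a misspelled field before an eval run does.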

<Tip>
  Emit `action.called` and `action.result` pairs with matching `callId` values. niceeval's `deriveRunFacts` stitches them into structured `toolCalls` facts, which power `t.calledTool`, `t.toolOrder`, `t.noFailedActions`, and more.
</Tip>
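
To see why matching `callId` values matter, here is a simplified illustration of pairing calls with their results. It is not niceeval's `deriveRunFacts`, whose implementation this page does not show; `pairActions`, `PairedCall`, and the `"pending"` status are hypothetical names used only for this sketch.

```typescript
// Subset of the StreamEvent union, with JsonValue loosened to unknown.
type ActionEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "action.called"; callId: string; name: string; input: unknown }
  | { type: "action.result"; callId: string; output?: unknown;
      status: "completed" | "failed" | "rejected" };

interface PairedCall {
  callId: string;
  name: string;
  input: unknown;
  output?: unknown;
  // "pending" marks a call that never received a result (sketch-only status).
  status: "completed" | "failed" | "rejected" | "pending";
}

function pairActions(events: ActionEvent[]): PairedCall[] {
  const calls: PairedCall[] = [];
  const byId: { [callId: string]: PairedCall } = {};

  for (const e of events) {
    if (e.type === "action.called") {
      const call: PairedCall = { callId: e.callId, name: e.name, input: e.input, status: "pending" };
      byId[e.callId] = call;
      calls.push(call); // keeps calls in the order they were initiated
    } else if (e.type === "action.result") {
      const call = byId[e.callId];
      // A result whose callId matches no call is dropped.
      if (call !== undefined) {
        call.output = e.output;
        call.status = e.status;
      }
    }
  }
  return calls;
}
```

In this sketch a mismatched `callId` leaves the call without an outcome and discards the result, which is the kind of gap that order and failure assertions cannot reason about.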

***

## Referencing an agent from an experiment

Import your adapter from an experiment file so the run configuration is checked in to version control and reviewable.

```ts theme={null}
// experiments/local.ts
import { defineExperiment } from "niceeval";
import classifyAgent from "../agents/classify.js";

export default defineExperiment({
  agent: classifyAgent,
  runs: 1,
});
```

***

## Switching between local and production with environment variables

Because the adapter reads `process.env` internally, you can point it at any environment without touching config files. Pass the variable inline or export it before running:

```shell theme={null}
# Evaluate against your local dev server
AGENT_URL=http://localhost:3000 npx niceeval exp local

# Evaluate against production
AGENT_URL=https://prod.example.com npx niceeval exp prod
```

You can also create two experiment files — one for local, one for production — and switch between them with `npx niceeval exp local` vs `npx niceeval exp prod`.
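
Because a missing variable would silently turn the request URL into `undefined/chat`, it can help to validate it once before calling `fetch`. The helper below is a hypothetical sketch, not part of niceeval. It assumes the variable is named `AGENT_URL`, as in the weather-bot adapter, and takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical helper: resolve and validate the agent base URL.
function resolveAgentUrl(env: { [name: string]: string | undefined }): string {
  const url = (env["AGENT_URL"] ?? "").trim();
  if (url === "") {
    throw new Error("AGENT_URL is not set; export it or pass it inline before running niceeval");
  }
  if (!/^https?:\/\//.test(url)) {
    throw new Error("AGENT_URL must start with http:// or https://, got: " + url);
  }
  // Drop trailing slashes so `${url}/chat` never contains a double slash.
  return url.replace(/\/+$/, "");
}
```

Inside the adapter, `resolveAgentUrl(process.env)` would replace the direct `process.env.AGENT_URL` read, so a misconfigured run fails with a clear message instead of a confusing network error.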

***

## Standard `StreamEvent` types at a glance

| Type                 | When to emit                                                                  |
| -------------------- | ----------------------------------------------------------------------------- |
| `message`            | Any text the agent produces (assistant reply or user echo)                    |
| `action.called`      | A tool or skill call is initiated                                             |
| `action.result`      | The result of a tool or skill call (pair with `action.called` via `callId`)   |
| `subagent.called`    | The agent delegates work to a child agent                                     |
| `subagent.completed` | The child agent finishes                                                      |
| `input.requested`    | The agent paused and is waiting for human input (HITL); triggers `t.parked()` |
| `thinking`           | Internal reasoning text (e.g., extended thinking from Claude)                 |
| `error`              | A non-fatal error the agent reported                                          |

<Note>
  Skill loads (`load_skill`) are modeled as `action.called` events with `name: "load_skill"`. The `t.loadedSkill()` assertion is therefore just syntactic sugar over `t.calledTool("load_skill", …)` — no special event type is needed.
</Note>
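
The relationship described in the note can be pictured with two small predicates over the event stream. These are hypothetical stand-ins written for illustration, not the real `t.calledTool` and `t.loadedSkill` assertions, and they ignore the input matcher that the real call accepts.

```typescript
// Subset of the StreamEvent union, with JsonValue loosened to unknown.
type ToolEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "action.called"; callId: string; name: string; input: unknown };

// Stand-in for t.calledTool: true if any action.called event has the given name.
function calledTool(events: ToolEvent[], name: string): boolean {
  return events.some((e) => e.type === "action.called" && e.name === name);
}

// Stand-in for t.loadedSkill: the same check with the name fixed to "load_skill".
function loadedSkill(events: ToolEvent[]): boolean {
  return calledTool(events, "load_skill");
}
```

For an adapter author, the practical consequence is that reporting a skill load needs nothing beyond an ordinary `action.called` event with the right `name`.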