TypeScript SDK
YAML remains AgentV’s canonical, portable eval format. The SDK surfaces below are for cases where you want to generate YAML-shaped definitions in code, embed eval runs inside another application, or write executable graders and prompt templates.
AgentV currently provides two npm packages for programmatic use:
- `@agentv/sdk`: YAML-aligned eval authoring, custom assertions, and code graders
- `@agentv/core`: programmatic evaluation API and typed configuration
Installation
```bash
# Assertion SDK (defineAssertion, defineCodeGrader)
npm install @agentv/sdk

# Programmatic API (evaluate, defineConfig)
npm install @agentv/core
```

Choose a Surface
Use the simplest surface that matches the job:
- YAML / JSONL first for portable eval specs you want to run from the CLI, check into a repo, or share across TypeScript and Python workflows.
- `defineEval()` / `evalSuite()` when you want a `.eval.ts` file that mirrors YAML concepts and lowers back to the canonical snake_case contract.
- `evaluate({ specFile })` when you want library control around an existing YAML suite.
- Inline `evaluate({ tests })` when the eval definition truly belongs inside application code. The programmatic API mirrors YAML but uses camelCase TypeScript naming such as `expectedOutput` and `assert`.
- `defineAssertion` / `defineCodeGrader` when the grading logic itself must execute code.
There is no separate first-party Python authoring SDK today. Python-facing workflows should either emit canonical YAML/JSONL or implement executable graders that consume the standard snake_case wire format.
YAML-Aligned .eval.ts Authoring
Use `defineEval()` from `@agentv/sdk` when you want TypeScript ergonomics without creating a second eval vocabulary. The helper lets you author in camelCase where TypeScript expects it, then lowers the result back to the canonical snake_case eval object when AgentV loads the file.
```typescript
import { defineEval, graders } from '@agentv/sdk';

export default defineEval({
  name: 'hello-suite',
  execution: {
    targets: ['mock-sdk'],
  },
  workspace: {
    hooks: {
      beforeAll: {
        command: ['echo', 'suite-start'],
      },
    },
  },
  tests: [
    {
      id: 'hello',
      input: 'Say hello',
      inputFiles: ['../fixtures/per-test-note.md'],
      expectedOutput: 'Hello from the mock target',
      assertions: [graders.contains('Hello')],
    },
  ],
});
```

Useful companion helpers:
- `toEvalYamlObject()` returns the canonical snake_case object.
- `serializeEvalYaml()` returns YAML text using the same canonical field names.
The field name stays `assertions` in both the TypeScript source and the lowered YAML; this helper does not introduce a second YAML vocabulary.
Built-In Grader Helpers
`@agentv/sdk` includes a small `graders` catalog for common deterministic and LLM-backed grader configs. These helpers return ordinary `assertions` entries and serialize to the same canonical YAML you could write by hand.
```typescript
import { defineEval, graders } from '@agentv/sdk';

export default defineEval({
  name: 'grader-helper-suite',
  tests: [
    {
      id: 'json-greeting',
      input: 'Return a JSON greeting.',
      assertions: [
        graders.contains('Hello', { name: 'mentions-hello' }),
        graders.exact('{"message":"Hello"}', { name: 'exact-json', minScore: 1 }),
        graders.regex(/"message"\s*:/, { name: 'message-key' }),
        graders.json({ name: 'valid-json', required: true }),
        graders.rubrics(['Greets the user'], { name: 'rubric-review' }),
        graders.llmGrader({
          name: 'llm-review',
          prompt: 'Grade whether the answer is useful.',
          target: 'grader-target',
        }),
        graders.codeGrader(['bun', 'run', 'graders/check.ts'], { name: 'scripted-check' }),
      ],
    },
  ],
});
```

The catalog covers `contains`, `equals`/`exact`, `regex`, `is-json`/`json`, `rubrics`, `llm-grader`, and `code-grader`. CamelCase SDK options such as `minScore`, `maxSteps`, and rubric `scoreRanges` lower to `min_score`, `max_steps`, and `score_ranges` when AgentV loads or serializes the suite.
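To make the lowering step concrete, here is a minimal sketch of a recursive camelCase-to-snake_case key rename. The names `toSnakeCase`, `lowerKeys`, and the `Json` type are hypothetical illustrations, not AgentV's implementation, and the sketch assumes lowering is a pure key rename that leaves values untouched.

```typescript
// Hypothetical sketch, not AgentV's implementation: rename camelCase keys to
// snake_case at every depth, leaving values unchanged.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

/** "minScore" -> "min_score": prefix each uppercase letter with "_" and lowercase it. */
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

/** Walk arrays and plain objects, renaming every key along the way. */
function lowerKeys(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(lowerKeys);
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[toSnakeCase(key)] = lowerKeys(inner);
    }
    return out;
  }
  // Strings, numbers, booleans, and null pass through unchanged.
  return value;
}

// An invented assertion entry authored with camelCase options.
const authored: Json = { type: 'llm-grader', name: 'llm-review', minScore: 0.8, maxSteps: 3 };
console.log(JSON.stringify(lowerKeys(authored)));
// {"type":"llm-grader","name":"llm-review","min_score":0.8,"max_steps":3}
```

A real lowering pass probably needs to leave user-supplied keys (for example, free-form metadata) alone; this sketch renames every key it finds.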
Custom Assertions
Use `defineAssertion` from `@agentv/sdk` to create reusable assertion types. Place them in `.agentv/assertions/`; they are auto-discovered by filename.
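The filename convention can be illustrated with a small helper. `assertionTypeFromPath` is a hypothetical name, not part of `@agentv/sdk`, and the sketch assumes the type is simply the file's basename with its final extension removed, matching the `word-count.ts` → `word-count` mapping described under Using in YAML.

```typescript
// Hypothetical sketch of convention-based discovery: the file's basename,
// minus its final extension, becomes the `type:` referenced from YAML.

/** Derive an assertion type from a path such as ".agentv/assertions/word-count.ts". */
function assertionTypeFromPath(filePath: string): string {
  // Accept both POSIX "/" and Windows "\" separators.
  const base = filePath.split(/[\\/]/).pop() ?? filePath;
  const dot = base.lastIndexOf('.');
  // Strip only the last extension; names without one are returned as-is.
  return dot > 0 ? base.slice(0, dot) : base;
}

const discovered = [
  '.agentv/assertions/word-count.ts',
  '.agentv/assertions/sentiment.ts',
].map(assertionTypeFromPath);

console.log(JSON.stringify(discovered)); // ["word-count","sentiment"]
```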
Pass/Fail Pattern
```typescript
import { defineAssertion } from '@agentv/sdk';

export default defineAssertion(({ output }) => {
  const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length;
  const pass = wordCount >= 3;
  return {
    pass,
    assertions: [{ text: `Output has ${wordCount} words`, passed: pass }],
  };
});
```

Score Pattern
Return a `score` (0–1) instead of `pass` for graded evaluation:
```typescript
import { defineAssertion } from '@agentv/sdk';

export default defineAssertion(({ output, traceSummary }) => {
  const hasContent = (output ?? '').length > 0 ? 0.5 : 0;
  const isEfficient = (traceSummary?.eventCount ?? 0) <= 10 ? 0.5 : 0;
  return {
    score: hasContent + isEfficient,
    reasoning: 'Checks content exists and is efficient',
  };
});
```

If only `pass` is given, `score` is 1 (pass) or 0 (fail).
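The fallback rule above can be written as a small resolver. `AssertionResultSketch` and `resolveScore` are hypothetical names; the sketch also assumes that an explicit `score` takes precedence when a handler returns both fields, which this page does not specify.

```typescript
// Hypothetical sketch of how an assertion result could be reduced to a score.
interface AssertionResultSketch {
  pass?: boolean;
  score?: number;
  reasoning?: string;
}

function resolveScore(result: AssertionResultSketch): number {
  // Assumption: an explicit score wins over pass when both are present.
  if (typeof result.score === 'number') return result.score;
  // Documented fallback: pass maps to 1, fail maps to 0.
  if (typeof result.pass === 'boolean') return result.pass ? 1 : 0;
  throw new Error('assertion result needs either `score` or `pass`');
}

console.log(resolveScore({ pass: true })); // 1
console.log(resolveScore({ pass: false })); // 0
console.log(resolveScore({ score: 0.5, reasoning: 'has content, not efficient' })); // 0.5
```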
Using in YAML
Convention-based discovery maps filename → assertion type:
- `.agentv/assertions/word-count.ts` → `type: word-count`
- `.agentv/assertions/sentiment.ts` → `type: sentiment`

Reference the type directly in your eval file; no `command:` is needed:
```yaml
assertions:
  - type: word-count
  - type: contains
    value: "Hello"
```

Code Graders
Use `defineCodeGrader` from `@agentv/sdk` for full control over scoring with an explicit `assertions` array:
```typescript
import { defineCodeGrader } from '@agentv/sdk';

export default defineCodeGrader(({ output, traceSummary }) => ({
  score: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 5 ? 1.0 : 0.5,
  assertions: [
    { text: 'Answer is not empty', passed: (output ?? '').length > 0 },
    { text: 'Efficient tool usage', passed: (traceSummary?.eventCount ?? 0) <= 5 },
  ],
}));
```

Graders built with `defineCodeGrader` are referenced in YAML with `type: code-grader` and `command: [bun, run, grader.ts]`. `defineAssertion` uses convention-based discovery instead: place the file in `.agentv/assertions/` and reference it by name.
For detailed patterns, input/output contracts, and language-agnostic examples, see Code Graders.
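If you want a score that scales with the number of checks, one alternative to the fixed 1.0/0.5 rule in the code grader above is to report the fraction of checks that passed. This is a plain-TypeScript sketch with hypothetical names (`gradeByFraction`, `GraderInputSketch`, `Check`); it does not import `@agentv/sdk`, but its return value has the same `score` plus `assertions` shape as the handler above.

```typescript
// Hypothetical sketch: derive `score` as the fraction of named checks that pass.
interface CheckResult {
  text: string;
  passed: boolean;
}

interface GraderInputSketch {
  output?: string;
  traceSummary?: { eventCount?: number };
}

// A check is a description paired with a predicate over the grader input.
type Check = [string, (input: GraderInputSketch) => boolean];

function gradeByFraction(
  input: GraderInputSketch,
  checks: Check[],
): { score: number; assertions: CheckResult[] } {
  const assertions = checks.map(([text, predicate]) => ({ text, passed: predicate(input) }));
  const passedCount = assertions.filter((a) => a.passed).length;
  // Avoid dividing by zero when no checks are supplied.
  const score = checks.length === 0 ? 0 : passedCount / checks.length;
  return { score, assertions };
}

const result = gradeByFraction(
  { output: 'Hello', traceSummary: { eventCount: 8 } },
  [
    ['Answer is not empty', (i) => (i.output ?? '').length > 0],
    ['Efficient tool usage', (i) => (i.traceSummary?.eventCount ?? 0) <= 5],
  ],
);
console.log(result.score); // 0.5
```

The trade-off is that every check carries equal weight; the fixed rule in the original example expresses a different policy and may suit graders where one check dominates.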
Wire Format vs SDK Format
Raw grader stdin uses snake_case because it crosses a process boundary and may be consumed by Python, shell, `jq`, or external dashboards. The `@agentv/sdk` package converts that payload to idiomatic TypeScript camelCase before calling your handler.
| Raw stdin | SDK handler field |
|---|---|
| `expected_output` | `expectedOutput` |
| `output_path` | `outputPath` |
| `trace_summary` | `traceSummary` |
| `token_usage` | `tokenUsage` |
| `cost_usd` | `costUsd` |
| `duration_ms` | `durationMs` |
| `workspace_path` | `workspacePath` |
`output` is already the final answer string in both formats. Transcript-aware code should read `messages`, `trace.messages`, or `trace.events`; answer-text graders should read `output`.
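The conversion described in this section amounts to a recursive key rename. The sketch below shows the idea with hypothetical helpers (`toCamelCase`, `camelizeKeys`); the sample payload values are invented, and the SDK's actual conversion may handle more cases.

```typescript
// Hypothetical sketch of snake_case -> camelCase conversion for a grader payload.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

/** "expected_output" -> "expectedOutput": drop each "_" and uppercase the next character. */
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

/** Rename keys at every depth; values are copied unchanged. */
function camelizeKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map(camelizeKeys);
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[toCamelCase(key)] = camelizeKeys(inner);
    }
    return out;
  }
  return value;
}

// An invented raw stdin payload using field names from the table above.
const raw = '{"output":"Hello","expected_output":"Hello there","duration_ms":1200,"cost_usd":0.01}';
console.log(JSON.stringify(camelizeKeys(JSON.parse(raw))));
// {"output":"Hello","expectedOutput":"Hello there","durationMs":1200,"costUsd":0.01}
```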
Programmatic API
Use `evaluate()` from `@agentv/core` to run evaluations as a library. The most portable pattern is still to keep the suite in YAML and point `specFile` at it; inline tests are best when the eval is tightly coupled to application code.
Inline Test Definitions
```typescript
import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      expectedOutput: 'Hello there!',
      assert: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);
```

`evaluate()` auto-discovers the default target from `.agentv/targets.yaml`, using credentials from `.env`.
File-Based via specFile
Point to an existing YAML eval instead of inlining tests:
```typescript
import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  specFile: './evals/my-eval.eval.yaml',
});
```

This is the recommended bridge when you want SDK control without creating a separate code-first eval surface.
Typed Configuration
Create `agentv.config.ts` at your project root for type-safe, validated configuration using `defineConfig()` from `@agentv/core`:
```typescript
import { defineConfig } from '@agentv/core';

export default defineConfig({
  execution: {
    workers: 5,
    maxRetries: 2,
    verbose: true,
    otelFile: '.agentv/results/otel-{timestamp}.json',
  },
  output: { dir: './results' },
  limits: { maxCostUsd: 10.0 },
});
```

The CLI auto-discovers the config file from your project root and validates it with Zod at startup.
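The type-safety part of `defineConfig()` follows a common TypeScript pattern: an identity function whose parameter type drives autocompletion and excess-property errors. This sketch uses a hypothetical `ConfigSketch` interface and `defineConfigSketch` helper covering only the fields shown in the config above; AgentV's real config type and its Zod validation are not reproduced here.

```typescript
// Hypothetical sketch of a define*() helper: no runtime work, just typing.
interface ConfigSketch {
  execution?: {
    workers?: number;
    maxRetries?: number;
    verbose?: boolean;
    otelFile?: string;
  };
  output?: { dir?: string };
  limits?: { maxCostUsd?: number };
}

function defineConfigSketch(config: ConfigSketch): ConfigSketch {
  // The value passes through unchanged; the parameter type does the checking.
  return config;
}

const config = defineConfigSketch({
  execution: { workers: 5, maxRetries: 2 },
  limits: { maxCostUsd: 10.0 },
  // output: { directory: './results' }, // would fail to compile: unknown property
});

console.log(config.execution?.workers); // 5
```

Compile-time checks only cover `.ts` authors; runtime validation (Zod, in AgentV's case) is still what catches bad values at startup.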
Observability Export
AgentV’s observability surface is OpenTelemetry. For post-run workflows:
- Use `agentv eval ... --otel-file traces/eval.otlp.json` to write OTLP JSON you can import into systems such as Opik.
- Use `agentv eval ... --export-otel --otel-backend <name>` for live export when a built-in or local resolver exists.
AgentV does not currently ship a dedicated Opik authoring facade or built-in opik backend resolver. Keep the eval definition in YAML and route observability through OTLP export.
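After writing a file with `--otel-file`, you can inspect it with ordinary JSON tooling. This sketch assumes the file holds a single export object in the standard OTLP/JSON trace layout (`resourceSpans` → `scopeSpans` → `spans`); the interfaces and `spanNames` are hypothetical, the sample span names are invented, and which spans AgentV emits is not specified on this page.

```typescript
// Hypothetical sketch: list span names from an OTLP/JSON trace export.
interface OtlpSpan { name: string; traceId?: string; spanId?: string }
interface OtlpScopeSpans { spans?: OtlpSpan[] }
interface OtlpResourceSpans { scopeSpans?: OtlpScopeSpans[] }
interface OtlpTraceExport { resourceSpans?: OtlpResourceSpans[] }

function spanNames(exported: OtlpTraceExport): string[] {
  const names: string[] = [];
  // Walk resource -> scope -> span, tolerating missing arrays.
  for (const resource of exported.resourceSpans ?? []) {
    for (const scope of resource.scopeSpans ?? []) {
      for (const span of scope.spans ?? []) {
        names.push(span.name);
      }
    }
  }
  return names;
}

// In practice you would read and JSON.parse the exported file; this inline
// document is invented for illustration.
const sample: OtlpTraceExport = JSON.parse(
  '{"resourceSpans":[{"scopeSpans":[{"spans":[{"name":"eval-run"},{"name":"test:hello"}]}]}]}',
);
console.log(JSON.stringify(spanNames(sample))); // ["eval-run","test:hello"]
```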
Scaffold Commands
Bootstrap new assertions and eval files from the CLI:
```bash
# Create a new assertion type
agentv create assertion <name>   # → .agentv/assertions/<name>.ts

# Create a new eval with test cases
agentv create eval <name>        # → evals/<name>.eval.yaml + .cases.jsonl
```