Open source · MIT licensed · v0.1.0

The pytest for LLMs

Test your AI outputs like you test your code.

OpenAI
Claude
Ollama
DeepSeek
Gemini
Mistral
test_capitals.py
from llmtest import expect, llm_test

@llm_test(
    expect.contains("Paris"),
    expect.latency_under(2000),
    expect.cost_under(0.001),
    model="claude-sonnet-4-20250514",
)
def test_capital(llm):
    output = llm("What is the capital of France?")
    assert "Paris" in output.content

Everything you need to test LLMs

No LLM judge. No YAML configs. Just pytest.

Zero LLM Calls

Most assertions are deterministic and instant. No paying an LLM to judge your output.
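To illustrate why these checks need no LLM: assertions like contains, regex, and JSON validity are plain string and parser logic. A minimal stdlib sketch of the idea (the helper names here are illustrative, not llmtest's internals):

```python
import json
import re

def check_contains(output: str, needle: str) -> bool:
    # Plain substring check: no model call, runs in microseconds.
    return needle in output

def check_valid_json(output: str) -> bool:
    # Deterministic: either the text parses as JSON or it does not.
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def check_matches_regex(output: str, pattern: str) -> bool:
    # Regex search over the raw output text.
    return re.search(pattern, output) is not None
```

Because every check is a pure function of the output string, results are reproducible and free.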

Built on Pydantic

All models use BaseModel — auto-validation, JSON serialization, schema generation.

22+ Assertions

Text, performance, agent, and composable assertions: contains, regex, JSON, cost, latency, tool calls.

Multi-Provider

OpenAI, Anthropic, Ollama out of the box. Install only what you need.

Agent Testing

Tool call validation, loop detection, call ordering. Test your AI agents properly.

Retry Support

Built-in retry at decorator and fixture level. Handle non-deterministic outputs.
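The retry idea can be sketched in plain Python. This loop only illustrates the concept of re-running a flaky check against non-deterministic output; llmtest's actual decorator and fixture API may differ:

```python
def run_with_retries(test_fn, retries: int = 3) -> int:
    """Re-run a flaky check up to `retries` times; pass if any attempt passes.

    Returns the number of attempts used. Purely illustrative,
    not llmtest's real implementation.
    """
    last_error = None
    for attempt in range(retries):
        try:
            test_fn()
            return attempt + 1
        except AssertionError as err:
            last_error = err
    raise last_error
```

A test that fails twice and then passes would succeed with `retries=3` but fail with `retries=2`.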

22+ built-in assertions

All deterministic. All instant. No LLM calls needed for most checks.

Text: contains, regex, JSON, length, similarity, structured output
Performance: latency, cost, token count
Agent: tool calls, loop detection, call ordering
Composable: AND, OR, custom logic with & and | operators
# Text
expect.contains("Paris")
expect.matches_regex(r"\d+")
expect.valid_json()
expect.structured_output(MyModel)
# Performance
expect.latency_under(2000)
expect.cost_under(0.01)
# Agent
expect.tool_called("search")
expect.no_loop()
# Composable
expect.contains("A") & expect.not_contains("B")
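The `&` and `|` composition shown above can be sketched with Python operator overloading. This miniature `Assertion` class is an illustration of the pattern, not llmtest's internals:

```python
class Assertion:
    """Wraps a predicate over the model output; & and | compose predicates."""

    def __init__(self, check):
        self.check = check

    def __call__(self, output: str) -> bool:
        return self.check(output)

    def __and__(self, other):
        # Both assertions must pass.
        return Assertion(lambda out: self(out) and other(out))

    def __or__(self, other):
        # At least one assertion must pass.
        return Assertion(lambda out: self(out) or other(out))

def contains(needle: str) -> Assertion:
    return Assertion(lambda out: needle in out)

def not_contains(needle: str) -> Assertion:
    return Assertion(lambda out: needle not in out)

# expect.contains("A") & expect.not_contains("B"), in miniature:
combined = contains("Paris") & not_contains("London")
```

Because composed assertions are themselves assertions, arbitrary trees of checks can be built and evaluated with one call.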

Start testing your LLMs today

Open source. MIT licensed. Built for developers.