Wendell
Private Demo
Test Type

Select Agent

Choose what Wendell should evaluate. Start with built-in baselines, connect a custom agent endpoint, or run through Pi as an agent harness.

Careful Baseline
Follows policies strictly, always verifies accounts first, and escalates when unsure. May be slower but rarely makes critical errors.
Policy-First
Risky Baseline
Optimizes for speed and customer satisfaction. Sometimes skips verification or makes policy exceptions to resolve issues faster.
Speed-First
External
Custom Agent Endpoint
Bring-your-own-agent placeholder. Represents a hosted agent connected through an HTTP, SDK, or command adapter.
Adapter: HTTP / SDK
External
Pi Harness
External agent harness adapter. Connects to Pi's agent infrastructure for evaluation.
Adapter: Pi
Bring your own agent
Wendell is agent-agnostic. Connect a model, hosted agent, local harness, browser agent, or tool runtime.
Planned

HTTP Endpoint

Production agents behind an API

Wendell sends each scenario observation to your hosted agent and records the response, tool calls, and latency.

POST /respond → { message, tool_calls }
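
To make the request/response contract above concrete, here is a minimal sketch of a hosted agent that answers POST /respond using only the Python standard library. Only the path and the two response keys (message, tool_calls) come from the text; the observation fields, the tool-call shape, the verify_account tool name, and the host and port are assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def respond(observation: dict) -> dict:
    """Hypothetical agent logic: map one scenario observation to a reply."""
    text = str(observation.get("message", ""))
    if "refund" in text.lower():
        # Assumed tool-call shape: a name plus a dict of arguments.
        return {
            "message": "I can help with that. Let me verify your account first.",
            "tool_calls": [{"name": "verify_account", "arguments": {}}],
        }
    return {"message": "Could you tell me more about the issue?", "tool_calls": []}


class RespondHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/respond":
            self.send_error(404, "unknown path")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            observation = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            self.send_error(400, "body must be JSON")
            return
        if not isinstance(observation, dict):
            self.send_error(400, "body must be a JSON object")
            return
        body = json.dumps(respond(observation)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the endpoint until interrupted (host and port are placeholders)."""
    HTTPServer((host, port), RespondHandler).serve_forever()
```

Calling serve() starts the endpoint locally. Keeping the agent logic in respond() separate from the handler means the same function can be unit-tested without a network round trip.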
Planned

SDK

CI/eval pipelines and internal platforms

Embed Wendell directly in Python or TypeScript and evaluate your own agent function against scenario suites.

wendell.evaluate({ agent, scenarios })
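
As an illustration of what an agent function and a scenario suite might look like on the Python side, the sketch below defines both and runs them through a local stand-in loop. The scenario fields, the account_verified flag, and run_locally are hypothetical; run_locally is not Wendell's evaluate call, whose Python signature is not shown on this page.

```python
Observation = dict
Response = dict


def careful_agent(observation: Observation) -> Response:
    """Hypothetical agent function: verify the account before anything else."""
    if not observation.get("account_verified", False):
        return {
            "message": "Before we continue, I need to verify your account.",
            "tool_calls": [{"name": "verify_account", "arguments": {}}],
        }
    return {"message": "Thanks, you are verified. How can I help?", "tool_calls": []}


# Assumed scenario shape: an id plus the first observation the agent sees.
scenarios = [
    {"id": "refund-unverified",
     "observation": {"message": "Refund me now", "account_verified": False}},
    {"id": "refund-verified",
     "observation": {"message": "Refund me now", "account_verified": True}},
]


def run_locally(agent, suite) -> dict:
    """Local stand-in for an evaluation loop; NOT Wendell's implementation."""
    return {scenario["id"]: agent(scenario["observation"]) for scenario in suite}
```

In a real pipeline the same careful_agent and scenarios objects would be handed to the SDK call instead of run_locally.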
Available

CLI / Command

Local agents, prototypes, LangChain, CrewAI

Any command that accepts JSON on stdin and prints an agent response can be tested by Wendell.

--agent-command "python my_agent.py"
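
A command that fits the description above could be as small as the following sketch of my_agent.py. It assumes the observation arrives as a single JSON object on stdin with a message field and that the reply mirrors the message / tool_calls shape used by the HTTP adapter; neither detail is confirmed by this page.

```python
import json
import sys


def handle(raw: str) -> str:
    """Turn one JSON observation (as text) into one JSON response (as text)."""
    observation = json.loads(raw) if raw.strip() else {}
    text = str(observation.get("message", ""))
    reply = {
        # Placeholder logic: echo the input so the wiring is easy to check.
        "message": f"You said: {text}" if text else "How can I help?",
        "tool_calls": [],
    }
    return json.dumps(reply)


def main() -> None:
    """Read the whole of stdin, print exactly one JSON line to stdout."""
    sys.stdout.write(handle(sys.stdin.read()) + "\n")
```

With a final main() call added at the bottom of the file, the script can be checked by hand with echo '{"message": "hi"}' | python my_agent.py before pointing --agent-command at it.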
Planned

Browser Target

Browser/UI agents

Wendell presents a simulated customer or workflow page that browser agents interact with like a real app.

Launch target URL + session token
Planned

MCP / Tool Server

Tool-using agents and desktop assistants

Expose simulations as tools: start, observe, send message, call tool, finish run, and get report.

start_simulation, observe, finish_run
Planned

Webhook Events

Observability, dashboards, QA systems

Stream scenario, trajectory, assessment, and critical-failure events into your own systems.

scenario.completed, assessment.completed
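
On the receiving side, a consumer could route these events by name. The sketch below assumes each event is a JSON object with a type field and a data payload, and the scenario_id, run_id, and verdict fields are hypothetical; only the two event names come from the text.

```python
import json
from typing import Callable

# Registry mapping an event name to the function that handles it.
handlers: dict[str, Callable[[dict], str]] = {}


def on(event_type: str):
    """Decorator that registers a handler for one event name."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        handlers[event_type] = fn
        return fn
    return register


@on("scenario.completed")
def scenario_completed(data: dict) -> str:
    return f"scenario {data.get('scenario_id', '?')} finished"


@on("assessment.completed")
def assessment_completed(data: dict) -> str:
    return f"assessment for {data.get('run_id', '?')}: {data.get('verdict', 'unknown')}"


def dispatch(raw_body: str) -> str:
    """Parse one webhook body and route it; unknown event names are ignored."""
    event = json.loads(raw_body)
    handler = handlers.get(event.get("type", ""))
    if handler is None:
        return "ignored"
    return handler(event.get("data", {}))
```

Ignoring unknown event names rather than failing keeps the consumer working if further event types are added later.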