Clean-room coding-agent infrastructure

Compare coding agents across DeepSeek, Qwen, Claude, OpenAI, and local models.

OpenCode Harness runs the same coding-agent workflow through a shared agent loop, permissioned tools, MCP extension points, JSONL traces, and reproducible eval suites.

Why it exists

Agent demos are easy. Comparing agents is hard.

Coding agents often ship as model-specific demos with hidden prompts, inconsistent tools, and weak audit trails. OpenCode Harness focuses on the runtime and evaluation layer: one task surface, many providers, reproducible traces, and reports you can inspect.

What is inside

Built like agent infrastructure, not a toy prompt wrapper.

Model-neutral runtime

Provider presets for DeepSeek, Qwen, Claude, OpenAI, local OpenAI-compatible endpoints, vLLM, SGLang, Ollama, and mock mode.

Permissioned tools

File reads/writes, search, patches, shell, git diff, repo maps, context packs, todos, and finish events with conservative policy gates.
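As an illustration of what a conservative policy gate could look like, here is a minimal sketch. It is not OpenCode Harness's actual API: the `Decision`, `ToolPolicy`, and `gate` names, the tool names, and the allow/ask/deny model are all assumptions made for this example. The idea it shows is that any tool not explicitly listed falls back to requiring approval.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # run without asking
    ASK = "ask"      # run only with explicit approval
    DENY = "deny"    # never run


@dataclass
class ToolPolicy:
    # Hypothetical policy table: tool name -> decision.
    rules: dict[str, Decision] = field(default_factory=dict)
    # Conservative default: anything not listed requires approval.
    default: Decision = Decision.ASK

    def check(self, tool_name: str) -> Decision:
        return self.rules.get(tool_name, self.default)


# Assumed tool names, chosen only for illustration.
policy = ToolPolicy(
    rules={
        "read_file": Decision.ALLOW,
        "search": Decision.ALLOW,
        "write_file": Decision.ASK,
        "shell": Decision.DENY,
    }
)


def gate(tool_name: str, approved: bool = False) -> bool:
    """Return True if the tool call may run under the policy."""
    decision = policy.check(tool_name)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.ASK:
        return approved
    return False  # Decision.DENY ignores approval
```

In this sketch, read-only tools run freely, writes wait for approval, and a denied tool stays blocked even if approval is passed.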

MCP-compatible extensions

Stdio MCP tools, resources, prompts, per-server approvals, lifecycle diagnostics, and namespace collision handling.
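One common way to handle namespace collisions is to qualify each tool name with the server that provides it. The sketch below assumes a `server.tool` naming scheme and a hypothetical `register_tools` helper; the harness's real scheme may differ.

```python
def register_tools(servers: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map a qualified tool name to its (server, tool) pair.

    Assumed convention: qualified names look like "server.tool", so two
    servers can both expose a tool called "search" without clashing.
    """
    registry: dict[str, tuple[str, str]] = {}
    for server, tools in servers.items():
        for tool in tools:
            qualified = f"{server}.{tool}"
            if qualified in registry:
                # The same server listed the same tool twice.
                raise ValueError(f"duplicate tool {qualified!r}")
            registry[qualified] = (server, tool)
    return registry


# Hypothetical servers that both expose a "search" tool.
registry = register_tools({"fs": ["search", "read"], "docs": ["search"]})
```

Both `search` tools remain addressable, as `fs.search` and `docs.search`.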

Traceable evals

JSONL traces, provider transcripts, replay summaries, Markdown/HTML reports, terminal trace viewer, and eval dashboard.
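Because JSONL stores one JSON object per line, traces are easy to inspect with a few lines of standard-library Python. The sketch below counts events by type. The field names (`type`, `step`, `tool`) and the event types are assumptions for illustration, not the harness's documented trace schema.

```python
import json
from collections import Counter
from io import StringIO


def summarize_trace(lines) -> Counter:
    """Count events by their "type" field in a JSONL trace."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        counts[event.get("type", "unknown")] += 1
    return counts


# Hypothetical trace content; a real run would be opened from a .jsonl file.
sample = StringIO(
    '{"type": "model_response", "step": 1}\n'
    '{"type": "tool_call", "step": 1, "tool": "read_file"}\n'
    '{"type": "finish", "step": 2}\n'
)
summary = summarize_trace(sample)
```

The same function accepts an open file handle, since it only iterates over lines.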

Product surface

Run an eval, inspect the trace, publish the dashboard.

[Figure: OpenCode Harness eval dashboard showing model pass rates and reports, for comparing runs and failure types.]

[Figure: OpenCode Harness trace viewer for auditing model responses, tool calls, and transcripts.]

python -m opencode_harness eval examples/mock-suite.json --preset mock --max-steps 2
python -m opencode_harness tui runs/latest.jsonl
python -m opencode_harness dashboard eval-runs --output eval-runs/dashboard.html

Model Labs

One harness, multiple model families.

DeepSeek, Qwen, Claude, OpenAI, and local models.

v0.1.0 released

Open-source, packaged, and ready for benchmark runs.

The v0.1.0 release includes wheel and source artifacts, GitHub Actions CI, a release workflow, a model-eval workflow example, a provider benchmark guide, a changelog, and a reproducible local demo flow.
