Honest result
K=3 adversarial verification cut false positives 27.8% to 0.0% (95% CI [11.1, 50.0] to [0, 0]; recall 100% to 77.8%) on a 36-snippet labeled set including prompt-injection traps. Held-out real target: 3/3 genuine bugs found, 0 surviving false positives. Cost-routing claim is operator-gated on an Anthropic key. Reported honestly as "harness committed, live multi-tier number gated."
(95% CI [11.1, 50.0] to [0, 0])
0 surviving false positives
concurrency cap 8
make eval-dry reproduces offline
Quorum fans out a finder agent per source file, then routes each finding to K skeptic agents that independently try to disprove it. Only findings that survive all K challenges ship. The result: a sharp precision improvement at a moderate recall trade-off, on a benchmark that includes adversarial prompt- injection traps to catch sycophantic verification.
The trace UI is a real deployed product at quorum.thomaspeng.ca, browsable below. Model routing is structured as a cost-first 4-tier ladder: DeepSeek for extraction, Haiku for mechanical checks, Sonnet for judgment, Opus only when convergence requires it.