When classical wins: honest benchmark results from Qtangl

Our committed BM-001–BM-006 benchmarks include failures. Here is when CP-SAT beats QAOA, and why we publish both.

We publish the full table

Qtangl's backend README benchmark table is not marketing filler. BM-001 through BM-006 have committed JSON results in backend/benchmarks/results/, and CI re-runs BM-001, BM-003, and BM-006 on every pull request.

The five-task precedence instance (BM-002) is the clearest example: classical CP-SAT returns a feasible plan in milliseconds, while QAOA on the same instance runs for seconds and still fails on the simulator at current qubit limits.

That is not embarrassing. It is the honest baseline buyers and investors should see before anyone claims quantum speedup.

What hybrid is for

The value of hybrid on BM-003 (hospital call-out) is not wall-clock time. Classical CP-SAT solves the ward board in ~0.02 s. Hybrid fixture replay takes ~6 s but surfaces three distinct feasible nurse alternates within the repair window, with audit packs attached.

Track C2 defines success as at least one more distinct feasible plan than classical, while staying within 2% of the classical objective. BM-003 meets that bar on fixture replay, and that is the product claim we can defend.
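
To make the bar concrete, here is a minimal sketch of the check in Python. It is not Qtangl's code: the SolverResult type, the meets_c2_bar function, and the sample numbers are hypothetical, and it assumes "within 2%" means a relative objective gap of at most 0.02, inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolverResult:
    """Hypothetical summary of one solver run (not Qtangl's actual schema)."""
    objective: float
    distinct_feasible_plans: int


def meets_c2_bar(classical: SolverResult, hybrid: SolverResult,
                 tolerance: float = 0.02) -> bool:
    """Return True when hybrid offers at least one more distinct feasible plan
    than classical and its objective is within `tolerance` of classical's."""
    extra_plans = hybrid.distinct_feasible_plans - classical.distinct_feasible_plans
    if extra_plans < 1:
        return False
    if classical.objective == 0:
        # Relative gap is undefined at zero; assume an exact match is required.
        return hybrid.objective == 0
    relative_gap = abs(hybrid.objective - classical.objective) / abs(classical.objective)
    return relative_gap <= tolerance


# Illustrative numbers only, not benchmark results.
classical = SolverResult(objective=100.0, distinct_feasible_plans=1)
hybrid_ok = SolverResult(objective=101.5, distinct_feasible_plans=3)
hybrid_too_costly = SolverResult(objective=103.0, distinct_feasible_plans=3)
hybrid_no_extra = SolverResult(objective=100.0, distinct_feasible_plans=1)

print(meets_c2_bar(classical, hybrid_ok))          # two extra plans, 1.5% gap
print(meets_c2_bar(classical, hybrid_too_costly))  # 3% gap exceeds the tolerance
print(meets_c2_bar(classical, hybrid_no_extra))    # ties on objective, no extra plan
```

Note that both conditions must hold: a hybrid run that ties classical on objective but adds no alternate does not pass, and neither does one that adds alternates at more than 2% objective cost.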

How to read the scoreboard

Every vertical demo exposes distinctFeasiblePlans and diversityScore in the hybrid column. When hybrid ties classical on objective, which is common, the scoreboard says so.

We lead with alternates and auditability, not latency. If your buyer only cares about milliseconds, classical already wins and we tell them that on the first slide.

Continue reading

Review the benchmark scoreboard, open the hospital demo, or read the hospital restaffing deep dive.