Radical transparency · the suite is public
The Eval Suite
Every engine method, graded on a real run against its documented verified
value — and the results published, green or red. The bands come from the platform's
verified-results record; the JSON is machine-written by
evals/run-evals.mjs and never hand-edited.
A red row is information, not embarrassment.
loading evals.json…
How to read this.
Sync rows run on the public Cloud Run engine (the same endpoint you can call). Async rows
(KSG transfer entropy and its sensitivity sweep) run on the lab's tower through the job manager —
when a run isn't available they show an honest async pending, never a pretend pass.
Methods whose verified record doesn't pin every parameter are graded on stated invariants
(stability, stationarity verdicts, graph shape) — those checks can still fail. Robustness
badges for the published headline results live in the
Phase-31 closure record.