Regression pass rate
Programs at a frontier AI lab.
DirectionalAnonymized by company type. Specific on the work. The artifact, the metric, and the moment a decision moved — kept together.
Case studies attributed by company type, not by name.
The work, the artifact, the metric that moved.
Public proof first. Private reference path on request.
Three readings from three programs. Attributed by type, not by name.
Programs at a frontier AI lab.
DirectionalReviewer pickup to sign-off, a regulated decisioning program.
DirectionalCaught at the gate across an enterprise rollout.
DirectionalFour programs. Four readings. Attributed by company type — not by name. Deeper review available under a private reference path.
RLHF adjudication and regression capture stayed attached to the same packet. Eval findings compounded into the next release gate without being re-stated.
Uncertain inferences routed back to the radiologist qualified for the body region, with the rubric and override chain attached to the case.
Release review moved from a screenshot debate to one inspectable record per run. Examiner questions get one answer per audit cycle.
Demonstration capture, teleop review, and export-format lineage stayed attached to the same workflow that ships the policy.
“Release review went from a three-week scramble to a repeatable gate. When an edge case slips, we catch it, we save it, and it never ships again.”
Bring the release, review queue, or escaped issue that matters most. We will show how it runs — and what it leaves behind.