AI LABS · ANNOTATION · LABELS WITH A VERDICT ATTACHED

Labels that don't end at delivery.

Labeling alone does not tell you whether a release should ship. Annotation keeps the label, the reviewer note, and the verdict together.

RUBRIC
Versioned

The label schema is reviewable and the same on every batch.

REVIEW
Attached

Reviewer notes follow the label out of the workspace.

DATASET
Signed

IAA, reopen rate, and verdict sealed with the export.

HOW IT WORKS

Three steps. One signed dataset.

Define the rubric. Label with review. Sign the dataset before it leaves.

STEP 01
WHAT WE CODIFY

Define the rubric

The label schema, the guidance, and the examples are versioned. Every annotator works against the same definition.

STEP 02
WHAT WE CHECK

Label with review

Annotation and review run in the same workspace. IAA, reopen rate, and reviewer notes stay attached to the batch.

STEP 03
WHAT WE SIGN

Sign the dataset

Export only when the gate clears. Rubric version, reviewer record, and verdict travel with the dataset.

MODALITIES · SIX SURFACES · ONE WORKFLOW

One workflow for every modality.

Purpose-built tools for each modality, on the same quality path. Masks, timelines, waveforms, cuboids, and spans all attach to the same review record.

01 · IMAGE
Pixel-mask with SAM2 assist

Bounding boxes, polygons, and pixel-mask segmentation with SAM2 assist. Every stroke and every correction stays on the same review record.

02 · VIDEO
Action recognition tracks

Frame-by-frame tracking with timelines and action-recognition segments that reviewers can scrub together. Spans survive review.

03 · AUDIO & SPEECH
Threaded voice comments

Voice comments on segments, diarization, transcripts, and biosignal overlays kept together. The reviewer's voice note lives on the row that earned it.

04 · TEXT & NLP
Spans with the policy attached

Named entity recognition, sentiment, classification, span annotation, and multilingual prompts in one governed workflow.

05 · STRUCTURED
Bulk-edit + taxonomy editor

Hierarchical taxonomies, metadata validation, and bulk edits for tables and schema-driven labels. One source of truth for the schema.

06 · 3D POINT CLOUD
Calibration-aware cuboids

LiDAR and depth-sensor annotation with cuboids, measurement tools, and calibration evidence that travels with the sensor frame.

PLATFORM LIVE METRICS

The platform's live feed, on this page.

Every number below is a snapshot from the platform's live telemetry pipeline — not a designed claim. The same numbers move the rubric, the queue, and the export gate.

IAA MEDIAN · 30D
0.84

Target 0.75 · trend up

REOPEN RATE · 30D
1.8%

Ceiling 2.0% · trend down

EXPORT GATE
26/30

3 blocked · 1 in review

QUEUE DEPTH
14

6 auto-escalated · 38m median wait

QUALITY READOUT · LAST 30 DAYS
SURFACES · FOUR PILLARS

Four surfaces. One annotation workflow.

The Studio is where the work happens. Review is where it gets cleared. The Quality Hub is where it gets measured. Datasets are where it leaves.

STUDIO

Annotation Studio

Teams work across image, video, text, audio, and LiDAR without relearning the workflow. Autosave and offline queueing keep work intact when a connection drops.

REVIEW

Collaborative adjudication

Disagreements turn into clear decisions with shared context. Voice and video comments stay attached to the task, not scattered in side channels.

QUALITY

Quality Control Hub

IAA, annotator drift, gold-set gaps, and the export gate live in one hub before data leaves the workflow. Threshold rules auto-escalate when metrics drift.

DATASETS

Projects, datasets, exports

Projects, datasets, queue routing, and exports stay tied to the same production workflow. Dataset management and export flows share one source of truth.

COVERAGE READOUT

The rubric, the review, the verdict.

Every batch shows where the rubric was hit, where review caught the disagreement, and where policy held the line. One readout, three readings.

BATCH READOUT · LAST 30 DAYS
WHAT COMES OUT

What your dataset leaves with.

Every batch leaves something the training team can act on — and something the next reviewer can read.

01

Rubric versions

The label schema and guidance the batch was annotated against, pinned and immutable.

↳ ARTIFACT
02

Labeled batches

Annotations tied to the rubric version, the annotator, and the time they were made.

↳ ARTIFACT
03

Review notes

The disagreement, the reasoning, and the resolution stay with the row that triggered them.

↳ ARTIFACT
04

Signed datasets

Export only after the gate clears. Rubric version and verdict travel inside the package.

↳ ARTIFACT
05

IAA reports

Inter-annotator agreement, reopen rate, and pass rate for every batch — readable, not buried.

↳ ARTIFACT
EXPORT · SEVEN FORMATS · ONE MANIFEST

Exports leave in the format the training stack expects.

Seven shipped export writers. Every format carries the manifest with it — rubric version, reviewer coverage, gate state, and the checksum on the dataset.

COCO

Instance + keypoint + panoptic JSON for image and video workflows.

YOLO

Bounding-box TXT files per image, class-index manifest included. Line format sketched below the format list.

YOLO-seg

Polygon + mask coordinates for YOLO segmentation training.

VOC

Pascal VOC XML per image for long-tail legacy pipelines.

LabelMe

LabelMe JSON with shapes, groups, and flags preserved.

JSONL

JSON Lines for streaming and incremental dataset updates.

Parquet

Columnar dataset with schema enforced by the taxonomy editor.
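As noted in the YOLO entry, here is a sketch of the YOLO label-line format: one object per line, a class index followed by a center-based bounding box normalized to the image size. The writer function is illustrative, not the platform's export code.

```python
# Illustrative YOLO writer: one TXT line per object, in the form
# "class x_center y_center width height", normalized to image dimensions.
def yolo_line(class_id, box, img_w, img_h):
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

print(yolo_line(0, (120, 80, 360, 240), img_w=640, img_h=480))
# 0 0.375000 0.333333 0.375000 0.333333
```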

LABELS
42,184 cleared

mask · span · waveform · cuboid

REVIEW DECISIONS
317 adjudicated

final label + reason retained

QUALITY GATE
IAA 0.84

reopen + drift checks pass

QUALITY NOTE · ON THE RECORD

“Adjudication moved from spreadsheets to one review record. IAA is visible before export, and the quality lead can reopen a case without losing context.”

Head of data labelling · a frontier AI lab
DIFFERENTIATORS · DAY ONE

What teams notice on day one.

Six things change as soon as the workflow runs end-to-end. None of them are about labelling speed — they are about the record the labels leave behind.

01 · ADJUDICATION

Disputes leave email

Disputed labels go through adjudication instead of spreadsheets and side threads. Every disagreement turns into a row with a resolution.

02 · AGREEMENT

IAA, calculated continuously

Inter-annotator agreement is not a quarterly report. It runs against every batch, with reviewer-level and project-level views.

03 · REOPEN RATE

Reopens tracked per annotator

Reopen rate per annotator and per project, with SLA alerts. Drift in either one fires an alert before the dataset leaves.

04 · GATE

Export blocked when weak

Four checks block a release when the dataset is weak. Gate state and the evidence live on the job — not on a dashboard nobody reads.

05 · AUDIT

One trail per label

Complete audit trail for every label decision. Who labelled it, who reviewed it, what changed, and why the cleared dataset was allowed to leave.

06 · DURABILITY

Offline does not lose work

Autosave and CRDT replay keep work intact when the connection drops. Robotics sessions in the field arrive attached to the same clip record.

OFFLINE · DURABILITY

The connection drops. The work does not.

IndexedDB plus CRDT replay keeps the workspace running when the network goes away. Mask edits, span boundaries, and reviewer notes queue locally and reconcile on reconnect.

Robotics teams labelling in the field, clinical reviewers in a basement reading room, and audio teams in a studio booth — none of them lose work. The conflict resolver merges with the reason on the record.

REPLAY TIMELINE · ALL OPS RECONCILED
00:00
01
Connected

Autosave committed

00:17
02
Offline

29 mask edits queued locally

00:46
03
Conflict

Two reviewers edited the same span

01:04
04
Replay

CRDT merge accepted · ops reconciled
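A simplified sketch of the queue-and-replay pattern the timeline above walks through. The platform describes CRDT replay over IndexedDB; this sketch substitutes a plain last-writer-wins merge purely to illustrate queueing offline ops and reconciling them in timestamp order, and every name in it is hypothetical.

```python
# Simplified illustration only: ops made while offline are queued in order and
# reconciled on reconnect. The real system uses CRDT replay over IndexedDB;
# this stand-in merges by last-writer-wins to keep the sketch short.
import time

local_queue = []  # stands in for the browser-side store while offline

def record_op(row_id, field, value):
    local_queue.append({"row": row_id, "field": field,
                        "value": value, "ts": time.time()})

def replay(server_state):
    for op in sorted(local_queue, key=lambda o: o["ts"]):
        row = server_state.setdefault(op["row"], {})
        current = row.get(op["field"])
        # keep the queued op only if it is newer than what the server holds
        if current is None or op["ts"] >= current["ts"]:
            row[op["field"]] = {"value": op["value"], "ts": op["ts"]}
    local_queue.clear()
    return server_state
```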

QUESTIONS · BEFORE YOU SHIP

What teams ask before the dataset leaves.

Six questions the program lead asks before they sign the export. The same six come up across labs, programs, and regulated workflows.

01 · ON THE RUBRIC
What happens when the rubric changes mid-program?

Rubrics are versioned. A change creates a new version, and the program lead approves the move. Existing batches stay on the version they were labelled against — the export carries the version with it.
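A sketch of what that pinning can look like in practice. The field names and version labels are assumptions for illustration, not the platform's schema.

```python
# Illustrative only: a batch stays pinned to the rubric version it was
# labelled against, even after the rubric moves on.
rubric_versions = {
    "v3.1": {"labels": ["safe", "unsafe"], "guidance_doc": "rubric-v3.1.md"},
    "v3.2": {"labels": ["safe", "unsafe", "needs-review"], "guidance_doc": "rubric-v3.2.md"},
}

batch = {"id": "batch-0412", "rubric_version": "v3.1"}  # labelled under v3.1
active_version = "v3.2"                                 # new work starts here

# The export carries the pinned version, not whatever is currently active.
export_record = {**batch, "rubric": rubric_versions[batch["rubric_version"]]}
```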

02 · ON ADJUDICATION
Who decides when annotators disagree?

Adjudication is a defined role with its own queue. Disagreements flow into a single record with both labels, both reasons, and the resolver's call. The decision lives with the row.

03 · ON IAA
How is inter-annotator agreement measured?

Per batch and rolling 30-day, per annotator and per project. The Quality Hub surfaces the trend, the drift, and the threshold against the gate. Reopens and IAA share one source of truth.
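For readers who want the arithmetic, a minimal sketch of one common agreement coefficient, Cohen's kappa, on a toy two-annotator batch. The page does not specify which coefficient the platform uses, so treat this as an illustration of the calculation rather than the product's metric.

```python
# Minimal sketch: per-batch agreement with Cohen's kappa for two annotators.
# The platform's own IAA metric may differ; this only shows the calculation.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["cat", "dog", "dog", "cat", "bird", "dog"]
annotator_b = ["cat", "dog", "cat", "cat", "bird", "dog"]

iaa = cohen_kappa_score(annotator_a, annotator_b)
print(f"IAA (Cohen's kappa): {iaa:.2f}")  # ~0.74 here; 1.0 is perfect, 0.0 is chance
```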

04 · ON THE GATE
What blocks an export?

Four checks. IAA below the threshold. Reopen rate above the ceiling. Coverage below the required percentage. Gold-set agreement below the floor. Any one of them blocks the release with the reason attached.
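A minimal sketch of how a four-check gate like the one described above can be expressed. Only the 0.75 IAA target and 2.0% reopen ceiling come from the readout earlier on this page; the coverage and gold-set floors, and all field names, are assumptions for illustration.

```python
# Sketch of the export gate: any failing check blocks the release and records
# the reason. Field names and the coverage/gold floors are illustrative.
def export_gate(batch, iaa_floor=0.75, reopen_ceiling=0.02,
                coverage_floor=0.95, gold_floor=0.80):
    reasons = []
    if batch["iaa"] < iaa_floor:
        reasons.append(f"IAA {batch['iaa']:.2f} below threshold {iaa_floor}")
    if batch["reopen_rate"] > reopen_ceiling:
        reasons.append(f"reopen rate {batch['reopen_rate']:.1%} above ceiling {reopen_ceiling:.0%}")
    if batch["review_coverage"] < coverage_floor:
        reasons.append(f"review coverage {batch['review_coverage']:.0%} below {coverage_floor:.0%}")
    if batch["gold_agreement"] < gold_floor:
        reasons.append(f"gold-set agreement {batch['gold_agreement']:.2f} below floor {gold_floor}")
    return {"cleared": not reasons, "blocked_reasons": reasons}

verdict = export_gate({"iaa": 0.84, "reopen_rate": 0.018,
                       "review_coverage": 0.97, "gold_agreement": 0.88})
print(verdict)  # {'cleared': True, 'blocked_reasons': []}
```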

05 · ON DATA
Where does the training stack pick up the dataset?

Through the export manifest. Seven formats are shipped. The manifest carries the rubric version, reviewer coverage, gate state, and the dataset checksum so the training side can verify what it received.
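A sketch of the receiving side, assuming the manifest is plain JSON with keys matching the fields named above. The key names and file layout are illustrative, not a published schema.

```python
# Sketch: the training side recomputes the dataset checksum and checks it,
# along with the gate state, against the manifest that shipped with the export.
import hashlib
import json

def verify_export(dataset_path: str, manifest_path: str) -> bool:
    """Return True only if the gate cleared and the checksum matches."""
    with open(manifest_path) as f:
        manifest = json.load(f)  # carries rubric_version, reviewer_coverage,
                                 # gate_state, and sha256 in this sketch
    with open(dataset_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return manifest["gate_state"] == "cleared" and digest == manifest["sha256"]
```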

06 · ON MODALITIES
What about modalities we have not labelled before?

Image, video, audio, text, structured, 3D point cloud, and biosignal are all on the same workflow. New modalities ship behind the same review, quality, and export gates as the existing ones.

WHAT CHANGES

What changes when annotation runs on one record.

Annotation breaks when the label, the disagreement, the quality reading, and the export gate live in different systems. The labels arrive. The disputes get emailed. The quality report comes out a week later. And the dataset leaves before anyone has actually looked at it.

On one record, the rubric version is pinned to the batch. The disagreement is a row in adjudication, not a thread in chat. The IAA reads against the gate before the export is even staged. And the dataset only leaves when the gate has cleared — with the reason, the reviewer, and the version still attached.

WHERE IT FITS

In the loop, this is where you review.

Test the run. Review the hard cases. Recruit the right specialist. Remember the misses. Approve what's right.

01
Test
02
Review
● YOU ARE HERE
03
Recruit
04
Remember
05
Approve
RELATED MODULES

Next to this in the Human Data OS.

WORKFORCE

The right reviewer. On the right case.

Specialists routed to the rows the rubric flagged.

See the page →
CLEO

An instrument, not a chatbot.

Reads the rubric, the review, and the dataset alongside you.

See the page →
EVALUATION STUDIO

Test it before it ships.

The same rubric that grades a release grades the dataset.

See the page →
ANNOTATION

Labels that don't end at delivery.

Bring the rubric your reviewers already trust. We'll keep it attached to every label, every review, and every dataset that leaves.
