Gargantua · data for frontier labs

Research-grade data for frontier multimodal AI.

Gargantua is a boutique data partner for labs training video-language and multimodal reasoning models. We work backwards from the benchmarks you need to move — sourcing data your competitors can't find, and designing annotation methodologies that produce signal your model can actually learn from.

Book an eval review · Read our methodology

Built by ex-Google/YouTube technologists. Anchor client: a video AI lab backed by NVIDIA and Amazon with over $200M raised.

The problem

Your model is data-bound. Your vendor is a throughput business.

Large labeling platforms are built to execute a spec at volume. But at the frontier, the spec is the hard part. If your annotation methodology is wrong, a million perfectly consistent labels are a million consistent mistakes — and you find out an eval cycle later.

The failure mode is familiar to anyone who has audited reasoning-trace data: ask an annotator to count objects and show their work, and they'll write "3 + 2 + 4" — after counting one by one. That's not a reasoning trace; it's a post-hoc rationalization. A model trained on it learns to fake its reasoning too.

Frontier labs don't need more throughput. They need a partner who understands what makes data learnable — and who treats the methodology as the deliverable, not a line in your requirements doc.

// audit: reasoning-trace sample #04117

task : count the athletes visible in frame

trace : "3 + 2 + 4 = 9"

observed: annotator counted one-by-one, wrote the arithmetic afterwards

verdict : post-hoc rationalization — not a trace.

a model trained on this learns to fake its reasoning.

fig. 01 — the failure mode a throughput vendor never catches

What we do

Three capabilities, one loop.

fig. 02 — everything works backwards from your evals

capability / strategy

Eval-driven data strategy

We start every engagement with one question: which benchmarks are you trying to move? Then we work backwards — gap analysis between your training distribution and your target evals, dataset design (scale, modality mix, prompt/response/trace structure), and contamination discipline so your training data never touches your eval sets.

capability / acquisition

Acquisition & licensing

Commodity supply is a solved problem; exclusive supply isn't. We build direct relationships with content owners — sports organizations, footage networks, specialist archives — and handle the full commercial layer: licensing agreements, chain of title, IP assignment. Phased sourcing: accessible content first, direct owners next, exclusive regional content last. The result is data your competitors can't scrape and your incumbent vendor can't source.

capability / annotation

Annotation methodology & managed execution

We design the annotation methodology with your research team — elicitation protocols, ontology, edge-case rules, QA statistics — then execute with trained, managed annotation teams on our proprietary platform. Interleaved spatial (points, bounding boxes, segmentation) and text annotation, temporal event structure, reasoning-trace capture. You get research-grade data with the audit trail to prove it.

Methodology

What “research-grade” means in practice.

method / elicitation

Reasoning traces that models can learn from.

The hardest problem in reasoning-trace data isn't labeling — it's eliciting the actual thought process. Annotators take shortcuts, then rationalize. We design elicitation protocols that close that gap: structured follow-up questions (“how did you verify that?”), fast-thinking vs. slow-thinking task variants, estimate-then-verify workflows, and spatial annotations tied to each reasoning step — so the trace records how the problem was actually solved, in a format that trains.

method / interleaving

Interleaved visual-text data, built to spec.

Task prompt → visual content → reasoning trace → answer, with multiple prompts per asset where density matters. Points, boxes, and segmentations embedded in the reasoning chain, not bolted on afterward. Delivered in your schema.
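
To make the shape concrete, here is a minimal sketch of one interleaved record. The class names, field names, and the `clip-0001` identifier are hypothetical and chosen only for illustration; real deliveries follow the client's schema, which will differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SpatialRef:
    """One spatial annotation; kind is "point", "box", or "segmentation"."""
    kind: str
    frame: int
    coords: List[float]


@dataclass
class ReasoningStep:
    """One step of the trace, carrying the spatial annotations that ground it."""
    text: str
    refs: List[SpatialRef] = field(default_factory=list)


@dataclass
class Sample:
    """Task prompt -> visual content -> reasoning trace -> answer."""
    prompt: str
    asset_id: str
    steps: List[ReasoningStep]
    answer: str


def to_json(sample: Sample) -> str:
    """Serialize a sample, keeping spatial refs nested inside their step."""
    return json.dumps(asdict(sample))


# Illustrative record (all values are made up for the sketch).
sample = Sample(
    prompt="Count the athletes visible in frame.",
    asset_id="clip-0001",  # hypothetical identifier
    steps=[
        ReasoningStep(
            text="Two athletes stand near the left edge.",
            refs=[
                SpatialRef("point", frame=12, coords=[0.10, 0.55]),
                SpatialRef("point", frame=12, coords=[0.18, 0.60]),
            ],
        ),
        ReasoningStep(
            text="One more athlete is at center.",
            refs=[SpatialRef("box", frame=12, coords=[0.45, 0.30, 0.55, 0.80])],
        ),
    ],
    answer="3",
)
encoded = to_json(sample)
```

Nesting each spatial reference inside the step it supports is what keeps the annotation embedded in the reasoning chain rather than delivered as a separate layer.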

method / calibration

Annotation depth you can defend statistically.

How many entities per clip is enough? We answer that empirically instead of by convention: exhaustively annotate a calibration sample, measure the coverage distribution, and compute confidence intervals on annotation depth — so you buy exactly as much annotation as your evals require, with error bars.
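
One way this calibration could look, as a sketch rather than our production framework: assume the calibration sample gives an exhaustive entity count per clip, define coverage at depth k as the fraction of a clip's entities captured when annotation stops at k, and use a percentile bootstrap for the interval. The function names, the 90% target, and the bootstrap choice are all assumptions made for illustration.

```python
import random
from statistics import mean


def coverage_at_depth(entity_counts, depth):
    """Per-clip fraction of entities captured if annotation stops at `depth`."""
    return [min(depth, n) / n for n in entity_counts if n > 0]


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


def minimal_depth(entity_counts, target=0.9, max_depth=50):
    """Smallest depth whose lower confidence bound on mean coverage meets `target`.

    Returns (depth, mean_coverage, (ci_low, ci_high)), or None if no depth
    up to `max_depth` qualifies.
    """
    for depth in range(1, max_depth + 1):
        cov = coverage_at_depth(entity_counts, depth)
        lo, hi = bootstrap_ci(cov)
        if lo >= target:
            return depth, mean(cov), (lo, hi)
    return None


# Hypothetical exhaustive entity counts from a ten-clip calibration sample.
calibration_counts = [3, 5, 8, 2, 12, 4, 6, 9, 3, 7]
result = minimal_depth(calibration_counts, target=0.9)
```

Requiring the lower bound, not the point estimate, to clear the target is the conservative choice: a small calibration sample widens the interval and pushes the recommended depth up.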

method / qa

QA as an exception process, not a checklist.

Single-annotate-then-review with exception-based flagging, domain-specific issue taxonomies, and reviewer correction rate as the standing diagnostic. Low annotator friction, high-signal telemetry, no rubber-stamp dual-annotation theater.
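
As a rough illustration of the diagnostic, the sketch below computes a reviewer correction rate per annotator and flags the ones above a threshold. The record layout, the issue labels, and the 0.2 threshold are hypothetical; in practice the taxonomy and thresholds are set per project.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Review:
    """One reviewed annotation. `issue` is None when the reviewer made no correction."""
    annotator: str
    issue: Optional[str] = None  # e.g. a label from a project issue taxonomy


def correction_rates(reviews: Iterable[Review]) -> Dict[str, float]:
    """Fraction of each annotator's reviewed items that the reviewer corrected."""
    reviewed = defaultdict(int)
    corrected = defaultdict(int)
    for r in reviews:
        reviewed[r.annotator] += 1
        if r.issue is not None:
            corrected[r.annotator] += 1
    return {a: corrected[a] / reviewed[a] for a in reviewed}


def flag_exceptions(rates: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Annotators whose correction rate exceeds the (assumed) threshold."""
    return sorted(a for a, rate in rates.items() if rate > threshold)


# Made-up review log for the sketch.
reviews = [
    Review("ann_a"),
    Review("ann_a", issue="missed_entity"),
    Review("ann_a"),
    Review("ann_a"),
    Review("ann_b"),
    Review("ann_b"),
]
rates = correction_rates(reviews)
flagged = flag_exceptions(rates)
```

Only the flagged annotators need follow-up, which is the point of treating QA as an exception process.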

method / hygiene

Eval hygiene.

We treat contamination as a first-class constraint: sourcing and annotation pipelines are designed so training data stays provably disjoint from public benchmarks and your held-out sets.
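
The simplest form of such a check can be sketched with content hashes. This is an illustration under a strong assumption: it catches only byte-identical assets, so re-encoded or cropped copies would need a perceptual or near-duplicate method on top. The function names and the toy asset bytes are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of an asset's raw bytes (exact-match only)."""
    return hashlib.sha256(data).hexdigest()


def contaminated_assets(
    train_assets: Dict[str, bytes], eval_assets: Dict[str, bytes]
) -> List[str]:
    """Names of training assets whose bytes also appear in the eval set."""
    eval_hashes = {fingerprint(blob) for blob in eval_assets.values()}
    return sorted(
        name for name, blob in train_assets.items()
        if fingerprint(blob) in eval_hashes
    )


# Toy byte strings standing in for video files.
train = {"t1.mp4": b"clip-one", "t2.mp4": b"clip-two", "t3.mp4": b"clip-three"}
held_out = {"e1.mp4": b"clip-two", "e2.mp4": b"clip-nine"}
overlap = contaminated_assets(train, held_out)
```

An empty result is the gate a delivery would need to pass before training data ships.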

Proof

Where this has worked.

Our anchor client is a video AI lab backed by NVIDIA and Amazon, with over $200M raised.

backers : NVIDIA · Amazon

raised : $200M+

domains : sports · public safety · news

- Source and license video training data across sports, public safety, and news, including exclusive content unavailable through their existing vendors.
- Co-designed the canonical annotation methodology adopted by their science team, including entity-salience rules and edge-case taxonomy.
- Built the statistical calibration framework that determines annotation depth with empirical coverage guarantees.
- Run managed annotation at production scale on our platform, with QA telemetry reported on every delivery.

We keep client names confidential by default — and we'll extend the same discretion to you.

How we engage

Start small.
Prove signal.
Then scale.

01

Eval review

A working session on the benchmarks you're trying to move and where your current data falls short. No charge; the questions are the demo.

02

Pilot sprint

1–2 weeks, fixed scope. We annotate a golden set (yours or one we source) and deliver the annotated data, a methodology memo, and a QA report against success criteria we agree on upfront.

03

Production

Per-asset pricing, QA SLAs, methodology iteration included.

04

Embedded partnership

Ongoing data strategy, a standing sourcing pipeline, and reserved annotation capacity.

Who we are

Technologists first.

Gargantua was founded by Nick Kim (ex-Google/YouTube), and the team includes data scientists and ML engineers with Google and YouTube backgrounds — people who have built and evaluated large-scale ML systems, not resold labor. Annotation execution runs through trained, managed teams under our QA methodology and platform, so senior judgment sets the spec and disciplined operations deliver the volume.

We are deliberately boutique. We take a small number of lab partners at a time, and we expect to be judged on whether your benchmarks move.

FAQ

Common questions.

q.01 Do you sell off-the-shelf datasets?

No. Everything is custom — sourced, licensed, and annotated against your evals. Off-the-shelf is how your training distribution ends up identical to your competitor's.

q.02 Can you handle spatial annotation — points, boxes, segmentation?

Yes, on our own platform, and interleaved with text and temporal annotation rather than delivered as separate layers. Custom ontologies per project.

q.03 What scale can you handle?

Pilot sets from hundreds of assets; production in the thousands to tens of thousands. We scale after the methodology is proven — that's the correct order of operations.

q.04 Who owns the IP?

You do. Full assignment with clean chain of title through every licensing and annotation agreement — that's part of why the contracting layer matters.

q.05 Do you work with data we provide?

Yes. Sourcing and annotation are decoupled: bring your own data and use us for methodology and annotation, or use us end-to-end.

Contact

Tell us which benchmarks you need to move.

If you're training multimodal models and your bottleneck is data quality, licensing access, or annotation methodology — let's talk.

Book an eval review