Booking Q1 2027 Start a conversation

// services · what we sell

Three ways to work with us. One bar.

We staff each engagement with senior people, ship behind an evaluation harness, and write down what a successful exit looks like before we start. Pick the shape that matches where you are.

[01] / Most engagements

Eval-first builds

We build the system around the test suite, not the other way around.

A discovery + build engagement that ships a production AI system behind a measurement harness. The eval is the spec — written before the prompt, run on every commit, and handed to your team as the runbook for everything that comes after.

8 – 16 weeks $140K 1 ML lead + 1 platform engineer + part-time PM

Included

  • Two-week discovery with a signed problem-and-success spec
  • Eval harness — regression, adversarial, drift, cost suites
  • System build: agents, fine-tunes, retrieval, gateway, guardrails
  • Observability: tracing, cost dashboards, on-call runbook
  • Two-week supervised handoff with your engineering team
  • 60-day defect-fix window after handoff

Not included

  • Frontend / product UI work outside the AI surface
  • Long-tail data labeling (we partner; we don't staff it)
  • On-call after the 60-day window (move to Embed)
A mid-market lender's underwriting copilot. Twelve weeks. Eval suite of 412 cases gates every deploy.

[02] / Diagnostic

AI audits

Tell us your AI system is misbehaving and we'll tell you why.

A four-week diagnostic on an AI system you've already built. We rebuild your eval surface, instrument your traces, run our adversarial battery, and deliver a written report with prioritized fixes. Useful before a board review, before a re-platform, or after an incident.

3 – 4 weeks $48K 1 ML lead + 1 evaluation engineer

Included

  • Reconstruction of an eval suite from your production traffic
  • Adversarial battery (jailbreak, prompt-injection, PII, cost)
  • Cost & latency analysis with model-routing recommendations
  • Drift report against the last 90 days of traffic
  • Written report (~30 pp) with prioritized fixes and effort estimates
  • Two read-out sessions: technical and executive

Not included

  • Implementation of the recommended fixes (separate engagement)
  • Ongoing monitoring (separate engagement)
A Series-C SaaS asked us to audit a customer-support agent that was quietly returning hallucinated refunds. Three weeks. 14 prioritized fixes; the top 3 closed the regression.

[03] / Ongoing

Embedded teams

Our senior people inside your team for as long as it's useful.

We place an ML lead and one or two engineers inside your team on a monthly retainer. Same Slack, same standups, same review process. Useful when you've built the foundation and need senior throughput, or when you're building an in-house ML team and want a reference for what good looks like.

3 – 12 months $32K / month 1 ML lead minimum; up to 3 engineers depending on scope

Included

  • Named senior people; not a rotating bench
  • Weekly written status; monthly steering review
  • Code review on every PR; architectural sign-off on every system change
  • Eval suite stewardship — we keep your gates honest
  • Two weeks notice on either side; no long lock-ins

Not included

  • Hiring and managing your in-house team (we coach, we don't manage)
  • 24/7 on-call (we cover business hours; on-call is your team)
A logistics platform's in-house ML team of four, with our lead embedded for nine months. They now ship without us, which was the goal.

// compare

The engagement shape should match the risk.

Dimension Eval-first builds AI audits Embedded teams
Engagement length 8–16 weeks 3–4 weeks 3–12 months
Senior headcount 1 lead + 1 eng 1 lead + 1 eng 1–3 engineers
Eval harness Built fresh Reconstructed Stewarded
Production access Required Read-only Required
Code ownership You own it Report only You own it
Best for New systems Existing systems Sustained throughput
Starting at $140K $48K $32K / mo

// questions

Common questions before discovery.

Do you sign NDAs?

Yes, by default. We send a one-page NDA at the start of discovery; we'll also sign yours if it's mutual and reasonable. We do not sign engagement-specific non-competes.

Can we keep the code?

Yes. Every engagement ships under your repos and your IP. We retain the right to write about the methodology in anonymized form; we ask permission before naming you publicly.

Do you work with non-technical leadership?

Often. The discovery week is structured so that a non-technical sponsor leaves it understanding the problem, the constraints, and how we'll know if we've succeeded. We are deliberately allergic to AI theater.

What does a 'serious' engagement look like to you?

A team with a real problem, a real budget, and someone senior enough to make decisions inside the engagement. We turn down work without all three — kindly.

Do you do generative-marketing or content workflows?

We don't. We work on systems where being wrong has a cost — underwriting, procurement, support, clinical workflows, security ops. If your problem is ad copy, you should hire an agency.

Where are you based?

Distributed. Founders in San Francisco and Berlin. We work in your timezone for the duration of the engagement; we travel for the discovery week and at least once per build.