// company · deliberately small

Twelve senior operators, not a pyramid.

We stay small so the people selling the work are close to the people carrying the pager. The company is built around named senior ownership.

Start a conversation

// founders

The people accountable on day one.

Alida Nemenyi

Co-founder · ML lead

Spent six years at Anthropic on RLHF and the constitutional-AI workstream, then two years leading evaluation at a stealth-stage robotics company. PhD in machine learning, Berkeley, 2017. Cycles too far on weekends.

Kaspar Holm

Co-founder · platform lead

Eleven years at Stripe and DeepMind on observability, gateways, and ML deployment. Wrote the model-routing layer that handled the first $1B/yr of programmatic decisions at his last company. Doesn't tweet; will write you a long, considered email.

// team

The bench is senior and intentionally narrow.

12 people

8 ML / research

3 platform

1 operations

100% named seniors

// company principles

How we choose work.

Eval before prompt.

If we can't measure 'better' before we start, we don't start. The eval suite is the spec; the prompt is downstream of it. Every engagement begins with a measurement surface and ends with one stronger than we found.

Senior on every PR.

There is no junior pyramid behind us. The people you meet in discovery are the people writing the code. We turn down work that would force us to staff it any other way.

Write down the exit.

Before week one, we sign a document that says when we're done. We are allergic to engagements that drift into permanence. The strongest signal of a good engagement is that it ends.

Refuse fashionable problems.

We work on systems where being wrong has a cost. We turn down marketing-content pipelines, demo-grade agents, and anything where the AI is the product rather than the means. There are excellent firms for that work; we are not one of them.

Tell the truth, on time.

If a project should not happen, we say so during discovery. If a system isn't ready, we don't ship it. If the timeline has slipped, the client knows by Friday afternoon, not Monday morning.

Boring beats clever.

Most of what we ship is unflashy: a gateway, a scorer, a runbook. Cleverness is for the parts of the system where it earns its place. Everywhere else, boring code that the on-call rotation can read at 3 a.m. is the right answer.

// origin

A short operating history.

2023 · Q1

Founded as a two-person shop in a coworking space south of Market.

First engagement: a 6-week audit for a Series-B fintech. Found 22 issues. Got referred to two more clients.

2023 · Q3

First Eval-first build. 14 weeks. The eval suite found a bug in the client's training data on day 9 that nobody had noticed for a year.

Decided this was the engagement shape worth scaling.

2024 · Q1

Hired engineers #3 and #4. Wrote the first version of the principles. Opened the Berlin office; Kaspar moved there.

Said no to twice as much work as we said yes to. Got more disciplined about doing so.

2024 · Q4

Crossed 20 engagements. Published our first technical writing. Got a board-level invitation to present an audit to a public-company governance committee.

First recurring Embed retainer.

2025 · Q3

Team of 12, with 31 engagements total. Three referenceable case studies. The bench is full through Q2 2026.

Working on whether and how to grow without breaking the model.

// hiring

Open roles.

10+ years / SF or Berlin · in-person 2 days/wk

Senior ML engineer

You have production experience with RAG, agents, or fine-tuning, and you've written eval suites for systems you cared about. We favor people who've shipped into regulated environments.

8+ years / Remote within ±3 hours of SF or Berlin

Platform engineer (gateways)

You've owned a model gateway, a tracing system, or both. You think guardrails belong in code, not PDFs. You've been on call.

Open / SF · in-person 3 days/wk

Evaluation engineer

Specialist role. You think hard about what 'correct' means and you write the assertions that catch the unhappy path. Linguistics, philosophy, or red-team backgrounds welcome.