Alida Nemenyi
Co-founder · ML lead
Spent six years at Anthropic on RLHF and the constitutional-AI workstream, then two years leading evaluation at a stealth-stage robotics company. PhD in machine learning, Berkeley, 2017. Cycles too far on weekends.
// company · deliberately small
We stay small so the people selling the work are close to the people carrying the pager. The company is built around named senior ownership.
// founders
Eleven years at Stripe and DeepMind on observability, gateways, and ML deployment. Wrote the model-routing layer that handled the first $1B/yr of programmatic decisions at his last company. Doesn't tweet; will write you a long, considered email.
// team
// company principles
01
If we can't measure 'better' before we start, we don't start. The eval suite is the spec; the prompt is downstream of it. Every engagement begins with a measurement surface and ends with one stronger than we found.
02
There is no junior pyramid behind us. The people you meet in discovery are the people writing the code. We turn down work that would force us to staff it any other way.
03
Before week one, we sign a document that says when we're done. We are allergic to engagements that drift into permanence. The strongest signal of a good engagement is that it ends.
04
We work on systems where being wrong has a cost. We turn down marketing-content pipelines, demo-grade agents, and anything where the AI is the product rather than the means. There are excellent firms for that work; we are not them.
05
If a project should not happen, we say so during discovery. If a system isn't ready, we don't ship it. If the timeline has slipped, the client knows by Friday afternoon, not Monday morning.
06
Most of what we ship is unflashy: a gateway, a scorer, a runbook. Cleverness is for the parts of the system where it earns its place. Everywhere else, boring code that the on-call rotation can read at 3 a.m. is the right answer.
// origin
2023 · Q1
First engagement: a 6-week audit for a Series-B fintech. Found 22 issues. Got referred to two more clients.
2023 · Q3
Decided this was the engagement shape worth scaling.
2024 · Q1
Said no to twice as much work as we said yes to. Got more disciplined about doing so.
2024 · Q4
First recurring Embed retainer.
2025 · Q3
Working on whether and how to grow without breaking the model.
// hiring
10+ years / SF or Berlin · in-person 2 days/wk
Production experience with RAG, agents, or fine-tuning; written eval suites for systems you cared about. Bias to people who've shipped to regulated environments.
8+ years / Remote within ±3 hours of SF or Berlin
You've owned a model gateway, a tracing system, or both. You think guardrails belong in code, not PDFs. You've been on call.
Open / SF · in-person 3 days/wk
Specialist role. You think hard about what 'correct' means and you write the assertions that catch the unhappy path. Linguistics, philosophy, or red-team backgrounds welcome.