
// approach · operating model

How we build AI systems when being wrong has a cost.

The approach is deliberately opinionated: eval-first, senior-led, production-connected, and written down before the first prompt lands in code.

// 01 · how we think

Principles we use as constraints.

01

Eval-first. Always.

Before we write production code, we write the test suite that defines 'done.' If we can't agree on what good looks like — measurably — we don't agree on a build.

  • Every engagement begins with an evaluation harness.
  • Eval suites become regression suites at handoff.
  • Where regulators are involved, the eval suite doubles as the evidence artifact they review.
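To make "measurably" concrete, here is a minimal sketch of what an evaluation gate can look like. Everything in it is an illustrative assumption, not our actual harness: the names `EvalCase`, `pass_rate`, and `meets_threshold` are hypothetical, and exact-match scoring stands in for whatever metric an engagement really agrees on.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One agreed example of 'good': an input and the answer we expect."""
    prompt: str
    expected: str


def pass_rate(system: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases where the system's output exactly matches the expected answer."""
    if not cases:
        raise ValueError("an eval suite needs at least one case")
    passed = sum(1 for case in cases if system(case.prompt).strip() == case.expected)
    return passed / len(cases)


def meets_threshold(
    system: Callable[[str], str],
    cases: Sequence[EvalCase],
    threshold: float,
) -> bool:
    """True only if the measured pass rate reaches the agreed threshold."""
    return pass_rate(system, cases) >= threshold
```

Run in CI against a fixed set of cases, the same check can serve as a regression gate after handoff: a change that drops the pass rate below the agreed threshold fails the build.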

02

Senior on every PR. No junior pyramid.

The person who runs your kickoff is the person who reviews your PRs. We are deliberately small so this is structurally possible.

  • Median tenure on team: 9 years.
  • We do not subcontract.
  • We do not rotate engineers off the engagement to fund another sale.

03

Production access or no engagement. Synthetic data is not enough.

We work on your real traffic: under DPA, in your VPC, with audit logging. Lab work that never touches production is theatre.

  • Read-only production access by week two.
  • Shadow deployment before any user-visible release.
  • We sign whatever DPA / BAA / NDA your team requires before discovery.
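Shadow deployment, in the sense used above, can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any client's stack: `shadow_call`, `production`, `candidate`, and the in-memory `log` list are hypothetical names, and a real deployment would likely run the candidate asynchronously and write to an audited store.

```python
from typing import Callable, List, Tuple

# (request, live answer, shadow answer) recorded for offline comparison.
ShadowRecord = Tuple[str, str, str]


def shadow_call(
    request: str,
    production: Callable[[str], str],
    candidate: Callable[[str], str],
    log: List[ShadowRecord],
) -> str:
    """Serve the production answer; run the candidate on the same request and only log it."""
    live = production(request)
    try:
        shadow = candidate(request)
    except Exception as exc:
        # A failing candidate must never affect what the user sees.
        shadow = f"<candidate error: {exc!r}>"
    log.append((request, live, shadow))
    return live
```

The user only ever receives the production answer. The logged pairs are what gets reviewed before anything becomes user-visible.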

04

Written exit criteria. Decided in week one.

How we leave is decided at kickoff, not in the final invoice. Every engagement has a written set of conditions that mean we are done — and a handoff plan tied to them.

  • Exit criteria signed by both sides before contracts close.
  • 30 / 60 / 90-day aftercare ladder.
  • Source, evals, and runbook delivered in your tooling — not ours.

05

Fail closed. By default. Always.

Every system we ship has a deterministic safety boundary. When the model is uncertain, the system stops; it does not invent a confident answer.

  • Confidence-bounded circuit breakers ship on every agent.
  • Kill-switch is a first-class feature, not a future ticket.
  • Defaulting to human escalation is not a regression.
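A fail-closed boundary of this kind can be sketched as a small deterministic wrapper around the model call. The sketch assumes the model returns an answer together with a confidence score between 0 and 1. `FailClosedGate`, `ESCALATE`, and the example threshold are hypothetical names and values chosen for illustration, not a fixed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

ESCALATE = "ESCALATE_TO_HUMAN"


@dataclass
class FailClosedGate:
    """Deterministic boundary: anything uncertain or broken goes to a human."""
    min_confidence: float
    kill_switch: bool = False

    def answer(self, model: Callable[[str], Tuple[str, float]], query: str) -> str:
        # Kill switch wins over everything; the model is not even called.
        if self.kill_switch:
            return ESCALATE
        try:
            text, confidence = model(query)
        except Exception:
            # A model error is treated like uncertainty: stop, do not guess.
            return ESCALATE
        # Written as "not >=" so a NaN confidence also escalates.
        if not (confidence >= self.min_confidence):
            return ESCALATE
        return text
```

For example, `FailClosedGate(min_confidence=0.8).answer(model, query)` returns the model's text only when the reported confidence is at least 0.8. Every other path, including a flipped kill switch, returns the escalation marker.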

06

Boring on purpose. Novelty is a tax.

We will pick the well-understood approach over the novel one almost every time. Production AI loses to operational complexity, not to a missing capability.

  • We use one model gateway, two infra patterns, three orchestration libraries.
  • We document the boring choice and why we made it.
  • We refuse engagements pitched as 'cutting-edge.'

// 02 · delivery model

Small named teams, not a rotating bench.

100%

Principal engineer

Owns architecture, eval design, final review. Same person from kickoff to handoff.

100%

Senior ML engineer

Owns the model & evaluation harness. Pairs with your team on every PR.

50–100%

Senior platform engineer

Owns deployment, observability, fail-closed boundaries. Scales with engagement complexity.

20%

Engagement lead

Single point of contact. Runs the rhythm. Doesn't write code; doesn't filter information.

10%

Domain advisor

Vertical specialist (clinician, attorney, MRM lead) on engagements where the domain demands it.

Source

Your repos. We push branches; you own merges.

Evals

Stored in your repo. Built to outlast us.

Comms

Your Slack / Teams. One shared channel; no separate ops channel.

Tracking

Your tracker. We open tickets in Jira / Linear / GitHub — wherever your team lives.

Docs

Your wiki. We write Notion / Confluence / Coda pages alongside code.

Access

Through your IdP. Time-bound credentials, audit-logged, terminated at handoff.

// 03 · how decisions are made

Ownership is explicit.

You

  • Business priorities & sequence
  • Risk acceptance
  • Final go-live approval
  • Vendor selection (model providers, infra)
  • What ships externally and when

Us

  • Eval methodology
  • Architecture choices on the AI surface
  • Code review standards
  • When to escalate a defect to a stop-the-build
  • When we don't know — and saying so

Joint

  • Eval thresholds (we propose; you ratify)
  • Exit criteria (negotiated in week one)
  • Sequence of work each week
  • Scope changes
  • Roll-out & rollback plan

// 04 · risk & ownership

The risk surface is contractual, technical, and operational.

Intellectual property

Code, evals, prompts, and harness configuration: yours. We retain no rights. We do not reuse client-specific artifacts in other engagements.

Privacy & security

We work under your DPA. PHI engagements under BAA. Production access is time-bound, MFA-gated, and audit-logged through your IdP.

NDA & confidentiality

Mutual NDA before discovery. We publish anonymized case studies only with your written approval; if you decline, we do not publish.

Insurance

$5M E&O, $2M cyber liability, $1M general. Certificates available before contract signing.

Defect remediation

Any defect against agreed eval thresholds within 30 days of go-live: fixed at no cost, with no cap on the time it takes. Severity-1 issues at any time: same.

Refunds

If we fail to meet exit criteria for reasons attributable to our team, the engagement is refunded, net of materials. This is stated in writing in every SOW.

Subcontractors

None. We do not staff engagements with external contractors. Every PR is reviewed by a named team member on payroll.

AI training data

Your data is never used to train any model — including ours, including any vendor's. This is contractual, not a posture.

// 05 · weekly rhythm

Weekly rhythm.

Mon

  • Weekly kickoff (30m, both sides)
  • Eval-suite review

Tue

  • Build
  • Build

Wed

  • Build
  • Mid-week sync (15m, optional)

Thu

  • Build
  • Eval review on the week's PRs

Fri

  • Demo (45m, your team invites guests)
  • Written status note in your wiki