P-01
Evaluation before code
No model touches production traffic before the eval set is signed off by the business owner.
// production AI · est. 2023 · booking Q1 2027
Twelve people. Six sectors. One bar across all of them. We build evaluation-first systems for banks, hospitals, law firms, and operators where a hallucination is a regulatory event, not a UX bug.
$ harness eval --suite=production --canary
loading suite ./evals/underwriting.yaml
spinning up runners x 24 -- ready
✓ regression 412/412 pass
✓ adversarial 128/128 pass
→ canary passed. ramp to 100% in 6m12s.
// 01 · what makes us different
P-01
No model touches production traffic before the eval set is signed off by the business owner.
P-02
Systems escalate to a human when uncertain. Confidence is calibrated, not asserted.
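In practice, P-02 can reduce to a routing rule: scale the model's raw score into a calibrated probability, then hand the case to a human below a threshold. A minimal sketch, with the temperature and threshold purely illustrative:

```python
import math

def calibrate(raw_score: float, temperature: float = 1.8) -> float:
    """Temperature-scale a raw model score into a calibrated probability."""
    logit = math.log(raw_score / (1.0 - raw_score))
    return 1.0 / (1.0 + math.exp(-logit / temperature))

def route(raw_score: float, threshold: float = 0.9) -> str:
    """Auto-resolve only when calibrated confidence clears the bar."""
    confidence = calibrate(raw_score)
    return "auto_resolve" if confidence >= threshold else "escalate_to_human"
```

A raw score of 0.99 calibrates to roughly 0.93 and auto-resolves; 0.80 calibrates to roughly 0.68 and escalates. The temperature itself is fit on held-out eval data, not asserted.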
P-03
Models see only the fields the task requires. Outputs carry the privilege of the inputs.
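P-03 is enforceable as a field allowlist per task, with the output classified at the highest privilege of any input it saw. A minimal sketch; the task name, field names, and privilege levels are illustrative:

```python
from typing import Any

# Per-task allowlist: the model never receives fields outside this set.
TASK_FIELDS = {"summarize_claim": ["claim_text", "diagnosis_code"]}

# Privilege label per field, lowest to highest.
FIELD_PRIVILEGE = {"claim_text": "internal", "diagnosis_code": "phi"}
PRIVILEGE_ORDER = ["public", "internal", "phi"]

def minimized_view(task: str, record: dict[str, Any]) -> dict[str, Any]:
    """Project a record down to only the fields the task requires."""
    return {f: record[f] for f in TASK_FIELDS[task] if f in record}

def output_privilege(task: str) -> str:
    """The output carries the highest privilege among its inputs."""
    levels = [FIELD_PRIVILEGE[f] for f in TASK_FIELDS[task]]
    return max(levels, key=PRIVILEGE_ORDER.index)
```

Here a record containing an SSN never reaches the model for this task, and because a PHI field is in scope, the summary itself is handled as PHI.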
P-04
Every inference is reproducible from inputs, weights, prompt, and policy version.
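P-04 amounts to keying every inference on a content hash of the four things that determine it. A minimal sketch of such a replay key, with all identifiers illustrative:

```python
import hashlib
import json

def inference_key(inputs: dict, weights: str, prompt: str, policy: str) -> str:
    """Deterministic key over everything that determines an inference:
    inputs, weights version, prompt version, and policy version."""
    payload = json.dumps(
        {"inputs": inputs, "weights": weights, "prompt": prompt, "policy": policy},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Two calls with identical versions and inputs produce the same key; bump any one component, such as the policy version, and the key changes, so an audit can tell exactly which configuration produced which output.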
// 02 · selected work
CS-002 / B2B SaaS · customer support / AI audit
An autonomous agent was resolving 38% of inbound tickets. Finance flagged a refund-rate anomaly; the support team couldn't explain it.
CS-001 / Consumer lending / Eval-first build
Twelve underwriters reviewing 800 applications a day. Average decision time was 14 minutes; senior staff were spending half their week on the easiest 60% of files.
CS-003 / Logistics · enterprise / Embedded team
The internal team had built two models that worked in notebooks but had never made it to production. They needed a reference for what 'ready' looks like — eval surface, deployment patterns, on-call.
// 03 · who we work with
I-01
Where 'wrong' is a regulatory finding, not a UX bug.
I-02
PHI in, evidence-cited outputs, every action logged.
I-03
Privilege never leaks. Citations are real or it's a defect.
I-04
Margin-aware automation. Hallucinations cost money in this sector — literally.
I-05
Telemetry-rich. Explainability-mandatory. Downtime costs more than the engagement.
I-06
Your engineers are sharp. We bring eval discipline they haven't built yet.
// 04 · capabilities
End-to-end production scaffolding: orchestration, observability, evals, guardrails, and cost controls. The plumbing your in-house team will not have to build.
Goal-directed agents wired into your real systems — CRM, ERP, data warehouses, internal APIs. Built for measurable workflows, not chat windows.
Domain adaptation on your proprietary data. From SFT to DPO and RFT pipelines, with rigorous offline + online evaluation before anything ships.
Bounded engagements: a problem, a budget, a deadline. We embed with your team or run the project end-to-end through delivery and handoff.
When the team is big enough to matter: an internal AI platform with shared evals, a model gateway, prompt registry, and a path to self-serve.
For CTOs and CEOs: a senior partner across architecture, vendor selection, build-vs-buy, hiring, and roadmap. Quarterly cadence, no consulting deck theatre.
// next
The bench is deliberately small. We decline work that lacks production access, ownership, or a serious sponsor.
Submit project intake