Services - Kryse

[01] / Most engagements

Eval-first builds

We build the system around the test suite, not the other way around.

A discovery + build engagement that ships a production AI system behind a measurement harness. The eval is the spec — written before the prompt, run on every commit, and handed to your team as the runbook for everything that comes after.

8 – 16 weeks $140K 1 ML lead + 1 platform engineer + part-time PM

Included

Two-week discovery with a signed problem-and-success spec
Eval harness — regression, adversarial, drift, cost suites
System build: agents, fine-tunes, retrieval, gateway, guardrails
Observability: tracing, cost dashboards, on-call runbook
Two-week supervised handoff with your engineering team
60-day defect-fix window after handoff

Not included

Frontend / product UI work outside the AI surface
Long-tail data labeling (we partner; we don't staff it)
On-call after the 60-day window (move to Embed)

A mid-market lender's underwriting copilot. Twelve weeks. Eval suite of 412 cases gates every deploy.

[02] / Diagnostic

AI audits

Tell us your AI system is misbehaving and we'll tell you why.

A four-week diagnostic on an AI system you've already built. We rebuild your eval surface, instrument your traces, run our adversarial battery, and deliver a written report with prioritized fixes. Useful before a board review, before a re-platform, or after an incident.

3 – 4 weeks $48K 1 ML lead + 1 evaluation engineer

Included

Reconstruction of an eval suite from your production traffic
Adversarial battery (jailbreak, prompt-injection, PII, cost)
Cost & latency analysis with model-routing recommendations
Drift report against the last 90 days of traffic
Written report (~30 pp) with prioritized fixes and effort estimates
Two read-out sessions: technical and executive

Not included

Implementation of the recommended fixes (separate engagement)
Ongoing monitoring (separate engagement)

A Series-C SaaS asked us to audit a customer-support agent that was quietly returning hallucinated refunds. Three weeks. 14 prioritized fixes; the top 3 closed the regression.

[03] / Ongoing

Embedded teams

Our senior people inside your team for as long as it's useful.

We place an ML lead and one or two engineers inside your team on a monthly retainer. Same Slack, same standups, same review process. Useful when you've built the foundation and need senior throughput, or when you're building an in-house ML team and want a reference for what good looks like.

3 – 12 months $32K / month 1 ML lead minimum; up to 3 engineers depending on scope

Included

Named senior people; not a rotating bench
Weekly written status; monthly steering review
Code review on every PR; architectural sign-off on every system change
Eval suite stewardship — we keep your gates honest
Two weeks notice on either side; no long lock-ins

Not included

Hiring and managing your in-house team (we coach, we don't manage)
24/7 on-call (we cover business hours; on-call is your team)

A logistics platform's in-house ML team of four, with our lead embedded for nine months. They now ship without us, which was the goal.

// compare

The engagement shape should match the risk.

Dimension Eval-first builds AI audits Embedded teams

Engagement length 8–16 weeks 3–4 weeks 3–12 months

Senior headcount 1 lead + 1 eng 1 lead + 1 eng 1–3 engineers

Eval harness Built fresh Reconstructed Stewarded

Production access Required Read-only Required

Code ownership You own it Report only You own it

Best for New systems Existing systems Sustained throughput

Starting at $140K $48K $32K / mo

// questions

Common questions before discovery.

Do you sign NDAs?

Yes, by default. We send a one-page NDA at the start of discovery; we'll also sign yours if it's mutual and reasonable. We do not sign engagement-specific non-competes.

Can we keep the code?

Yes. Every engagement ships under your repos and your IP. We retain the right to write about the methodology in anonymized form; we ask permission before naming you publicly.

Do you work with non-technical leadership?

Often. The discovery week is structured so that a non-technical sponsor leaves it understanding the problem, the constraints, and how we'll know if we've succeeded. We are deliberately allergic to AI theater.

What does a 'serious' engagement look like to you?

A team with a real problem, a real budget, and someone senior enough to make decisions inside the engagement. We turn down work without all three — kindly.

Do you do generative-marketing or content workflows?

We don't. We work on systems where being wrong has a cost — underwriting, procurement, support, clinical workflows, security ops. If your problem is ad copy, you should hire an agency.

Where are you based?

Distributed. Founders in San Francisco and Berlin. We work in your timezone for the duration of the engagement; we travel for the discovery week and at least once per build.

Three ways to work with us. One bar.

Eval-first builds

We build the system around the test suite, not the other way around.

Included

Not included

AI audits

Tell us your AI system is misbehaving and we'll tell you why.

Included

Not included

Embedded teams

Our senior people inside your team for as long as it's useful.

Included

Not included

The engagement shape should match the risk.

Common questions before discovery.