Apps · CASE STUDY

MLOps monitoring,
made legible.

A real-time monitoring surface for ML pipelines — drift detection, deployment health, and incident replay across model lifecycles.

  • HTML · CSS
  • Next.js
  • Tailwind
Sector: AI · MLOps
Users: ML engineers, SREs
Scope: Product UX · data UX · system tokens
Surface: Web app

01 · Overview

Overview.

Designed a unified monitoring surface for an ML platform team running 40+ production models. The role: principal product designer working with platform engineering and SRE leadership.

02 · Challenge

The challenge.

Ops engineers were debugging in three places (Datadog for infra, MLflow for experiments, a custom dashboard for drift) and stitching context together manually after every page. P1 incidents averaged 38 minutes from alert to root cause.

03 · Process

Process.

Started with shadowing on-call rotations to map the actual debugging path. Ran 12 user interviews. Built information-architecture sketches against three real incident retrospectives, then prototyped at three fidelities — flow, mid-fi, hi-fi tokens.

  • 12 stakeholder + on-call interviews
  • Three retrospective walk-throughs replayed in prototypes
  • Token-first design system aligned with platform engineering

04 · Solution

Solution.

A single canvas with three switching contexts: model health, deployment, drift. Status grammar borrowed from incident response (P1/P2/P3) so SRE and ML engineers shared a vocabulary. Replay scrubs through any incident in under 10 seconds.
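To make the replay idea concrete, here is a minimal sketch of selecting the events a scrubber would step through around an alert. The event shape, the function name `replayWindow`, and the choice to center the window on the alert are all assumptions for illustration; the case study does not describe the actual data model.

```typescript
// Hypothetical event shape; field names are illustrative, not from the project.
interface TimelineEvent {
  at: number; // Unix epoch milliseconds
  context: "model-health" | "deployment" | "drift"; // the three canvas contexts
  message: string;
}

// Return the events inside a window centered on the alert, oldest first.
// Assumption: the window is split evenly before and after the alert.
function replayWindow(
  events: TimelineEvent[],
  alertAt: number,
  windowMinutes = 60,
): TimelineEvent[] {
  const halfMs = (windowMinutes / 2) * 60 * 1000;
  return events
    .filter((e) => Math.abs(e.at - alertAt) <= halfMs) // filter copies, so the input is not mutated
    .sort((a, b) => a.at - b.at);
}
```

Keeping the window as a parameter lets the same function serve a wider or narrower scrub range without changing the caller.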

  • Unified canvas — one screen, three contexts
  • Drift detector with rule-based + statistical thresholds
  • Incident replay — scrub the timeline 60 min around an alert
  • Tokens shipped to engineering as a Tailwind preset
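As an illustration of how rule-based and statistical thresholds can sit side by side in a drift detector, the sketch below pairs one hard rule (a null-rate limit) with one statistical score (the Population Stability Index). Everything here is an assumption made for the example: the names, the choice of PSI, the 0.1 / 0.25 cutoffs (a common rule of thumb), the 20% null-rate limit, and the mapping onto P1/P2/P3. The case study does not say which tests or thresholds the shipped detector uses.

```typescript
// Hypothetical sketch; names, thresholds, and severity mapping are assumptions.
type Severity = "P1" | "P2" | "P3" | "OK";

interface DriftInput {
  baseline: number[]; // reference feature values, e.g. a training window
  current: number[];  // live feature values
  nullRate: number;   // fraction of missing values in the live window, 0..1
}

const EPS = 1e-6; // floor for bucket proportions, avoids log(0)

// Bin values into equal-width buckets over [min, max] and return proportions.
function histogram(values: number[], min: number, max: number, bins: number): number[] {
  const counts = new Array<number>(bins).fill(0);
  const width = (max - min) / bins || 1; // fall back to 1 when all values are equal
  for (const v of values) {
    const idx = Math.min(bins - 1, Math.max(0, Math.floor((v - min) / width)));
    counts[idx] += 1;
  }
  return counts.map((c) => Math.max(c / values.length, EPS));
}

// Population Stability Index: sum over buckets of (q - p) * ln(q / p).
function psi(baseline: number[], current: number[], bins = 10): number {
  if (baseline.length === 0 || current.length === 0) {
    throw new Error("psi requires non-empty samples");
  }
  const min = Math.min(...baseline, ...current);
  const max = Math.max(...baseline, ...current);
  const p = histogram(baseline, min, max, bins);
  const q = histogram(current, min, max, bins);
  return p.reduce((sum, pi, i) => sum + (q[i] - pi) * Math.log(q[i] / pi), 0);
}

function classifyDrift(input: DriftInput): Severity {
  // Rule-based check first: a hard limit that needs no statistics.
  if (input.nullRate > 0.2) return "P1";
  // Statistical check: compare the two distributions.
  const score = psi(input.baseline, input.current);
  if (score >= 0.25) return "P2";
  if (score >= 0.1) return "P3";
  return "OK";
}
```

Running the hard rule before the statistical score means an obviously broken feed is flagged without waiting on a distribution comparison, which is one plausible way to keep noisy statistical alerts from masking clear failures.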

05 · Results & metrics

Results.

  • 38 → 11 min mean time to root cause
  • 72% fewer false-positive alerts
  • 100% on-call adoption in week one

Have a project
in this shape?

Discovery calls are free, last 30 minutes, and end with a clear plan — whether or not we work together.
