CASE STUDY

They weren't lacking data. They were lacking clarity.

How I helped engineers find answers — not data — during live outages

"Every minute costs money."

Imagine:

United.com crashes on Thanksgiving. Engineers open Splunk.

📊 50+

Dashboards to check

🗂️ 10+

Tabs open at once

💸 $108K

Lost per minute

When I inherited this dashboard, I saw the same panic every time.

Engineers weren't slow because they were bad at their jobs. They were slow because the system made them hunt for answers.

"They weren't missing data. They were drowning in it."

BEFORE: The Scavenger Hunt

"Check Tab 1... not here. Tab 2... maybe? Tab 3... wait, go back to Tab 1 and cross-reference..."

⏱️ Average time to start fixing: 12 minutes

  • 15 metrics: engineers had to already know what to look for

  • 8 panels: nothing said "start here"

  • And this was just one tab.

The old Splunk RUM (Real User Monitoring) dashboard was a library: everything was there, but nothing said "start here."

I connected the dots for them: I put correlated signals in one view so engineers could see, judge, and act without memorizing 10 tabs or holding context in their heads.

"3 months. No clean slate. Ship anyway."

I led design on this project, partnering with a principal designer from another team — we shared a layout system across products. No time for large-scale research. So I mined 2 years of support tickets and talked to engineers who'd lived through real outages.

"Engineers don't need more data. They need less noise."

The pattern was clear: they weren't slow because they lacked information. They were slow because the system didn't tell them where to start.

The question wasn't "how do we show more?" It was "how do we show less — but the right things first?"

"From library to control tower."

AFTER: The Control Tower

⏱️ Time to first action: 3 minutes

  • Health check at the top — Page views and Duration tell you immediately if something's wrong

  • URLs ranked by impact — see which pages are affected

  • Browser/OS breakdown grouped together — no more tab switching

I cut the default view down to what actually drives decisions under pressure. The top of the screen answers: is something wrong? The middle answers: where? The bottom lets you drill in if you need to.

Clarify first. Expand later.

"I changed the starting question."

The Dropdown Reorganization Diagram

The dashboard wasn't the only problem. Even finding the right metric was a guessing game.

Before (left): "Which of these 15 is right?" / After (right): "What's your role? Start there."

Before, the dropdown was a flat list of 15 technical metrics. Engineers had to already know the answer to pick the right one.

I reorganized it by role: UX, Frontend, Backend, Network. Now engineers start from what they know — their job — and narrow from there.

"30% faster. $1.4M saved."

30%

faster diagnosis time

$1.4M

estimated annual savings from shorter outages

"Engineers stopped scrambling. They trusted the first screen."

Engineers found problems 30% faster. For companies losing $100K+ per minute, that translates to ~$1.4M in annual savings from shorter outages.
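As a rough sanity check on that estimate, here is a minimal back-of-the-envelope sketch. It uses only the two figures quoted in this case study and assumes the $108K-per-minute loss is the rate behind the savings number, which the estimate itself does not spell out. The function name is hypothetical and exists only for illustration.

```python
# Back-of-the-envelope check using only figures quoted in this case study.
# Assumption: the $108K-per-minute loss is the rate behind the savings estimate.
COST_PER_MINUTE_USD = 108_000     # "Lost per minute" figure quoted earlier
ANNUAL_SAVINGS_USD = 1_400_000    # estimated annual savings quoted earlier


def implied_minutes_saved(annual_savings: float, cost_per_minute: float) -> float:
    """Outage minutes per year that must be avoided to reach the savings figure."""
    return annual_savings / cost_per_minute


minutes = implied_minutes_saved(ANNUAL_SAVINGS_USD, COST_PER_MINUTE_USD)
print(f"Implied outage time avoided: about {minutes:.0f} minutes per year")
```

Under that assumption, the estimate corresponds to roughly 13 fewer outage minutes a year.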

But the real win wasn't the number. It was the confidence. Engineers stopped opening five tabs and guessing. They opened one screen and acted.

"More data wasn't the answer. Clarity was."

"Engineers are rarely slow because they lack data. They're slow because systems fail to surface the right signal first."

I didn't add features. I removed everything that didn't help someone act in the first 30 seconds.

"I redesigned Splunk RUM so that when a website is on fire, engineers see answers — not data."

Let's talk

I'm currently open to new opportunities.