The Number That Looks Good and Means Nothing
February 6, 2026
Two workers. Both 100% utilized. One produces 1 unit per hour. The other produces 5.
Same utilization. Completely different value.
This should be obvious. It isn't. Utilization remains one of the most watched metrics in operations, and one of the least useful. Teams celebrate hitting 95%. Dashboards turn green at full capacity. Meanwhile, the number tells you almost nothing about whether the system is actually working.
Utilization answers one question: are they busy? It doesn't answer the question that matters: are they productive?
The problem isn't that utilization is wrong. It measures what it measures accurately. The problem is what it leaves out.
A resource at 100% utilization with low throughput is doing something. Just not something valuable. It might be stuck on a single slow task while faster work piles up. It might be traveling between jobs instead of completing them. It might be waiting for inputs, for handoffs, for dependencies, in ways that count as "utilized" but produce nothing.
I saw this pattern repeatedly when managing courier fleets. Two stores, both showing couriers at near-full utilization. One was humming: tight routes, fast turnarounds, high delivery counts. The other was grinding: couriers stuck on long inefficient trips, waiting at restaurants, covering ground without completing orders. The utilization metric couldn't tell you which was which. You had to look at deliveries per hour to see the gap.
Here's where it gets counterintuitive. Imagine two ways to complete the same work. Option A: two resources, both at 100% utilization. Option B: batch the work onto one resource at 60% utilization. Same output. Lower utilization.
Which is better? By the metric everyone watches, Option A wins. Both resources are fully used. The dashboard is green. But Option B achieved the same result with less capacity consumed. That's not underperformance. That's efficiency.
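To make it concrete, here's a quick sketch in Python. The 10-units-per-hour figure is arbitrary and purely illustrative; all that matters is that both options produce the same output.

```python
# Illustrative numbers: assume both options complete the same
# 10 units of work per hour.
work_per_hour = 10.0

# Option A: two resources, each 100% utilized.
capacity_consumed_a = 2 * 1.00   # 2.0 busy resource-hours per hour
# Option B: the same work batched onto one resource at 60% utilization.
capacity_consumed_b = 1 * 0.60   # 0.6 busy resource-hours per hour

print(f"Option A: {work_per_hour / capacity_consumed_a:.1f} units per busy resource-hour")
print(f"Option B: {work_per_hour / capacity_consumed_b:.1f} units per busy resource-hour")
# Option A: 5.0 units per busy resource-hour
# Option B: 16.7 units per busy resource-hour
```

Per hour of capacity actually consumed, Option B produces more than three times as much.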
High utilization can mean your system is tuned well. It can also mean your system is grinding through work in the most resource-intensive way possible. The metric doesn't distinguish between these. It just says "busy."
And busy, it turns out, is a terrible proxy for productive.
Queueing theory has known this for decades. As utilization approaches 100%, lead times don't increase gradually. They explode.
The relationship isn't linear. A system running at 50% utilization has slack to absorb variability. Push it to 70%, and wait times start climbing. Push it past 75%, and you hit the "knee" of the curve: small increases in utilization cause disproportionately large increases in delay. At 95%, the system isn't just 15 points busier than at 80%. It's dramatically more fragile.
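Here's a rough sketch of that knee, using the textbook M/M/1 formula for a single server with random arrivals and service times: the average wait in queue is utilization / (1 - utilization), measured in service times. Real operations aren't M/M/1 systems, but the shape of the curve is the point.

```python
# Average wait in queue for a textbook M/M/1 single-server system,
# expressed in multiples of the average service time: Wq = rho / (1 - rho).
# Real systems are messier, but the shape of the curve holds.

def avg_wait_in_service_times(rho: float) -> float:
    """Average queueing delay, in units of one mean service time, at utilization rho."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return rho / (1 - rho)

for rho in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"utilization {rho:.0%}: average wait = {avg_wait_in_service_times(rho):.1f}x the service time")

# utilization 50%: average wait = 1.0x the service time
# utilization 70%: average wait = 2.3x the service time
# utilization 80%: average wait = 4.0x the service time
# utilization 90%: average wait = 9.0x the service time
# utilization 95%: average wait = 19.0x the service time
# utilization 99%: average wait = 99.0x the service time
```

Under those assumptions, going from 80% to 95% utilization takes the average wait from 4 service times to 19. That's the knee.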
This is why the math actually recommends slack. A system at 100% utilization has no capacity to absorb anything unexpected: a spike in demand, a delayed input, a single task that takes longer than planned. Any variability hits a system with no buffer, and the queue backs up. The queue backing up increases wait times. Increased wait times cascade downstream. The result is a "fully utilized" system that can't get anything done.
This isn't theoretical. It's the lived experience of anyone who's run an operations floor during a demand spike, or tried to ship a feature when the engineering team is "fully allocated," or watched an AI agent spin through tasks while outcomes flatline. Full capacity isn't optimal. It's the point just before the system breaks.
This failure mode shows up anywhere activity gets mistaken for value.
Engineering teams learned this the hard way with velocity. Story points completed per sprint. Lines of code written. Commits per day. These metrics measure movement, not progress. A team can hit aggressive velocity targets while shipping features that don't work, creating technical debt that slows future work, or optimizing for ticket count instead of customer impact. The industry now calls these "vanity metrics" for a reason. They look good on dashboards. They hide what's actually happening.
AI agents are walking into the same trap. Most monitoring systems track task completion, tool calls, tokens processed. Activity metrics. But an agent can complete tasks all day without achieving the outcomes it was deployed for. One evaluation framework put it bluntly: traditional monitoring captures activity, not value, and that gap can hide risks. The emerging focus on "goal accuracy" over "task completion" is the same correction, applied to a new domain.
The pattern is consistent. Measure activity, and you'll get activity. Whether that activity produces value is a separate question the metric doesn't answer.
If utilization alone misleads, the fix isn't to stop measuring it. It's to normalize it against what actually matters: output.
The principle is simple. Take throughput and divide it by utilization. Now you're measuring output relative to capacity consumed.
Return to those two workers at 100% utilization. One produces 1 unit per hour, the other produces 5. Utilization says they're identical. But throughput divided by utilization gives you 1.0 for the first and 5.0 for the second. The gap that was invisible is now obvious.
It gets more useful with the counterintuitive cases. A worker at 80% utilization producing 5 units per hour scores 6.25. Higher than the "fully utilized" worker producing the same output at 100%. The metric now rewards efficiency instead of effort. It recognizes that achieving the same result with less capacity consumed is better, not worse.
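Here's the same arithmetic as a few lines of Python. The Worker class and its field names are invented for illustration; the only thing that matters is the division.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    throughput: float   # units completed per wall-clock hour
    utilization: float  # fraction of time busy, between 0 and 1

    def normalized_throughput(self) -> float:
        """Output per unit of capacity consumed: throughput / utilization."""
        return self.throughput / self.utilization

workers = [
    Worker("busy but slow", throughput=1.0, utilization=1.00),
    Worker("busy and fast", throughput=5.0, utilization=1.00),
    Worker("fast with slack", throughput=5.0, utilization=0.80),
]

for w in workers:
    print(f"{w.name}: utilization {w.utilization:.0%}, "
          f"throughput {w.throughput:.1f}/hr, "
          f"normalized {w.normalized_throughput():.2f}")

# busy but slow: utilization 100%, throughput 1.0/hr, normalized 1.00
# busy and fast: utilization 100%, throughput 5.0/hr, normalized 5.00
# fast with slack: utilization 80%, throughput 5.0/hr, normalized 6.25
```

The worker with slack loses on the utilization dashboard and wins on the number that actually matters.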
This framing transfers. For engineering teams, it might be features shipped per developer-hour allocated. For AI agents, goals achieved per computational cost. The specific numerator and denominator change by domain, but the structure holds: outcome divided by resource consumed. Not activity. Value.
The number everyone should watch isn't whether resources are busy. It's what they produce for the capacity they consume.
Every organization has metrics that feel important because everyone watches them. Utilization is one. It shows up in dashboards, gets discussed in reviews, triggers alerts when it drops. The attention creates an illusion of significance.
But a metric that can look identical for a high performer and a struggling one isn't measuring performance. It's measuring presence. And optimizing for presence, across enough decisions and enough time, quietly hollows out the system it claims to monitor.
The fix isn't complicated. It's just unpopular. It means telling a team that their green dashboard might be hiding a problem. It means redefining what "fully utilized" actually indicates. It means accepting that slack isn't waste. Sometimes it's the only thing keeping the system from collapse.
The number that looks good and means nothing is still a number. It will still get reported. The question is whether you also measure the thing that matters: not how busy the system is, but what it actually produces.