At Salesken, I watched an engineer game our story point system for an entire quarter. His 'velocity' was double anyone else's. His actual output? About average.
Story points in agile development are abstract units used to estimate the relative effort, complexity, or risk of a user story — but they're fundamentally flawed because they collapse multiple dimensions of work into a single number. Common scales include Fibonacci (1, 2, 3, 5, 8, 13) and T-shirt sizing (S, M, L, XL). In practice, story points fail because different teams define them inconsistently, they don't correlate with actual delivery time, and they're easily gamed. Leading engineering teams are replacing story points with data-driven metrics like historical cycle time and codebase complexity signals.
I spent five years watching the same estimation meeting happen at four different companies. Product would ask "how long will this take?" Engineering would answer "it's a 5-point story." Product would nod as if points meant something concrete. Then the work would either finish in two days or stretch into two weeks, and everyone would act surprised.
The reason it's surprising: story points don't measure duration - they measure some phantom quantity that teams fight about because nobody's defined it consistently. Are they effort? Complexity? Risk? Uncertainty? Different teams use them differently. The same 5-point story means something totally different when you're working in a well-understood module versus touching legacy code you've never seen.
I built Glue partly because I got tired of this meeting. But before I talk about what changed for us, I want to be direct about what story points actually do wrong.
The Fundamental Problem
Story points collapse multiple dimensions of work into a single number, then pretend that number is predictive. Here's what actually happens:
A 5-point story in the authentication module - a system your team has touched fifty times - takes your senior engineer two hours. The same 5-point story in the checkout flow, which you rewrote six months ago and nobody's looked at since, takes four days. The story didn't change. The context did.
Points don't capture codebase familiarity. They don't capture knowledge distribution. They don't capture technical risk or architectural constraints. They collapse all of that into a number that teams then fight about during sprint planning, where someone inevitably says "that's not a 5, that's an 8" and you're back to theater.
Here's what I've watched happen: Teams optimize for appearing consistent with their points. They push back on scope to hit the estimate. They pad estimates because estimates are always wrong. They argue about whether something is a 3 or a 5 instead of asking the real question: what about this work is actually hard?
The thing about story points is that they're meant to abstract away from hours. We don't want to say "this will take 40 hours" because scope always changes. But what we've done instead is create an abstraction so divorced from reality that it's less useful than just saying "I don't know."
What Actually Matters
If I'm running a team, I care about four things:
- How long do similar types of work actually take?
- What's our constraint - is it capacity, knowledge, or architecture?
- How does that work distribute across the codebase?
- When we get it wrong, why?
None of those things are captured by story points.
What matters is cycle time - not the estimate, the actual duration. And not as an average. As a distribution. A feature that takes two days most of the time but two weeks when you hit unfamiliar code needs a different approach than a feature that takes four days consistently. The first needs better knowledge distribution. The second is probably working as designed.
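The difference between those two shapes of work is easy to see once you look at percentiles instead of averages. Here is a minimal sketch with hypothetical cycle-time data: two kinds of feature work with similar means but very different distributions.

```python
import statistics

def percentile(values, pct):
    """Nearest-rank percentile of a list of durations (in days)."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[rank]

# Hypothetical cycle times in days for two kinds of feature work
consistent = [4, 4, 5, 4, 4, 5, 4]   # steady, well-understood module
spiky = [2, 2, 2, 14, 2, 2, 13]      # fast, except when it hits unfamiliar code

for name, durations in [("consistent", consistent), ("spiky", spiky)]:
    print(name,
          "mean:", round(statistics.mean(durations), 1),
          "p50:", percentile(durations, 50),
          "p90:", percentile(durations, 90))
```

The means are close, but the p50/p90 gap tells you which feature needs knowledge distribution and which is working as designed. An average hides that entirely.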
What matters is understanding which types of work slow us down. Database migration work consistently takes longer than form work. Feature work in the payment system is slower than feature work in the dashboard. Some of that is legitimate - the payment system has higher stakes. Some of it is technical debt. Some of it is "we only have one person who understands this."
What matters is seeing where the knowledge is concentrated. If 70% of the critical path work touches modules that only your most senior engineer knows, you have a resilience problem that story points don't measure. And you can't fix it if you don't measure it.
What matters is being able to look at the actual code changes and ask "why did this take longer than expected?" Was it unforeseen complexity? Unexpected dependencies? A module that was harder to understand than we thought? Each answer points to a different solution. Points don't help you get there.
The Real Insight
When I was building engineering teams, I realized that the teams I trusted most weren't the ones with the most consistent estimates - they were the ones that could explain why work varied. They knew their codebase. They knew where the traps were. They could give you a 30% range instead of a single number, and the range was actually useful because they understood what moved it.
That's the skill I wanted to build Glue around. Not to replace estimation - estimation will always be imperfect - but to make estimation a conversation based on visibility instead of theater.
If you can see the codebase, you can answer better questions. Not "is this a 5 or an 8?" but "who built this module and can we get them on the review?" Not "how many points?" but "how has this part of the system changed in the last six months and what does that tell us?" Not abstract complexity, but concrete architectural reality.
Teams that have codebase visibility estimate better. Not because they're better at guessing, but because they're estimating based on evidence instead of intuition.
The Path Forward
I'm not saying story points are evil. I'm saying they're a broken abstraction. They promise consistency but deliver theater. They measure effort in isolation while ignoring systemic constraints. A team with good codebase visibility doesn't need them. A team without visibility shouldn't trust them.
What I'd recommend instead: Stop asking "how many points is this?" Start asking "what type of work is this?" and "how long have similar types of work actually taken?" Track the answers. Over time, you'll have real data about what's hard and why. You'll find the bottlenecks. You'll distribute knowledge before it becomes a crisis.
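Tracking by work type can start as a few lines of bookkeeping. A hedged sketch, with hypothetical completed items your tracker or Git history could supply:

```python
import statistics
from collections import defaultdict

# Hypothetical completed work items: type label + actual cycle time in days
done_items = [
    {"type": "db-migration", "days": 6},
    {"type": "db-migration", "days": 9},
    {"type": "form",         "days": 2},
    {"type": "form",         "days": 3},
    {"type": "form",         "days": 2},
]

by_type = defaultdict(list)
for item in done_items:
    by_type[item["type"]].append(item["days"])

for work_type, days in by_type.items():
    print(f"{work_type}: median {statistics.median(days)}d over {len(days)} items")
```

After a quarter of this, "how long will the migration take?" has an evidence-based answer: about as long as migrations have actually taken, not however many points the room settles on.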
This is hard to do without visibility into the codebase - what changed, where, how often, who touched what. That's the gap we built Glue to close. Not as a replacement for estimation, but as the foundation that makes estimation something other than a guess.
The teams I respect most don't fight about points. They fight about understanding. They ask better questions. They have a shared view of what's in the system and why the system behaves the way it does. Everything else - including estimation - flows from that visibility.
Frequently Asked Questions
Q: What are story points in agile development?
A: Story points are a relative sizing unit used in agile development to estimate the effort, complexity, and uncertainty of a user story or task. Teams typically use the Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) or T-shirt sizes (S, M, L, XL) as their scale. In theory, story points normalize for individual speed differences — a senior engineer and a junior engineer should assign the same points to the same task. In practice, story points often fail because they conflate complexity with unfamiliarity, creating estimates that measure how well someone knows the codebase rather than how hard the work actually is. Better alternatives include cycle time tracking by work type and bucket estimation (small/medium/large/needs-breakdown).
Q: What is the best story point calculator?
A: There is no reliable standalone story point calculator because story points are relative to your team's specific context, codebase, and velocity. Online calculators that claim to convert requirements into points are estimating in a vacuum. The most effective approach is reference-class estimation: pick 3-5 completed stories your team agrees on (a "2," a "5," an "8"), then compare new work against those references. For data-driven estimation, tools like Glue and LinearB analyze your actual cycle time history to show how long similar work has taken — which is more accurate than any calculator.
Q: Does this mean we should go back to time-based estimates?
A: No. Time-based estimates have the same problem - they appear precise while being inaccurate. The difference is that you can look back at actual cycle time data and learn from it. The insight isn't "estimate in hours instead of points." It's "use real data about how long similar work actually took" instead of abstract numbers. Track cycle time by work type, not by individual estimate. That's actionable.
Q: But our team already uses story points and it's working fine.
A: Most teams think their estimates are working until they look at the data. Ask yourself: Do your estimates cluster around certain numbers (3, 5, 8, 13) more than they should? Do they vary more than your team expects? Do you routinely miss sprint commitments? If any of that's true, your points aren't actually working - you're just used to the theater. The real test: Can you explain why this story took twice as long as estimated? If the answer is "we were wrong about complexity," your estimation system isn't capturing the thing that actually matters.
Q: How do I sell this to leadership?
A: Stop talking about story points entirely. Talk about throughput and predictability. Ask your team to track actual cycle time by work type for one sprint. Look at the variance. Show it to leadership. That's more useful than any estimation meeting.
Related Reading
- Sprint Velocity: The Misunderstood Metric
- Cycle Time: Definition, Formula, and Why It Matters
- DORA Metrics: The Complete Guide for Engineering Leaders
- Programmer Productivity: Why Measuring Output Is the Wrong Question
- Software Productivity: What It Really Means and How to Measure It
- Automated Sprint Planning: How AI Agents Build Better Sprints
- Velocity Doesn't Tell You How Far You Need to Go
- Software Estimation Accuracy
- What Is Agile Estimation?
- What Is Effort Estimation?
- What Is Software Project Estimation?
- What Are Estimation Best Practices?
- The Complete Guide to Software Estimation