I've shipped hundreds of estimates across three companies. My accuracy improved dramatically when I stopped relying on gut feel and started using historical data from our actual codebase.
By Vaibhav Verma
Every sprint, the same conversation happens. The product manager says "can we ship this in two weeks?" The engineer says "probably three weeks." Three days before the deadline, the engineer says "actually, we're at 60% and I didn't know about this complexity." The feature slips. The team is frustrated. The PM is frustrated. Everyone points fingers.
This cycle repeats because estimation is broken in most organizations. Not because teams don't try (they do). But because they're trying to estimate in the absence of information. The engineer doesn't know the full scope. The PM hasn't validated assumptions about what "done" means. Nobody has looked at the code that will be touched and how tangled it is. You're throwing darts in the dark and calling it planning.
Estimation is solvable. But it requires understanding four things: why estimation is hard (the sources of error), which methods work for which situations, how to involve the right people, and what actually improves accuracy over time.
This is a comprehensive guide to software estimation. Not a formula that always works (no such formula exists). But a framework that will improve your accuracy, build trust between PMs and engineers, and make your planning predictable instead of chaotic.
Why Estimation Is Hard (The Four Sources of Error)
Estimation misses don't happen by accident. They happen because of specific, identifiable sources of error. Understanding these sources means you can fix the right thing.
Source 1: Complexity Uncertainty
You don't know what you don't know. The engineer estimates "two weeks" based on the happy path. But there's a race condition in the queue that will take a week to debug. There's a data migration that's more complex than expected. There's an edge case where the system needs to handle 10,000 concurrent requests instead of 100.
This is fundamental uncertainty. You can't know everything about the codebase before you build something. Some complexity is hidden.
How to mitigate: Before estimating, identify the code modules that will be touched. Have a senior engineer quickly assess those modules for hidden complexity. Are there known race conditions? Is the code simple or tangled? Is there technical debt in those areas? This "complexity scan" adds 10 minutes to estimation but surfaces hidden risks.
Source 2: Scope Uncertainty
You don't agree with the PM on what "done" means. The PM thinks "ship the feature." The engineer thinks "ship the feature, test it, document it, handle edge cases, monitor it in production." One person's scope is much bigger than the other's.
This is not about detail. It's about agreement on what the engineer is actually committing to.
How to mitigate: During estimation, explicitly agree on scope. "Done means: feature works for happy path, tests written for edge cases, documentation updated, monitoring alerts set up, and rollback plan documented." Be explicit. It takes 5 minutes and prevents weeks of misalignment.
Source 3: Estimator Optimism Bias
Engineers are optimists. When they estimate, they imagine the thing going smoothly. They don't picture themselves stuck on a bug for three hours. They don't picture waiting for design review. They don't picture the refactor that needs to happen first.
This is called the planning fallacy. It's universal. It doesn't mean engineers are bad at estimation; it means humans are bad at estimation when they're trying to estimate in the abstract.
How to mitigate: Use probabilistic estimation (three-point estimation, Monte Carlo) or involve multiple estimators so optimism from one person is balanced by caution from another. Also, reference your historical data. If your team's actual delivery time is consistently longer than estimates, adjust estimates upward.
Source 4: Unknown Unknowns About the Codebase
You don't know what changed in the code since the last time you looked at it. The service you thought was stable is now using a new database. The batch process that ran nightly is now event-driven. The API you thought was deprecated is still in use and changing it would break six things.
Unknown unknowns are uniquely hard because they're invisible until you hit them.
How to mitigate: The day before estimation, have someone who understands the codebase do a 15-minute "walk recent changes" session. "Here's what changed in the modules you'll be touching in the last three months." That context prevents most unknown unknowns.
A Tour of the Major Estimation Methods (When Each Works)
Story Points (and the genuine value vs. genuine problems)
Story points are a relative sizing method. You pick a baseline story ("user can log in" = 3 points). Then you estimate every other story relative to that baseline. A story that's twice as complex as login is 6 points.
Why story points work:
- Relative sizing is easier than absolute. Humans are better at saying "this is twice as hard as that" than at saying "this will take 5.3 days"
- Story points are stable across team changes. Points are about complexity, not velocity. If your team grows, velocity grows but the points for a story don't change
- Story points create velocity metrics you can track and plan from
Why story points are problematic:
- They get converted to hours. A PM looks at "8 points" and somehow thinks "4 days" because they heard "8 points = 1 day." Points are not time. This conversion destroys the method
- Teams compare velocity across teams. Team A has velocity 50; Team B has velocity 40. Management decides Team B is less productive. But Team A's points might be inflated relative to Team B's. Velocity is not comparable across teams
- People treat points as commitments. "You estimated 8 points and shipped it in 5 days, so you're bad at estimation." No. You estimated complexity; you shipped faster because you were lucky or skilled. Points aren't predictions; they're relative measurements
- The entire point of story points is destroyed if you track "hours per point." At that point you've just rebuilt time estimation with extra steps
When to use story points: Use them when you want to track relative complexity and trend your team's delivery speed over time. Don't use them if you need time predictions. Don't compare them across teams. Don't convert them to hours.
Three-Point Estimation
Three-point estimation is simple: for each task, estimate three scenarios.
- Optimistic: "If everything goes right, this takes 3 days"
- Likely: "If things are normal, this takes 5 days"
- Pessimistic: "If things go wrong, this takes 10 days"
Then calculate: (Optimistic + 4*Likely + Pessimistic) / 6
This weights the likely case heavily but factors in both optimistic and pessimistic scenarios.
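This weighted average is the classic PERT formula, and it's one line of code. A minimal sketch, using the 3/5/10-day figures from the example above:

```python
def pert_estimate(optimistic, likely, pessimistic):
    """Weighted three-point (PERT) estimate: the likely case counts
    four times as much as either extreme."""
    return (optimistic + 4 * likely + pessimistic) / 6

# The 3 / 5 / 10 day example from the text
print(pert_estimate(3, 5, 10))  # 5.5
```

Note that the result (5.5 days) lands above the likely case: the long pessimistic tail pulls the estimate upward, which is exactly the correction for optimism bias.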
Why it works:
- It acknowledges uncertainty instead of pretending you know the exact timeline
- It separates "best case" from "realistic case," which corrects for optimism bias
- The math is simple enough to use in planning meetings
Why it doesn't always work:
- People still use the optimistic number and get surprised when reality is pessimistic
- If your "pessimistic" estimate is actually based on wishful thinking (not real risks), the method doesn't help
- It only works if you honestly assess the pessimistic case
When to use it: Use it when estimating work that has meaningful uncertainty. "Implement payment processing" needs three-point estimation. "Add a new text field to a form" doesn't.
Reference Class Forecasting
Instead of estimating bottom-up (breaking the work into pieces and estimating each), you estimate based on historical similar work. "The last time we did X, it took Y weeks. This looks similar, so it should take about Y weeks."
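As a sketch, "similar" could be approximated by tag overlap with past features. The tags, durations, and matching rule below are illustrative assumptions, not a prescribed method; the point is that the estimate comes from actual delivery history, not intuition:

```python
from statistics import median

# Hypothetical history of shipped features: (tags describing the work, actual weeks)
history = [
    ({"dashboard", "analytics"}, 4.0),
    ({"dashboard", "export"}, 5.0),
    ({"auth", "migration"}, 8.0),
    ({"dashboard", "cache"}, 3.5),
]

def reference_class_estimate(tags, history, min_overlap=1):
    """Median actual duration of past features sharing at least
    min_overlap tags with the new work."""
    similar = [weeks for past_tags, weeks in history
               if len(tags & past_tags) >= min_overlap]
    if not similar:
        raise ValueError("no similar past work; fall back to bottom-up estimation")
    return median(similar)

print(reference_class_estimate({"dashboard", "reporting"}, history))  # 4.0
```

The median (rather than the mean) keeps one outlier project from skewing the forecast.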
Why it works:
- Historical data is more reliable than intuition
- It factors in all the unknown unknowns because they happened to your team before
- It's faster than detailed estimation
Why it doesn't work:
- You need historical data to reference
- "Similar" is hard to define. Is this actually similar to the last one?
- You might be repeating historical mistakes
When to use it: Use it once you've shipped enough features that you have a library of "similar work" to reference. For a new team, collect historical data for six months, then use reference class forecasting.
Monte Carlo Simulation
Monte Carlo is for high-uncertainty work: break the task into subtasks, estimate each one probabilistically (three-point estimates or ranges), then run a simulation that repeatedly samples from those ranges. The simulation tells you: 50% likely to finish in 4 weeks, 75% likely in 5 weeks, 95% likely in 6 weeks.
Why it works:
- It acknowledges that many subtasks have uncertainty
- The output is probabilistic, not a false point estimate
- It's useful for planning dependencies
Why it's overkill:
- Most features don't need this level of analysis
- It requires real discipline to estimate the subtasks honestly
- It looks sophisticated but often just reveals what you already knew
When to use it: Use it when you're planning large features that depend on multiple unknowns and you need to understand the probability of hitting specific dates. Use it for roadmap planning. Don't use it for "can we ship this in two weeks?"
NoEstimates / Throughput-Based Approaches
Instead of estimating features, you estimate team throughput: "We ship an average of 4 features per sprint" (it doesn't matter whether they're complex or simple). Then you just count: 10 features in the backlog, 4 shipped per sprint, that's 2.5 sprints.
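The arithmetic above is the entire method, which is its appeal. A tiny helper, rounding up because you can only plan whole sprints:

```python
import math

def sprints_needed(backlog_size, avg_features_per_sprint):
    """Forecast delivery time from historical throughput alone."""
    return backlog_size / avg_features_per_sprint

# The example from the text: 10 features in the backlog, 4 shipped per sprint
print(sprints_needed(10, 4))             # 2.5
print(math.ceil(sprints_needed(10, 4)))  # 3 calendar sprints
```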
Why it works:
- It avoids the entire problem of estimation accuracy
- It's simple
- It works if your features are roughly similar in complexity
Why it doesn't work:
- It assumes all features are the same complexity, which they're not
- You can't plan for high-complexity work
- If your features vary wildly in complexity, throughput-based planning is wildly inaccurate
When to use it: Use it if you have a true feature factory: small, similar-complexity features flowing through like an assembly line. Most product teams don't have that. If you have a mix of small features and complex ones, use a hybrid approach.
The Psychology of Estimation (Why Planning Fallacy Beats Optimism)
Humans are fundamentally optimistic about timelines. This isn't a flaw; it's how our brains work. We imagine the thing going smoothly. We don't imagine friction.
This is called the planning fallacy. And it's measurable. Most teams' actual delivery is 30-50% longer than their estimates. That's not team incompetence; that's planning fallacy.
Here's what makes it worse: breaking work down into smaller pieces makes people feel more confident, but it doesn't actually fix the bias. You estimate each piece optimistically, roll them up to a total, and the total is still optimistic. Hofstadter's Law applies: "Everything takes longer than you expect, even when you expect it to take longer than you expect."
So what fixes it?
Diverse estimators: Have multiple people estimate independently. If one person says 2 weeks and another says 4 weeks, the disagreement flags uncertainty. The truth is probably somewhere in between.
Historical data: Track your estimates vs. actual delivery. "Our estimates are off by 40% on average" is useful data. You can build that into future estimates.
Explicit assumption logging: During estimation, write down your assumptions. "We assume the API is already built. We assume the data model change doesn't require migration. We assume nobody is blocking us." Now when one assumption breaks, you know why the estimate is wrong.
Range estimates: Don't estimate "5 days." Estimate "4-6 days" or even better: "50% likely 4 days, 75% likely 5 days, 90% likely 7 days." Ranges acknowledge uncertainty instead of pretending you know.
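One way to produce those percentile ranges is to scale a raw estimate by your team's historical slip ratios (actual time divided by estimated time). The estimate/actual log below is invented for illustration; the technique only works once you've tracked your own numbers:

```python
# Hypothetical log of (estimated days, actual days) for shipped work
log = [(5, 7), (3, 3), (8, 12), (2, 3), (5, 6), (4, 4), (6, 10), (3, 5)]

# Historical slip ratios: how much longer (or shorter) work actually took
ratios = sorted(actual / estimated for estimated, actual in log)

def range_estimate(raw_estimate, ratios, pct):
    """Scale a raw estimate by the slip ratio at the given percentile."""
    idx = min(int(len(ratios) * pct), len(ratios) - 1)
    return raw_estimate * ratios[idx]

# Turn a raw "5 days" into a calibrated range
for pct in (0.50, 0.75, 0.90):
    print(f"{int(pct * 100)}% likely within {range_estimate(5, ratios, pct):.1f} days")
```

The output converts "5 days" into a range like "50% likely by day 7, 90% likely by day 8," which is an honest statement of what you actually know.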
What Actually Improves Estimation Accuracy
You want your estimates to be predictive. Not perfect, but directional. If you estimate something at 3 weeks and it actually takes 3.5 weeks, that's good estimation.
Here's what moves the needle:
Historical data from your own team is worth 10x more than benchmarks from other teams. "Average feature takes 2 weeks" means nothing if you're working in a different codebase, with different team size, different tools. Your team's average based on actual delivered work is gold.
Identifying the modules involved before estimating is the single biggest accuracy improvement you can make. "Build a dashboard" estimated cold is guesswork. "Build a dashboard using the analytics module, the cache layer, and the user service" is informed estimation because you've mapped the codebase dependencies.
Making scope explicit before estimating prevents the most common source of misses: scope creep. The engineer estimated "show data" and the PM expected "show data, export to CSV, and create alerts." Agree on scope upfront.
Tracking estimate accuracy and updating your mental models is how you get better. Every sprint, compare estimates to actuals. Adjust your calibration. "We estimated high here and shipped faster; we estimated low there and shipped slower." That feedback loop improves accuracy over time.
Involving the person who will do the work in estimation sounds obvious but many teams don't do it. The junior engineer estimates; the senior engineer codes. No surprise when the estimate is wrong. Get the implementer in the room.
A Practical Estimation Process That Actually Works
Here's a process you can implement next sprint:
Day 1 (2 hours, with product and tech leads)

Start with your backlog. For each feature, ask: "What's this feature, really?" Not the marketing description, the actual work. "Show user recommendations on profile page" becomes "Query recommendation service API, handle caching because API is slow, show loading state, handle errors, test edge cases with different user types."
Once you have clear scope, write it down. This is your commit: "This feature includes these things and doesn't include those things."
Day 2 (30 minutes, tech lead alone)

For each feature, map: "What modules do we touch? Authentication, user service, database." Get a senior engineer to walk the code in those modules for 10 minutes. "Here's what changed recently in the user service, here's the known complexity, here are the edge cases." Write down: green (clean code), yellow (some complexity), red (high risk).
Day 3 (2 hours, with the whole team)

Run planning poker. For each feature:
- Read the scope
- Show the code health assessment
- Each engineer independently estimates
- If estimates agree, you're done
- If they disagree, the optimist and the pessimist each make a case for 2 minutes, everyone re-estimates
- After re-estimation, if there's still a wide spread, ask: "What are you worried about?" That question usually surfaces hidden assumptions
Write down the estimate and the key assumptions.
Day 4

Commit to the sprint. Don't try to stretch to fit one more feature. Include a buffer for unknowns.
End of sprint

Compare estimates to actuals. "We estimated 3 weeks, shipped in 2.5 weeks." Update your data. Over time, you'll see patterns: features that are consistently longer, features that are consistently spot-on.
The Organizational Dynamics: Who Estimates, How to Manage Pressure
You've got a framework. But organizations break frameworks through pressure.
Who should estimate? The people who will do the work. Not the tech lead estimating for their team. The engineer who'll build it. Multiple engineers if it's complex. Get diverse perspectives.
Who should push back on estimates? The PM should push back on estimates they think are too conservative. But this is a conversation, not a demand. "I thought this would be two weeks. You estimated four. What am I missing?" If the engineer explains real complexity, you update your expectation. If the engineer is being overcautious ("I don't want to commit to anything because something might go wrong"), that's a different conversation.
How to handle pressure to reduce estimates? This is the hard one. The CEO says "we need this in three weeks, not five." You've got options:
- Option A: Ship a lower-scope version in three weeks. Scope down, not timeline.
- Option B: Ship on time with increased risk. Be explicit: "We can ship in three weeks if we accept 15% failure rate and skip testing on edge cases."
- Option C: Add people. Temporary increase in velocity, but usually slower than expected due to ramp-up.
- Option D: Push back with data. "Our historical data shows features like this take 5 weeks. Here's why."
Option D is only credible if you have the data. That's why tracking estimates vs. actuals matters.
How to manage the incentive problem? If estimates are used to evaluate performance ("you estimated 3 weeks and took 4 weeks, you're underperforming"), engineers will estimate defensively. They'll over-estimate to look good. The estimate becomes a lie.
Better: estimates are a prediction tool, not a performance measure. If the engineer estimated 3 weeks, took 4 weeks, and hit the deadline and shipped quality code, that's fine. The data point is useful (this type of work takes longer than we thought), and the engineer's performance is fine.
Common Pitfalls (What Breaks Estimation)
Converting story points to hours or days: "8 points = 4 days because velocity is 2 points per day." This destroys story points. Points are relative complexity, not time. Stop doing this.
Comparing velocity across teams: "Team A has velocity 50, Team B has 40, we should merge them." Velocity is contextual to the team and the features they work on. Don't compare.
Using estimates as commitments: "You estimated 5 days, you must deliver in 5 days." This creates bad incentives and destroys estimation accuracy.
Not tracking estimate accuracy: You don't learn. Track it. Adjust. Get better.
Assuming the codebase is unchanging: The code you're touching might have been refactored, redesigned, or rewritten since you last looked. Codebase context matters.
Estimating without knowing scope: "How long to build search?" is unanswerable without knowing: does the search query a separate service or the local database? What are the performance requirements? How should it handle misspellings? Scope first, estimate second.
Not separating "estimate" from "committed deadline": An estimate is "here's my best prediction based on current knowledge." A committed deadline is "here's when we'll ship regardless." These are different. Be clear which one you're talking about.
How Glue Helps
Glue provides the codebase context that makes estimation more accurate. You don't have to guess what modules a feature touches. You ask Glue, and it tells you. You don't have to hope you know about recent changes. You ask Glue what changed in the auth service last sprint, and it shows you.
More importantly, Glue tracks what changed when you ship. So after a quarter, you can ask Glue: "Which features took longer than estimated and why?" and get the actual code data. Not guesses. Not "the code was complex." But "the features we estimated at 3 weeks that actually took 5 weeks all touched the user service, which had high complexity and tight coupling." That's the feedback loop that improves accuracy.
Glue turns estimation from guesswork into informed prediction.
Frequently Asked Questions
Q: Our estimates are always wrong. How do I fix this?
Start with data. Track your estimates vs. actual delivery for four weeks. You'll see patterns. "Features that touch the payments module always take 30% longer." "Features that require database changes always surprise us." Fix the patterns, not the individual estimates.
Q: How do I deal with a team that over-estimates defensively?
Stop using estimates for performance evaluation. Make it safe to estimate accurately. "Our goal is accurate predictions, not perfect delivery. If you estimate 4 weeks and it takes 4.5 weeks, that's great estimation. We learned something about this type of work." Once you separate estimates from performance, the sandbagging goes away.
Q: Should we estimate spikes and research work differently?
Absolutely. Use a different unit. "2 days of spike work," not "5 story points." Spikes have uncertainty. Timebox them ("spend 1 day investigating, then report findings") rather than estimate them. Don't treat uncertainty the same as complexity.
Q: How much time should estimation take?
Plan to spend 3-5% of sprint time on estimation. If estimation takes more time than that, you're over-analyzing. If it takes less, you're probably not thinking hard enough about scope and complexity.
Q: When should I re-estimate in the middle of a sprint?
Only if you discover scope or complexity you didn't know about at the start of the sprint. Don't re-estimate just because you're running behind; that's different. Running behind is usually planning fallacy or optimism bias at work, which is a calibration lesson for your next estimate, not new information about this one.
Q: How far out should we estimate?
Plan next sprint in detail. Plan the following sprint at a high level. Beyond that, use roadmap-level estimates that are ranges, not points. "Q2 roadmap: 80-120 story points" is reasonable. "Week of April 15: 47 points" is not.
Related Reading
- Sprint Velocity: The Misunderstood Metric
- Cycle Time: Definition, Formula, and Why It Matters
- DORA Metrics: The Complete Guide for Engineering Leaders
- Programmer Productivity: Why Measuring Output Is the Wrong Question
- Software Productivity: What It Really Means and How to Measure It
- Automated Sprint Planning: How AI Agents Build Better Sprints