The engineering feedback loop most teams are missing is the long one: from production outcomes back to codebase decisions. That means connecting deployment data, incident patterns, and performance metrics to the specific code changes, architectural patterns, and design decisions that caused them. Product teams have well-established feedback loops (ship → measure → learn → iterate). Engineering teams typically have only the short loops within sprints (code review, testing, deployment) and lack a systematic connection between production signals and codebase-level decisions.
At Salesken, our feedback loop from production to planning was broken. Incidents happened, we fixed them, but the lessons rarely made it back to sprint planning.
Product teams have feedback loops. They ship a feature, watch how users interact with it, learn what works, and iterate. User feedback flows back to product decisions.
Engineering teams don't. They have short feedback loops within sprints (code review, testing, deployment), but they're missing the long feedback loop from production back to codebase decisions.
Most teams can't answer: what parts of the codebase are actually used? Which architectural decisions are working? Where is technical debt actually causing production problems? Where is overengineering wasting cycles? That information lives in production signals (latency, error rates, incident patterns, user behavior) but it doesn't flow back to the codebase level.
The teams that move fastest and build the most reliable systems have connected this loop.
The Missing Feedback Loop
Here's what exists: Sprints happen. Code is written. Tests run. Code is reviewed. Deployment happens. For the next two weeks (or however long the sprint is), engineers iterate on code they just wrote. Then the feedback loop stops.
Production runs. Users interact with the system. Incidents happen or don't. Features are used or aren't. Performance is good or slow. But none of this flows back to codebase decisions.
Six months later, the team realizes: "We overengineered this authentication layer, it's 20% of our infrastructure costs and we could have done it in a fraction of the code." Or: "This payment module has 10 incidents per month because we patched it wrong three years ago." Or: "This hot code path uses a module that's 10x too complex for what it actually does."
The information was always there (incidents, latency metrics, cost data), but it wasn't connected to the code level. So engineering decisions were made without that context.
The Complete Loop
The teams that move fast have connected the loop. It looks like this:
1. Production signals tell you what's breaking. Incidents, error rates, latency spikes, failed deployments. These are not abstract metrics - they're specific: "The payment service is timing out 2% of the time" or "The recommendation engine crashes every time we load the new user cohort."
2. Those signals connect to codebase locations. This payment timeout is happening where? In the payment service's transaction processor. In the retry logic. In the database query that's missing an index. Be specific.
3. From codebase locations, you understand architectural decisions. Why is the transaction processor designed this way? What was the tradeoff? Was it right at the time but wrong now? This is where post-mortems should lead - not just to "let's fix the bug" but to "do we need to rethink this architectural pattern?"
4. Architectural understanding informs future decisions. The next time the team is deciding whether to use a synchronous or asynchronous API, they have data: "The last time we chose synchronous, it led to timeouts and incidents. Let's go asynchronous here."
5. Those decisions are reflected in code. The architecture is chosen. Code is written to match. Tests are written to enforce the pattern. Onboarding docs explain the pattern.
6. The loop closes. Six months later, the codebase reflects what actually works in production.
That's the complete loop. It happens when engineering signals flow back to code.
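Steps 1 and 2 of the loop can be sketched in code. This is a minimal, hypothetical example (the ownership map, paths, and signal fields are all made up for illustration): routing a raw production signal to the codebase location that owns it.

```python
# Hypothetical sketch: route a production signal (step 1) to a codebase
# location (step 2). In practice the ownership map would be derived from
# a CODEOWNERS file or a service catalog, not hardcoded.
OWNERSHIP = {
    "services/payments/processor.py": "payment-service/transaction-processor",
    "services/payments/retry.py": "payment-service/retry-logic",
    "services/reco/engine.py": "recommendation-engine",
}

def locate(signal):
    """Attach a codebase location to a raw production signal."""
    path = signal.get("stack_top_file")  # e.g. top frame of the incident's stack trace
    return {**signal, "module": OWNERSHIP.get(path, "unknown")}

incident = {
    "summary": "payment service timing out 2% of the time",
    "stack_top_file": "services/payments/retry.py",
}
print(locate(incident)["module"])  # payment-service/retry-logic
```

The point is not the lookup table; it's that the signal now carries a module name, so it can inform the architectural question in step 3.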
What Data Should Flow Back
Incident patterns. Which modules appear in post-mortems? If the payment module appears in post-mortems 10 times but the recommendation module appears once, the payment module has an architecture problem. Engineering should know this and use it to decide when to refactor.
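Surfacing this signal takes a few lines once post-mortems tag the modules involved. A sketch with made-up data (the post-mortem records and threshold are illustrative assumptions):

```python
from collections import Counter

# Hypothetical post-mortem log: each entry tags the modules implicated.
postmortems = [
    {"id": 1, "modules": ["payments"]},
    {"id": 2, "modules": ["payments", "retry"]},
    {"id": 3, "modules": ["payments"]},
    {"id": 4, "modules": ["recommendations"]},
]

counts = Counter(m for pm in postmortems for m in pm["modules"])
# Modules that keep reappearing are refactor candidates, not patch candidates.
hotspots = [mod for mod, n in counts.most_common() if n >= 2]
print(hotspots)  # ['payments']
```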
Resource usage and cost. Which modules consume the most CPU? Memory? Database queries? Cloud costs? If a module that serves 10% of your traffic uses 40% of your resources, that's a signal. Either the architecture is wrong or it's the right architecture for an expensive problem.
Feature adoption and usage. Which code paths do users actually exercise? If a module is designed to handle cases that never happen, it's overengineered. If a module is designed for 10 req/s but actually gets 1000 req/s, it's underengineered and fragile.
Dependency relationships. What breaks when you change something? If changing the authentication module breaks 15 other modules, that's a feedback signal: the coupling is too tight, refactor it. If changing a module breaks nothing, maybe it's dead code.
Bug patterns. Which modules have the most bugs? Not as an audit, but as information. If one module has 20 bugs reported against it and another has 2, the code architecture in the first module is probably wrong. Complex code attracts bugs.
All of this information exists in your systems right now. Incidents are tracked. Metrics are measured. Code changes are logged. The missing piece is connecting them and feeding them back to engineering decisions.
Why This Matters
Connected feedback loops prevent recurring problems. If an incident happens twice, the second time you know it's not a fluke - you need to change the architecture. If a module keeps appearing in post-mortems, you refactor it instead of patching it again.
Feedback loops also improve velocity. Instead of repeating mistakes, teams learn from them. The second time you encounter a similar architectural decision, you have data from the first time. You make better decisions faster.
And they make onboarding faster. A new engineer doesn't just learn what the code does - they learn why the code is structured that way because the feedback loop is visible. "This service is asynchronous because the synchronous version had timeouts in production" is much more useful than "it's just asynchronous, that's how we do things here."
How to Start
Start small. Pick one architectural pattern you're uncertain about. Maybe it's your approach to error handling, your choice of synchronous vs. asynchronous communication, or your caching strategy.
Define what signals would tell you the pattern is working or not. For synchronous vs. asynchronous: timeout rates, latency percentiles, incident frequency. For error handling: unhandled exception rates, error propagation patterns. For caching: cache hit rates, staleness incidents.
Measure the signals for your current implementation. Then, if you change the pattern, measure again. See if the signals improve.
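The before/after comparison is deliberately simple. As a sketch (latency samples are invented; the percentile is computed by nearest-rank so no libraries are needed), comparing p95 latency before and after switching a code path from synchronous to asynchronous:

```python
import math

def p95(samples):
    """95th percentile by the nearest-rank method (1-indexed rank)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds, before and after the change.
before = [120, 130, 150, 400, 900, 135, 140, 125, 850, 145]
after  = [110, 115, 120, 118, 122, 119, 121, 117, 116, 123]

# If the signal improves, keep the pattern; if not, the refactor didn't
# fix what you thought it would.
print(p95(before), p95(after))  # 900 123
```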
This is how you build the feedback loop. It doesn't require new tools - just connecting data you already collect to decisions you're about to make.
Frequently Asked Questions
Q: How do we know if a production signal is actually caused by a code decision vs. an operational issue?
A: Good question. Some incidents are code bugs that need fixing. Some are operational: you need more resources, or the traffic pattern changed. A useful heuristic: if the same signal repeats across different operational contexts (different load levels, different infrastructure), it's probably the code. Tracking change failure rate and deployment frequency also helps distinguish code-caused incidents from operational ones. This is why post-mortems matter: they force you to make that distinction explicitly.
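Change failure rate itself is a one-line computation once deployments are tagged with whether they caused an incident. A minimal sketch (the deployment records are hypothetical):

```python
# Hypothetical deployment log. A deployment "caused an incident" if an
# incident was traced back to that change in its post-mortem.
deployments = [
    {"id": "d1", "caused_incident": False},
    {"id": "d2", "caused_incident": True},
    {"id": "d3", "caused_incident": False},
    {"id": "d4", "caused_incident": False},
]

failures = sum(d["caused_incident"] for d in deployments)
cfr = failures / len(deployments)
# A high rate points at code-caused incidents; incidents with no nearby
# deployment point at operational causes instead.
print(f"change failure rate: {cfr:.0%}")  # change failure rate: 25%
```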
Q: This sounds like a lot of analysis overhead. Don't engineers need to move fast and not overthink?
A: They do need to move fast. The point of the feedback loop is the opposite of overthinking - it's making decisions based on data instead of guessing. "Did the async refactor actually fix the timeouts?" can be answered in minutes by checking latency metrics. That's faster than guessing.
Q: What if our production signals are unclear or noisy?
A: That's real and worth fixing. Noisy alerts and unclear signals waste time. But the solution isn't to ignore the signals - it's to improve them. Better observability, clearer alerting, and a tighter set of engineering efficiency metrics all make the loop more useful. Connecting production signals to specific codebase locations is what makes them actionable. Invest in signal clarity and the feedback loop becomes valuable.
Related Reading
- Programmer Productivity: Why Measuring Output Is the Wrong Question
- Developer Productivity: Stop Measuring Output, Start Measuring Impact
- DORA Metrics: The Complete Guide for Engineering Leaders
- Engineering Efficiency Metrics: The 12 Numbers That Actually Matter
- What Is a Technical Lead? More Than Just the Best Coder
- Software Productivity: What It Really Means and How to Measure It