Glossary
Code quality metrics quantify software maintainability and reliability through complexity, test coverage, and defect density. Learn how to measure what matters for product delivery.
At Salesken, I learned the hard way that measuring the wrong thing is worse than measuring nothing — it gives you false confidence while the real problems compound.
Code quality metrics are quantitative measures of how well-structured, maintainable, and reliable software code is. These metrics fall into four categories: complexity (cyclomatic complexity, cognitive complexity), maintainability (code duplication, module size, comment density), reliability (test coverage, defect density, bug escape rate), and security (known vulnerabilities, CVSS severity distribution). The critical insight is that most teams optimize for the wrong signals - typically chasing test coverage percentages rather than evaluating whether coverage targets the riskiest code paths.
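Cyclomatic complexity, the first metric named above, is simply one plus the number of decision points in a piece of code. A minimal sketch of the idea, using Python's standard `ast` module and a simplified rule for what counts as a decision point (the node list is an assumption for illustration, not a complete rule set):

```python
import ast

# Node types treated as decision points (a simplified, illustrative set).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

# Hypothetical snippet: two ifs, one boolean operator, one loop -> 4 + 1 = 5.
snippet = """
def settle(txn):
    if txn.amount > 0 and txn.verified:
        for leg in txn.legs:
            if leg.pending:
                retry(leg)
"""
print(cyclomatic_complexity(snippet))
```

Production tools apply more nuanced rules, but the core mechanic is just this: count branches.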
Product teams often view code quality metrics as purely engineering concerns, disconnected from delivery speed or user impact. This misses the core relationship: code quality trends predict velocity trends. A team with 95% test coverage but degrading deployment frequency likely has coverage concentrated in low-risk areas while high-complexity modules remain fragile. The headline number looks healthy, but its value is inverted: it reassures you precisely where reassurance is least deserved.
Engineering managers need metrics that answer the business question: "Are we trading velocity for quality, or can we improve both?" This requires understanding correlations - does your cyclomatic complexity trend upward while defect density stays flat? That's a signal your team compensates with extra discipline, not that the code is healthy. Conversely, if coverage drops but velocity increases, you may be pruning low-value tests and focusing on delivery.
Glue's approach surfaces these correlations by making code health visible alongside delivery metrics. When product teams see that modules with high complexity also create the most production incidents and take 3x longer to change, the quality metrics stop being abstract numbers and become statements about product velocity.
Consider a fintech platform managing transaction processing. The team tracks cyclomatic complexity across modules and finds that the settlement reconciliation module has a complexity score of 67 (extremely high). They also notice that this module accounts for 40% of post-deployment bugs despite representing only 12% of the codebase. Historical incident data shows it takes an average of 18 days to fix bugs in this module, versus 3 days elsewhere.
Here's where correlation matters: the team's test coverage for the settlement module is 89%, higher than the platform average of 76%. But coverage isn't the core problem. The real issue is that the high complexity creates too many execution paths to test feasibly, and the coverage number masks this. When the team refactors the settlement logic to reduce complexity to 34, the same test count now covers 98% of the important execution paths, and bug-fix time drops to 5 days within two months.
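The pattern in this example is quantifiable as a bug-concentration ratio: a module's share of bugs divided by its share of the codebase. A sketch with hypothetical numbers matching the fintech example above (12% of the code, 40% of the bugs):

```python
# Hypothetical module stats mirroring the fintech example.
modules = {
    "settlement": {"loc": 12_000, "bugs": 40},
    "rest_of_platform": {"loc": 88_000, "bugs": 60},
}

total_loc = sum(m["loc"] for m in modules.values())
total_bugs = sum(m["bugs"] for m in modules.values())

for name, m in modules.items():
    loc_share = m["loc"] / total_loc
    bug_share = m["bugs"] / total_bugs
    # Concentration > 1 means the module produces more than its
    # "fair share" of bugs relative to its size.
    concentration = bug_share / loc_share
    print(f"{name}: {concentration:.1f}x bug concentration")
```

Any module scoring well above 1x is a refactoring candidate regardless of what its coverage number says.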
Start by anchoring metrics to business outcomes, not vanity numbers. A three-step approach: (1) pick one business outcome to optimize (deployment frequency, time-to-fix, incident rate), (2) identify the code metrics that actually correlate with it for your team, and (3) treat those metrics as signals, not targets.
For complexity measurement, cyclomatic complexity (decision point count) is easier to compute but cognitive complexity (how hard is it for humans to reason about) is more predictive of bugs. Many teams now use cognitive complexity as the primary signal. Set thresholds per module type - business logic can tolerate more complexity than security-critical or utility code, but make these thresholds explicit and based on your incident history, not industry benchmarks.
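Making thresholds explicit can be as simple as a per-module-type table checked in CI. A sketch with hypothetical threshold values (in practice, derive them from your incident history as described above):

```python
# Hypothetical thresholds by module type. Security-critical and utility
# code get stricter limits than business logic, per the guidance above.
THRESHOLDS = {
    "security": 10,
    "utility": 12,
    "business_logic": 20,
}

def flag_for_refactor(module_type: str, cognitive_complexity: int) -> bool:
    """Return True when a module exceeds its type-specific threshold."""
    return cognitive_complexity > THRESHOLDS[module_type]

# The same score can pass in one zone and fail in another.
print(flag_for_refactor("security", 14))
print(flag_for_refactor("business_logic", 14))
```

The point of the table is that the numbers are visible, debatable, and versioned, rather than implicit in someone's code-review instincts.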
Defect density (bugs per 1,000 lines of code) is valuable only if you track which bugs matter. A team with high defect density in logging utilities may still have zero production incidents, while a small spike in reconciliation logic defect density triggers customer-facing outages. Bucket metrics by risk zone.
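Bucketing by risk zone shows why raw defect density misleads. In this sketch with hypothetical counts, the low-risk zone has twice the density but none of the business impact:

```python
# Hypothetical defect counts per risk zone; kloc = thousands of lines.
zones = {
    "reconciliation (high risk)": {"bugs": 6, "kloc": 4.0},
    "logging utilities (low risk)": {"bugs": 30, "kloc": 10.0},
}

for zone, stats in zones.items():
    density = stats["bugs"] / stats["kloc"]  # bugs per 1,000 lines
    print(f"{zone}: {density:.1f} defects/KLOC")
```

A single platform-wide density number would average these together and hide exactly the distinction that matters.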
Test coverage is notoriously gameable. Instead of optimizing for a coverage percentage, measure gap coverage: which execution paths in your riskiest modules remain untested? A team holding 70% coverage on its high-risk code is in better shape than one with 95% coverage across the entire codebase while most of its high-risk code sits untested.
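Gap coverage is computable from per-module coverage data most tools already export. A minimal sketch, assuming each module is tagged with a risk tier (the module names and numbers are hypothetical):

```python
# Hypothetical per-module coverage data with a risk tag per module.
modules = [
    {"name": "settlement", "risk": "high", "covered": 700, "total": 1000},
    {"name": "reporting", "risk": "low", "covered": 950, "total": 1000},
]

def gap_coverage(mods, risk="high"):
    """Coverage of risk-tagged code only: the number worth optimizing."""
    risky = [m for m in mods if m["risk"] == risk]
    covered = sum(m["covered"] for m in risky)
    total = sum(m["total"] for m in risky)
    return covered / total

print(f"high-risk coverage: {gap_coverage(modules):.0%}")
```

Reporting this number alongside (or instead of) overall coverage removes the incentive to pad the suite with tests for trivial code.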
Misconception 1: Higher test coverage always means higher quality. Correction - coverage measures code reachability, not the quality of assertions or the relevance of tests to actual use cases. A test suite that executes 95% of your code paths but doesn't validate core invariants gives a false sense of safety. Measure assertion density (how many assertions per line of test code?) and test defect escape rate (bugs found in production divided by bugs found in QA) instead.
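Both replacement metrics are one-line calculations once you have the counts from your test suite and bug tracker. A sketch with hypothetical numbers, following the definitions above:

```python
# Hypothetical counts pulled from a test suite and bug tracker.
assertion_count = 420
test_loc = 3_000
bugs_in_production = 8
bugs_in_qa = 32

# Assertions per line of test code: low values suggest tests that
# execute code without actually validating behavior.
assertion_density = assertion_count / test_loc

# Escape rate as defined above: production bugs divided by QA bugs.
escape_rate = bugs_in_production / bugs_in_qa

print(f"assertion density: {assertion_density:.2f} per test line")
print(f"defect escape rate: {escape_rate:.0%}")
```

Trend lines matter more than the absolute values here: an escape rate that climbs quarter over quarter says your suite is drifting away from how the product is actually used.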
Misconception 2: Code quality metrics are tools for performance management. Correction - using metrics to evaluate individual engineer productivity creates perverse incentives and reduces code quality. Metrics should inform architectural decisions, not engineer review cycles. Teams that avoid this pitfall see better adoption of quality practices.
Misconception 3: You should pick the industry standard metrics. Correction - the metrics that matter are specific to your codebase, risk profile, and team. A team building financial infrastructure needs different metric thresholds than a team building content platforms. Use static analysis tools to capture breadth of signals, but customize what you care about.
Q: We have high test coverage but still ship bugs. What's wrong? Coverage metrics measure which lines of code are executed by tests, not whether those tests validate the right behavior. You likely have coverage in low-risk code and untested execution paths in complex modules. Profile your production bugs by module and check which had test coverage - you'll find the blind spots quickly.
Q: Should we set a minimum complexity threshold for refactoring? Yes, but the threshold should be based on your incident history. Calculate the average complexity of modules that produced your three worst production incidents last quarter. That complexity score becomes your refactoring trigger for preventive work.
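The trigger calculation described in this answer is trivial to automate. A sketch with hypothetical complexity scores for the modules behind last quarter's three worst incidents:

```python
# Hypothetical complexity scores of the modules behind last quarter's
# three worst production incidents.
incident_module_complexity = [67, 41, 38]

# Their mean becomes the trigger for preventive refactoring work.
refactor_trigger = (sum(incident_module_complexity)
                    / len(incident_module_complexity))
print(f"refactor anything above complexity {refactor_trigger:.0f}")
```

Recompute the trigger each quarter so it tracks your actual incident history rather than a number set once and forgotten.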
Q: How often should we review code quality metrics? Review them during sprint planning, and once a month correlate recent code quality changes with changes in deployment frequency and incident rate. If the metrics are trending in opposite directions (coverage up, velocity down), that's a signal to investigate.