Pull Request Size and Code Review Quality: Why Smaller PRs Actually Get Better Reviews
There's a clear empirical pattern in software engineering: large pull requests get worse reviews.
When a PR is two hundred lines, reviewers read it carefully. They understand the change. They spot bugs. When a PR is two thousand lines, reviewers skim it and click approve. This isn't laziness. It's cognitive capacity. The human brain can hold a certain amount of context. Beyond that, understanding degrades fast.
This matters because code review is one of the highest-leverage activities in engineering. Good reviews catch bugs before they reach production. Bad reviews miss issues that become incidents. PR size is the single biggest factor determining whether reviews are good or bad.
Yet most teams have no visibility into their PR size distribution. They just wonder why reviews are shallow and incidents keep happening.
What Is PR Size and Why It Matters
Pull request size is the number of lines of code changed in a single PR. Some count only added lines. Some count additions and deletions. Some count files touched. The exact metric varies, but the pattern is consistent.
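To make the competing definitions concrete, here is a minimal sketch that derives all three from the output of `git diff --numstat`, which prints one tab-separated line per file (added, deleted, path). The `PRSize` class, the `parse_numstat` helper, and the sample paths are made up for illustration; only the numstat format comes from git itself.

```python
from dataclasses import dataclass


@dataclass
class PRSize:
    added: int
    deleted: int
    files: int

    @property
    def changed(self) -> int:
        # Additions plus deletions: the usual "lines changed" definition.
        return self.added + self.deleted


def parse_numstat(numstat: str) -> PRSize:
    """Parse the text produced by `git diff --numstat`."""
    added = deleted = files = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        a, d, _path = line.split("\t", 2)
        files += 1
        # Binary files are reported with "-" in both count columns;
        # count the file but add no lines for it.
        if a != "-":
            added += int(a)
        if d != "-":
            deleted += int(d)
    return PRSize(added, deleted, files)


# Hypothetical numstat output for a three-file PR.
sample = (
    "120\t30\tsrc/billing/invoice.py\n"
    "15\t0\ttests/test_invoice.py\n"
    "-\t-\tdocs/diagram.png\n"
)
size = parse_numstat(sample)
print(size.added, size.changed, size.files)  # prints: 135 165 3
```

Whichever definition you pick, use the same one everywhere so that numbers stay comparable over time.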
Research from Code Climate, Graphite, and other teams shows the same finding: up to 200 lines, review quality stays high. At 400 lines, quality degrades noticeably. Beyond 800 lines, reviews become perfunctory.
Why does this matter? Code review is probably the highest-ROI activity your team does. A thorough review catches bugs before they cause incidents. A shallow review misses issues that cost hours to fix in production. The difference between good and bad reviews is measured in actual money and customer impact.
Large PRs destroy review quality. This is one of the few engineering practices where the data is clean and the conclusion is unavoidable.
The Data on PR Size and Review Quality
Several organizations have published data on this.
Code Climate's data, based on analyzing thousands of GitHub repositories, shows that the median PR size is 197 lines of code. The best-performing teams keep PRs under 200 lines. When PRs exceed 400 lines, both review time and defect rate increase.
Graphite's data is similar. They found that PRs under 200 lines get reviewed and merged significantly faster. The median time to merge rises from one or two days for small PRs to between four and seven days for large ones. Larger PRs also go through more review cycles, which means more friction.
Research on code review effectiveness, published in academic software engineering studies, shows that reviewer effectiveness drops sharply as PR size increases. At 200 lines, reviewers catch about 70% of defects. At 400 lines, they catch about 50%. At 800 lines and above, they catch about 30%.
This is not an opinion. This is not a preference. This is empirical data from thousands of real code reviews.
The reason is simple: context load. A reviewer can hold a PR in working memory if it's small. They understand what it's changing, why, and what the impact is. When a PR is huge, reviewers can't hold it all in their heads. They skim. They get tired. They approve to move on.
What Makes a Good PR
Good PRs share common characteristics.
They do one thing. A PR that adds a feature, refactors a module, and fixes a bug is doing three things. Split it. A PR that modifies files in five different areas might be doing one logical thing, but it's hard to review because the reviewer has to switch context five times.
They're explained. A good PR description explains why the change is needed, what problem it solves, and what tradeoffs were considered. The reviewer can understand the intent, not just the code.
They're sized appropriately. Under 200 lines is ideal. Under 400 lines is acceptable. Above 400 lines needs a really good reason.
They're safe. The PR either passes all existing tests, or it modifies tests to reflect new behavior. It doesn't break things. If the change is risky, that risk is acknowledged in the description.
They're focused. Everything in the PR is needed for the stated goal. There's no random cleanup, no premature optimization, no side projects.
PR Size Best Practices
These practices keep PR sizes manageable.
Aim for under 250 lines. This is aggressive, but it keeps you close to the 200-line range where review quality holds up. If you consistently ship PRs of around 250 lines, your reviews will be much better than those of teams shipping 500-line PRs.
Split by abstraction. If you're changing both the API and the implementation, split them. The API change in one PR, the implementation in another. This makes reviews faster because each PR is more focused.
Separate refactoring from feature work. Adding a feature that requires refactoring? Two PRs. First refactor the code to prepare for the feature. Then add the feature. Each PR is simpler and easier to review.
Break up large features. A feature that would be five hundred lines probably isn't a single unit of work. Break it into smaller, shippable chunks. Add the foundation in one PR. Build on it in the next PR.
Extract preparatory work. If your PR is large because of setup work (renaming, small refactors, type updates), pull that out into a separate PR. Submit it first. Then your feature PR is focused on the actual feature.
Use branches for collaboration. If multiple people are working on the same feature, use a shared feature branch for day-to-day collaboration, and keep the individual PRs that land on the main branch small.
Why PRs Get Large
Most teams don't intentionally create large PRs. They happen because of root causes that aren't addressed.
Unclear scope. The task description is vague. The person starts coding. As they go, they discover what needs to change. They end up changing everything related. Better: clarify scope before coding.
Hidden dependencies. You think you're changing one module. But it depends on five other modules that also need changes. Suddenly your PR is huge. This is often a sign that your code has dependency issues.
Fear of rejection. Someone spends a week on something. They're nervous the reviewer will reject it. So they add more context, more explanation, more related changes. They want to make sure it passes. Ironically, the larger PR is more likely to be rejected.
Complex dependencies. Some codebases are architected in a way that any change touches many files. This is not a PR size problem. It's a codebase architecture problem. The PR is large because the code is tightly coupled.
Unclear separation of concerns. When modules have unclear responsibilities, changes to one affect many. Refactor to clarify responsibilities, and PR sizes naturally shrink.
How to Split Large PRs
When you find yourself with a large PR, how do you break it up?
Extract preparatory work. Is there refactoring that the main change depends on? Pull it out first. Rename variables, extract methods, move code around. Do this in a separate PR that doesn't change behavior. Get it merged. Now the feature PR is smaller.
Use feature flags. Building a feature? Use a feature flag to hide it until it's done. This lets you break the work into multiple PRs. First PR adds the foundation. Second PR adds more. Third PR enables the flag.
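A minimal sketch of that flow, assuming flags are read from environment variables. The `FEATURE_` prefix, the function names, and the checkout example are all hypothetical; a real project would more likely use its existing flag service or config system.

```python
import os


def flag_enabled(name: str) -> bool:
    """Read a flag from the environment; anything other than "1" or "true" means off."""
    return os.environ.get(f"FEATURE_{name.upper()}", "").lower() in {"1", "true"}


def legacy_checkout_total(prices):
    return sum(prices)


def new_checkout_total(prices, discount=0.1):
    # Merged in an earlier PR while the flag was still off, so no user saw it.
    return round(sum(prices) * (1 - discount), 2)


def checkout_total(prices):
    # Callers only ever use this function. The last PR in the series
    # just turns the flag on; it does not need to touch this code.
    if flag_enabled("new_checkout"):
        return new_checkout_total(prices)
    return legacy_checkout_total(prices)
```

Each PR in the series stays small because unfinished code can reach the main branch without changing behavior for anyone.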
Build incrementally. Instead of building the whole feature, build the minimum useful thing. Get it merged. Then add to it. Each PR is smaller.
Split by layer. If you're changing database, service, and API, do them in separate PRs. Database changes first. Then service. Then API. Each is reviewable in isolation.
Split by file. If you're touching many files, see if you can separate them. Core business logic changes in one PR, formatting changes in another.
PR Size Metrics and Tracking
Most teams don't measure PR size. If you're not measuring, you have no visibility into whether your reviews are actually good.
Measure these metrics:
- Average PR size in lines of code
- Median PR size
- Distribution of PR sizes (how many PRs are under 200 lines, 200-400, 400-800, etc.)
- PR review time
- Defect rate by PR size
Tools like Graphite, Code Climate, and git-based analytics can compute these automatically.
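If you would rather start with a script than a tool, the metrics listed above fit in a few lines. This is a rough sketch, not a reference implementation: the `prs` records are invented sample data, and the tuple layout (lines changed, hours to merge, whether the PR was later linked to a defect) is an assumption about what your own export would contain.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical export: (lines_changed, hours_to_merge, caused_defect) per merged PR.
prs = [
    (40, 3.0, False),
    (120, 6.5, False),
    (180, 20.0, False),
    (350, 30.0, True),
    (650, 96.0, False),
    (1200, 150.0, True),
]

# Bucket edges follow the ranges named in the list above.
BUCKETS = [(200, "under 200"), (400, "200-400"), (800, "400-800")]


def bucket(lines_changed: int) -> str:
    for upper, label in BUCKETS:
        if lines_changed < upper:
            return label
    return "800+"


def summarize(prs):
    sizes = [size for size, _, _ in prs]
    counts = Counter(bucket(size) for size in sizes)
    defects = Counter(bucket(size) for size, _, defect in prs if defect)
    return {
        "mean_size": mean(sizes),
        "median_size": median(sizes),
        "median_hours_to_merge": median(hours for _, hours, _ in prs),
        "distribution": dict(counts),
        # Share of PRs in each bucket that were later linked to a defect.
        "defect_rate": {label: defects[label] / n for label, n in counts.items()},
    }


summary = summarize(prs)
print(summary["median_size"])  # prints: 265.0
```

Reporting the median alongside the mean matters here: in the sample data, a single 1,200-line PR pulls the mean well above the median.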
Track them over time. If your average PR size is growing, that's a signal. If 80% of your PRs are over 400 lines, your reviews are suffering.
Most teams find that once they start measuring, they make progress, because the data is clear: large PRs get worse reviews, and smaller PRs get better ones. The team naturally adjusts once they see the data.
Codebase Complexity as Root Cause of Large PRs
Here's what most teams miss: large PRs often aren't the real problem. They're a symptom.
When code is well-architected, changes tend to be localized. You modify one module or one service. A small PR. When code is tightly coupled, any change affects many modules. Suddenly PRs are large.
This is where codebase intelligence matters. You can see why PRs are large by understanding the dependency structure. If PRs that touch Module A tend to be huge, it's probably because Module A has dependencies everywhere, or everything depends on Module A.
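One rough way to find those modules is to group past PRs by the modules they touched and compare typical sizes. The sketch below makes several assumptions: the `history` records are invented, and the first path component is treated as the module name, which will not match every repository layout.

```python
from collections import defaultdict
from statistics import median

# Hypothetical history: total lines changed and files touched for each merged PR.
history = [
    (90, ["billing/invoice.py"]),
    (640, ["core/models.py", "billing/invoice.py"]),
    (820, ["core/models.py", "search/index.py", "api/routes.py"]),
    (110, ["search/index.py"]),
    (700, ["core/models.py", "billing/tax.py"]),
    (150, ["api/routes.py"]),
]


def top_level_module(path: str) -> str:
    # Assumption: the first path component identifies the module.
    return path.split("/", 1)[0]


def median_size_by_module(history):
    sizes = defaultdict(list)
    for lines_changed, paths in history:
        # Count each PR once per module, even if it touches several files there.
        for module in {top_level_module(p) for p in paths}:
            sizes[module].append(lines_changed)
    return {module: median(values) for module, values in sizes.items()}


by_module = median_size_by_module(history)
ranked = sorted(by_module.items(), key=lambda kv: kv[1], reverse=True)
for module, typical_size in ranked:
    print(module, typical_size)
# In this sample, "core" ranks first with a median of 700 lines.
```

A module at the top of this ranking is a candidate for a closer look at its dependencies, not proof of a problem. Some modules attract large PRs for unrelated reasons, such as generated code.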
Once you see that, you can fix the root cause: refactor to decouple modules. The benefit isn't just smaller PRs. It's faster feature development, fewer incidents, easier testing, easier refactoring.
Some of the highest-leverage investments a team can make are refactoring projects that reduce coupling. And the way to identify them is to look at PR size distribution and understand what's causing large PRs.
FAQ
Is there a minimum PR size?
Not really. If you can ship one-line changes, do it. The constraint is the maximum, not the minimum. One-line PRs are great.
What if my change legitimately requires a large PR?
Then it's often a signal that your code is too coupled. Yes, you need to make the change. But after you do, consider refactoring to prevent this in the future.
Should we enforce maximum PR size?
Not as an absolute rule. Better to educate the team on the data and let them self-regulate. Most people want to write good code and do good reviews. Showing them the data changes behavior.
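If you want a nudge rather than a gate, an advisory check is one option. The sketch below is illustrative only: `size_advisory` and its 400-line default are hypothetical, and how the note reaches the author (a bot comment, a CI log line) is left to your tooling. The important property is that it never blocks a merge.

```python
from typing import Optional


def size_advisory(lines_changed: int, soft_limit: int = 400) -> Optional[str]:
    """Return a friendly note for oversized PRs, or None when no note is needed."""
    if lines_changed <= soft_limit:
        return None
    return (
        f"This PR changes {lines_changed} lines (soft limit: {soft_limit}). "
        "Consider splitting it, or explain in the description "
        "why it has to land as one change."
    )


note = size_advisory(650)
if note:
    print(note)
```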
How do we handle large refactorings?
Use feature branches and multiple PRs. A large refactoring is often multiple small refactorings. Do them one at a time.
Does PR size matter for open source projects with volunteer reviewers?
Even more so. Volunteer reviewers are more capacity-constrained than paid reviewers. Small PRs are the only way to get good reviews.