Tribal knowledge is information that exists only in people's heads, not systems. Learn why it's a product risk and how to identify it.
At UshaOm with 27 engineers, tribal knowledge was our biggest risk. When two senior engineers left in the same month, we lost institutional context that took six months to rebuild.
Tribal knowledge is information held by a small group of people (often just one) that isn't documented or shared. It lives in people's heads, not in systems or documents. When the knowledge-holders leave, the knowledge goes with them.
Tribal knowledge is distinct from knowledge silos. A silo is knowledge deliberately kept close. Tribal knowledge usually accumulates by accident: information builds up over time as people work on systems, but never gets documented.
Examples: "Why is the payment API structured this way? Only Bob knows." "What's the full architecture of the backend? Only Sarah understands it." "How do we deploy to production safely? Only the DevOps team remembers the gotchas."
Research consistently shows that tribal knowledge is one of the most expensive hidden costs in software engineering:
These numbers explain why tribal knowledge is not just a nice-to-have documentation problem. It is a structural risk that directly impacts product velocity, team resilience, and engineering costs.
Here are concrete examples of tribal knowledge that engineering teams encounter daily:
Example 1: The deployment gotcha. Every time we deploy to production, someone needs to manually clear the Redis cache for the notification service, or push notifications fail silently for 30 minutes. Only two people know this. This is tribal knowledge. It should be in a runbook.
Example 2: The architectural decision. We use a custom message queue instead of Kafka because three years ago we had a compatibility issue with our ORM layer. The engineer who made that decision left last year. Without documentation of why, the team either maintains a suboptimal system or risks repeating a known failure.
Example 3: The customer workaround. Enterprise customer Acme Corp has a custom integration that bypasses our standard API auth. Only the solutions architect knows the details. When that person is unavailable, support requests from Acme Corp become emergencies.
Example 4: The test suite. "The integration tests fail on Tuesdays because a cron job resets the staging database. Just re-run them." When only some people know this, everyone else loses hours per week investigating failures that look random.
Example 5: The performance cliff. "Don't query the analytics table with more than 10,000 rows in a single batch or the database locks up. Use pagination with a batch size of 5,000." If this isn't documented, someone will eventually trigger a production outage.
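One way to get a rule like the one in Example 5 out of people's heads is to encode it next to the code that needs it. The following is a minimal sketch, not a definitive implementation: the names `MAX_BATCH_SIZE` and `iter_batches` are hypothetical, and the only value taken from the example is the batch size of 5,000.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Limit taken from Example 5. The constant and function names are hypothetical.
MAX_BATCH_SIZE = 5_000


def iter_batches(items: Sequence[T], batch_size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    # Refuse batch sizes above the documented limit instead of relying on memory.
    if not 0 < batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

A caller that loops over `iter_batches(row_ids)` cannot exceed the limit by accident, and the docstring and comment carry the reason for anyone who reads the code later.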
Tribal knowledge creates four problems:
First, it creates bottlenecks. When you need to know something, you ask the knowledge-holder. They're in a meeting, on vacation, or swamped. You're blocked.
Second, it creates execution risk. If a knowledge-holder leaves, that knowledge leaves with them. The team might not know critical information: why decisions were made, what gotchas exist, what approaches were tried before.
Third, it reduces psychological safety. People who don't have access to important knowledge feel dependent and anxious. "I don't know how the system works, and the person who does is always busy."
Fourth, it slows onboarding. New team members can't answer questions by reading documentation. They have to ask people. Onboarding takes longer.
Pressure to ship. When there's urgency, documentation isn't done. "We'll document later." Later never comes.
Distributed knowledge. Early employees build systems. Knowledge is distributed across their minds. As the team grows, new people ask questions. Rather than documenting, knowledge is transferred verbally.
Lack of documentation culture. If the organization doesn't value documentation, individuals don't invest in it.
Undocumented decisions. Important decisions are made (why this architecture? why this trade-off?). The decision isn't documented. Years later, someone asks "why?" and no one remembers.
Complex systems. Some knowledge is hard to document. "Here's the payment system architecture" is one page. "Here's how payment timing works across 5 systems with eventual consistency" is 20 pages. Making something that comprehensive takes time.
Tribal knowledge is expensive:
Maintenance costs: Explaining things over and over. Someone is always explaining the system to someone new.
Onboarding costs: New hires ramp slower. Without documentation, they learn slower.
Risk: If a key person leaves, institutional knowledge walks out the door.
Consistency: Without documentation, different people might explain things differently. Inconsistent information spreads.
Scaling: As the team grows, scaling knowledge transfer becomes impossible. "Tell everyone how this works" doesn't scale beyond maybe 5-10 people.
Tribal knowledge is invisible by definition. If it were visible, it would not be tribal. Here are practical methods to surface it:
The what-if test. For each system or process, ask: "What would happen if person X were unavailable for a month?" If the answer is "the team would be stuck," that is tribal knowledge.
The onboarding audit. Track what new hires ask about most. The questions they repeatedly ask reveal what is undocumented. Common patterns: How does deployment work? Why is this service structured this way? Where do I find the config for X?
The incident post-mortem. After every incident, ask: Did we need to contact a specific person to resolve this? If yes, that person holds tribal knowledge about that system.
The code review bottleneck. If certain PRs can only be reviewed by specific people because they are the only ones who understand that code, that is tribal knowledge manifesting as a process bottleneck.
The meeting dependency. If decisions regularly stall because the team has to wait for one specific person to weigh in on a system, that person holds tribal knowledge that is blocking the team.
The documentation gap analysis. List your top 10 critical systems. For each, ask: Could a competent engineer debug a production issue using only the existing documentation and code? If not, the gap is tribal knowledge.
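Several of these checks can be approximated from commit history. The sketch below is one possible heuristic, not an established metric: it assumes you have already collected, for each path, the list of commit author names (for example from your version control log), and it flags paths where a single author accounts for most commits. The function name `single_owner_paths` and the 80% threshold are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def single_owner_paths(
    authors_by_path: Dict[str, List[str]],
    threshold: float = 0.8,  # hypothetical cut-off; tune for your team
) -> Dict[str, Tuple[str, float]]:
    """Return paths where one author made at least `threshold` of the commits."""
    risky: Dict[str, Tuple[str, float]] = {}
    for path, authors in authors_by_path.items():
        if not authors:
            continue  # no history, nothing to judge
        top_author, commits = Counter(authors).most_common(1)[0]
        share = commits / len(authors)
        if share >= threshold:
            risky[path] = (top_author, share)
    return risky


# Illustrative data only; these paths and names are made up.
history = {
    "payments/api.py": ["bob"] * 9 + ["sam"],
    "web/views.py": ["ann", "sam", "bob", "ann"],
}
print(single_owner_paths(history))
```

A flagged path is a prompt for the what-if question, not proof of a problem: one author may have written most of a file that is nonetheless well documented.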
1. Prioritize documentation. Make it explicit: "Documenting is part of your job." Allocate time. Make it valued (mention it in reviews).
2. Write things down. Not comprehensive documentation (that's expensive). Focused docs: "Here's how this system works." "Here's the architecture." "Here's what I wish I'd known when I started." Ten focused pages beat 100 pages of comprehensive documentation no one reads.
3. Build a knowledge repository. Confluence, Notion, or even Google Docs. Make it searchable. Make it easy to find.
4. Knowledge transfer rituals. When someone leaves: knowledge transfer meeting. Not "tell us everything." Rather: "What's critical for someone else to know?" Document that.
5. Pair on learning. Pair a new team member with someone experienced. As they work together, they learn. Encourage them to document what they learn.
6. Reduce dependencies on individuals. If knowledge is held by one person, make spreading it a priority. "Over the next month, Sam will teach three other people the payment system."
7. Update documentation with code changes. When code changes, documentation should change too. Make it part of the PR review: "Did documentation need updating?"
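Step 7 can be backed by a lightweight automated reminder. The sketch below assumes a convention that is not from this article: documentation lives in Markdown files or under a `docs` directory, and code is identified by file suffix. The function name and the suffix list are hypothetical. Given the list of paths changed in a PR, it reports whether code changed without any accompanying documentation change.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical convention: adjust the suffixes to match your repository.
CODE_SUFFIXES = {".py", ".ts", ".go"}


def needs_docs_reminder(changed_paths: Iterable[str]) -> bool:
    """True if the change touches code but no Markdown file or docs directory."""
    paths = [PurePosixPath(p) for p in changed_paths]
    touched_code = any(p.suffix in CODE_SUFFIXES for p in paths)
    touched_docs = any(p.suffix == ".md" or "docs" in p.parts for p in paths)
    return touched_code and not touched_docs
```

A check like this is better used to prompt the reviewer's question ("Did documentation need updating?") than to block a merge, since many code changes legitimately need no documentation change.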
Some knowledge doesn't need to be documented:
Personal preferences: "I like to use Vim, you might like VS Code." Doesn't need documenting.
Individual expertise: "Sarah is great at debugging payment issues." That's fine. Sarah's individual skill is valuable. But if Sarah is the only one who understands the payment system, that's tribal knowledge that's problematic.
Temporary information: "We're using this tool for Q3." Doesn't need extensive documentation.
The line: If multiple people need it to do their jobs, document it.
Specialization is good. Sarah is the best payment engineer. That's valuable.
Tribal knowledge is problematic. Sarah is the only person who understands the payment system.
The difference: A specialist documents their work and mentors others. Knowledge spreads, but they remain valuable because of their expertise. Tribal knowledge concentrates information dangerously.
"Tribal knowledge is just a documentation problem." Partly, but it is also a culture and process problem. If your culture doesn't value documentation, and your process doesn't allocate time for it, documentation won't happen.
"Good engineers document as they code." Some do. Others need explicit encouragement and time allocation. Don't assume it happens naturally.
"We're too small to need documentation." The smallest teams carry the most tribal knowledge risk. "There are only three of us; if one leaves, we're in trouble." Documentation is insurance.
"Tribal knowledge is unavoidable in fast-moving teams." You can reduce it. It requires: allocating time, setting expectations, and valuing it. Fast-moving teams that skip documentation are playing with fire.
Q: What's the minimum documentation needed?
A: System overview (what systems exist, what they do, how they connect). Key architectural decisions and why. Known gotchas. How to run/deploy. How to debug when things break. That's the core.
Q: Who should document?
A: The person who knows it best. They understand the nuance. But have someone else read it and give feedback. If only the author can follow it, it is probably too complex.
Q: How do we get people to document?
A: Make it expected (it's in the process), make it easy (provide templates), make it valued (recognize it in reviews), and allocate time (it's not something they do "on top of" other work).
Q: What if we have too much tribal knowledge to document everything?
A: Start with what matters most. What systems are most critical? What knowledge is most at-risk (held by people likely to leave)? Document that. Over time, expand.