Knowledge Management

Status: Complete
Category: Management
Default enforcement: Advisory
Author: PushBackLog team

Summary

Knowledge management is the practice of systematically capturing, organising, sharing, and maintaining the information that engineers and teams need to do their work effectively. The failure mode is not ignorance — it is knowledge that exists but is inaccessible: buried in a Slack thread from two years ago, stored in one senior engineer’s head, or written in a stale wiki that nobody trusts. Effective knowledge management prevents silos, accelerates onboarding, reduces key-person risk, and makes the team resilient to turnover.

Rationale

Knowledge silos are a reliability risk

When a critical system is understood only by one engineer, that engineer is a single point of failure. Their absence — holiday, illness, resignation — creates an operational gap. Teams with good knowledge management have collective understanding: any team member can diagnose a production issue, deploy the service, or explain its data model. This is not achieved through documentation alone; it requires deliberate practices including pair programming, code review, written decisions, and shared runbooks.

Undiscoverable knowledge is the same as missing knowledge

The common failure is not that knowledge was never written down. It is that it was written somewhere no one ever looks: a Jira comment from 2021, a Google Doc shared with one person, a Slack thread buried in channel history. Effective knowledge management requires both content creation (writing it down) and discoverability (making it findable). A single entry point, search, and consistent structure matter as much as volume of content.

Guidance

The knowledge management stack

Layer	What lives here	Tooling	Maintenance
Code	Current system behaviour	Git repository	Automated
ADRs	Why decisions were made	Git repository (`/docs/adr/`)	Written with decisions
Runbooks	How to operate and recover	Git repository (`/docs/runbooks/`)	Updated with incidents
API docs	How to use interfaces	Co-located with code	Updated with code
Team wiki	How the team works, processes, meeting notes	Notion / Confluence	Maintained by team
Onboarding guide	How to get started	Git or wiki	Reviewed quarterly
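As a minimal sketch, the repository-hosted layers in the table could be checked automatically. The function name `missing_knowledge_dirs` is hypothetical; only the `docs/adr` and `docs/runbooks` paths come from the table, and the check assumes it runs against a local checkout.

```python
from pathlib import Path
from typing import List

# Directories taken from the table above; everything else here is illustrative.
EXPECTED_DIRS = ["docs/adr", "docs/runbooks"]


def missing_knowledge_dirs(repo_root: Path) -> List[str]:
    """Return the expected documentation directories that do not exist in the repo."""
    return [d for d in EXPECTED_DIRS if not (repo_root / d).is_dir()]


if __name__ == "__main__":
    # Check the current working directory as if it were the repository root.
    for missing in missing_knowledge_dirs(Path(".")):
        print(f"missing: {missing}")
```

A check like this could run in CI so that a new service without runbooks or ADRs is flagged early, though whether to fail the build or only warn is a team decision.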

Preventing knowledge silos

Pair programming and code review are the primary mechanisms for spreading knowledge across the team — not documentation. Documentation captures state; pairing transfers understanding and reasoning. See Pair & Mob Programming and Code Review Best Practices.

The bus factor is the minimum number of people whose sudden loss (the proverbial bus) would leave a project unable to proceed. Track it per component:

# Bus Factor Audit (quarterly)

| Component      | Primary owner | Secondary (knows it well) | Bus factor |
|----------------|--------------|--------------------------|------------|
| Auth service   | Christy      | Marlene                  | 2          |
| Billing module | Todd         | -                        | ⚠️ 1      |
| Infra/Terraform| Marcus       | -                        | ⚠️ 1      |

A bus factor of 1 is a risk. Plan pairing sessions or documentation sprints to address it.
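The audit can also be kept as data and checked in a script. The sketch below assumes a simplified definition in which a component's bus factor is the number of distinct people who know it well (primary owner plus secondaries), which matches the audit table; the function names and the `audit` mapping are illustrative.

```python
from typing import Dict, List


def bus_factor(knowers: List[str]) -> int:
    """Simplified bus factor: the number of distinct people who know the component well."""
    return len(set(knowers))


def at_risk(components: Dict[str, List[str]], threshold: int = 1) -> List[str]:
    """Return the names of components whose bus factor is at or below the threshold, sorted."""
    return sorted(
        name for name, people in components.items() if bus_factor(people) <= threshold
    )


# Example data mirroring the audit table: primary owner first, then secondaries.
audit = {
    "Auth service": ["Christy", "Marlene"],
    "Billing module": ["Todd"],
    "Infra/Terraform": ["Marcus"],
}

if __name__ == "__main__":
    for name in at_risk(audit):
        print(f"needs mitigation plan: {name}")
```

Keeping the audit in a versioned file makes the quarterly review a diff rather than a fresh survey.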

Writing for discoverability

The most important choice is where to put information such that it will be found. Rules of thumb:

Operational procedures belong in the repository alongside the system they describe
Decision rationale belongs in ADRs linked from the relevant code or PR
Team processes (sprint ceremonies, escalation paths, on-call setup) belong in the team wiki
Temporary context (where are we on this project, what’s the current plan) belongs in Jira/Linear tickets and sprint docs — not in permanent knowledge stores

Avoid “general” wikis with unclear organisation. Every page needs a clear home; if it’s hard to decide where something goes, the structure is wrong.

Documentation maintenance cadence

Documentation that is never reviewed becomes incorrect over time. Define a maintenance cadence:

Document type	Review cadence	Trigger for out-of-cycle review
Runbooks	After every incident that uses them	Service change affecting the procedure
Onboarding guide	Quarterly	New-hire feedback during onboarding
Architecture docs	When architecture changes	New ADR
Security procedures	Quarterly	Security incident or audit
Team wiki pages	Annually	“Document health check” sprint

Assign page ownership. A page with no named owner will not be maintained.
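The time-based cadences and the ownership rule lend themselves to a small automated check. This is a sketch under stated assumptions: the `Page` record, the `doc_type` keys, and the day counts (90 for quarterly, 365 for annually) are illustrative choices, and event-triggered reviews such as runbooks after incidents are out of scope here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumed calendar equivalents of the time-based cadences in the table above.
REVIEW_INTERVAL_DAYS = {
    "onboarding_guide": 90,      # quarterly
    "security_procedures": 90,   # quarterly
    "team_wiki": 365,            # annually
}


@dataclass
class Page:
    title: str
    doc_type: str
    last_reviewed: date
    owner: Optional[str] = None


def needs_attention(pages: List[Page], today: date) -> List[str]:
    """List pages that have no named owner or are past their review interval."""
    findings = []
    for page in pages:
        if not page.owner:
            findings.append(f"{page.title}: no owner")
        interval = REVIEW_INTERVAL_DAYS.get(page.doc_type)
        # Document types without a time-based cadence are skipped.
        if interval is not None and today - page.last_reviewed > timedelta(days=interval):
            findings.append(f"{page.title}: review overdue")
    return findings
```

The page metadata would have to come from somewhere, for example front matter in repository docs or a wiki export; that source is not specified here.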

Onboarding as a knowledge management signal

A new engineer’s onboarding experience is the best test of knowledge management quality. If they can read the documentation and be productive in one week, the knowledge is accessible and current. If they need hours of guided help from a senior engineer, knowledge is concentrated and undocumented.

After every new hire onboarding, run a retrospective:

What was missing from the documentation?
What was confusing?
What took longer than it should have?

Treat onboarding friction as actionable findings. File tickets. Fix the docs.

Lightweight knowledge capture habits

Sustainable knowledge management is not about writing long documents. It is about making small captures habitual:

After debugging a non-obvious issue: write one paragraph in the runbook describing the symptom, root cause, and fix
After every architecture decision: write a one-page ADR
When explaining something on Slack for the second time: convert the explanation to a wiki page and link to it
When a new engineer asks a question you can’t answer: document the answer once you find it

The compounding effect of small, consistent captures is a well-documented team knowledge base.

Review checklist

Bus factor audit conducted quarterly — components with bus factor 1 have a mitigation plan
Runbooks exist for all production services and are updated after incidents
New engineer onboarding guide is reviewed and updated quarterly
Knowledge is stored in a consistent, discoverable location — not in personal notes, email, or Slack DMs
Every piece of documentation has a named owner responsible for its accuracy
Onboarding retrospective is conducted after every new hire
