How to Write Developer Documentation People Actually Read and Use
That 40-page doc no one opens wastes weeks. Good docs aren't comprehensive—they answer the right question in 10 minutes. Learn the 4 doc types (READMEs, architecture, runbooks, ADRs), templates, discoverability tactics, and the upgrade checklist.

TL;DR
Forty-page comprehensive docs don't get read. Good docs answer specific questions fast: READMEs (productive in 30 minutes), Architecture Overviews (how the system works), Runbooks (fix production issues), and Architecture Decision Records (why we chose this). Keep docs close to code, update them when the code changes, and optimize for human usage, not completeness.
The 40-Page Doc No One Opens
Your team spent 3 weeks writing The Complete Architecture Guide. Forty pages. Detailed diagrams. Every service documented. Posted to Confluence with celebration emojis.
Six months later:
- New engineers still ask "how do I run this locally?" (answered on page 23).
- Someone breaks production because they didn't know about the deployment process (documented on page 31).
- A critical system change happens. No one updates the doc. Now half of it is wrong.
The doc exists. No one reads it. So people ask the same questions in Slack.
This isn't a failure of discipline. It's a failure of design. 40-page docs don't get read because they're optimized for comprehensive coverage, not for human usage.
Good developer docs aren't about completeness. They're about answering the right questions at the right time. They help someone get unblocked in 10 minutes, not after reading a novel.
Let's talk about documentation people actually open, trust, and return to.
What Good Developer Docs Actually Do
Before writing a doc, ask: What job is this doc doing?
Good docs serve specific purposes:
1. Help Someone Get Started Quickly
Job: New engineer (or your future self in 6 months) needs to run this service locally, understand what it does, and make their first change.
Success: They're productive in 30 minutes without asking anyone for help.
2. Help Someone Debug or Change Something Safely
Job: Engineer hits an error, needs to understand a component, or wants to add a feature without breaking things.
Success: They find the relevant context, understand the constraints, and make the change confidently.
3. Preserve Decisions and Context Over Time
Job: Six months from now, someone asks "Why did we build it this way?"
Success: The doc explains the tradeoffs, alternatives considered, and why this choice made sense at the time.
Types of Docs You Need
Different jobs need different doc types:
- README / Getting Started: "How do I run this?" (for repos/services)
- Architecture Overview: "How does this system work?" (high-level)
- Runbooks: "Something broke, what do I do?" (on-call guides)
- Decision Records (ADRs): "Why did we choose this?" (preserving context)
Don't write one giant doc that tries to be all four. Write focused docs for specific jobs.
The Core Types of Docs You Need
1. README / Getting Started
Purpose: Get someone from zero to running the service locally in <30 minutes.
What it should contain:
- What this service does (2-3 sentences)
- Prerequisites (dependencies, tools, versions)
- How to run it locally (step-by-step)
- How to run tests
- Common issues and troubleshooting
- Links to deeper docs (architecture, deployment, API docs)
Example README structure:
# Order Service
Handles order creation, payment processing, and fulfillment workflows.
## Prerequisites
- Node.js 18+
- PostgreSQL 14+
- Redis 6+
## Running Locally
1. Install dependencies:
npm install
2. Set up database:
createdb orders_dev
npm run migrate
3. Copy environment file:
cp .env.example .env
Edit `.env` and set `DATABASE_URL` and `REDIS_URL`.
4. Start the server:
npm run dev
Service runs at http://localhost:3000
## Running Tests
npm test
## Common Issues
**"Connection refused" when starting:**
Make sure PostgreSQL and Redis are running:
brew services start postgresql
brew services start redis
**"Migration failed":**
Drop and recreate the database:
dropdb orders_dev && createdb orders_dev
npm run migrate
## More Documentation
- [Architecture Overview](docs/architecture.md)
- [API Documentation](docs/api.md)
- [Deployment Guide](docs/deployment.md)
Key principles:
- Start with the goal ("run this locally")
- Be specific: exact commands, not "install the dependencies"
- Anticipate common failures: include troubleshooting
- Keep it short: If it's more than 2 screens, split it up
2. Architecture Overview
Purpose: Help someone understand the high-level system design without reading code.
What it should contain:
- System context: What problem does this solve? How does it fit in the larger system?
- Key components: Main services, databases, queues, caches
- Data flow: How requests move through the system
- Key design decisions: Why this architecture?
Focus on stable concepts, not implementation details.
Example architecture doc snippet:
# Order Service Architecture
## Context
Order Service handles the full order lifecycle: creation, payment, fulfillment, and cancellation. It's called by the Web/Mobile APIs and integrates with Payment Service and Fulfillment Service.
## High-Level Architecture
[Web/Mobile API]
        ↓
 [Order Service]
   ↓         ↓
[Payment] [Fulfillment]
        ↓
   [Database]
## Key Components
**Order API**: REST API for creating/updating orders.
**Order Processor**: Background worker that processes payment and fulfillment.
**PostgreSQL**: Stores order data.
**Redis**: Caches user cart data (TTL 24 hours).
**SQS Queue**: Async jobs for fulfillment notifications.
## Data Flow: Creating an Order
1. Client calls `POST /orders` with cart data.
2. Order Service validates inventory and creates order record (status: `pending`).
3. Order Service calls Payment Service to charge card.
4. If payment succeeds, order status → `paid`. Job enqueued to Fulfillment Service.
5. If payment fails, order status → `failed`. Client notified.
## Key Design Decisions
**Why async fulfillment?**
Fulfillment can take 5-30 seconds (warehouse API calls, label generation). We don't want to block the user's request. Orders are created immediately, fulfillment happens async.
**Why Redis for cart?**
Carts are high-write, low-durability (OK to lose if Redis restarts). Keeping them out of Postgres reduces write load.
**Why SQS instead of in-process jobs?**
Fulfillment jobs can fail and need retries. SQS gives us durability and visibility. We can also scale workers independently.
What to include:
- Text + simple diagrams (ASCII art, Mermaid, or simple boxes-and-arrows)
- Flows, not just static structures
- Rationale for non-obvious choices
What to skip:
- Implementation details (class names, file structure—that's in code comments)
- Details that change frequently (specific config values, IP addresses)
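The "Creating an Order" data flow in the example architecture doc above reads naturally as a small state machine. The sketch below walks one order through it, purely as an illustration for readers of the flow: `create_order`, `charge_card`, and `enqueue_fulfillment` are hypothetical names, the payment and queue calls are passed in as plain functions so the code runs on its own, and inventory validation is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Order:
    cart: dict
    status: str = "pending"  # step 2: the order record starts as pending


def create_order(
    cart: dict,
    charge_card: Callable[[dict], bool],           # hypothetical stand-in for Payment Service
    enqueue_fulfillment: Callable[[Order], None],  # hypothetical stand-in for the queue
) -> Order:
    """Walk one order through the pending -> paid / failed flow."""
    order = Order(cart=cart)
    if charge_card(cart):           # step 3: charge the card
        order.status = "paid"       # step 4: payment succeeded
        enqueue_fulfillment(order)  # fulfillment is picked up asynchronously
    else:
        order.status = "failed"     # step 5: payment failed
    return order
```

Passing the collaborators in as functions mirrors the design decision in the example: the request path only records the outcome and hands fulfillment off, it never waits for it.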
3. Runbooks: On-Call Guides
Purpose: Help someone debug and fix a production issue at 3am with minimal context.
Audience: Your future stressed-out self.
Structure:
- Symptoms: Alert name, error message, or user report
- Immediate steps: Checklist to diagnose and mitigate
- How to escalate: Who to contact if stuck
- Links: Dashboards, logs, deeper investigation docs
Example runbook:
# Runbook: High Order API Latency
## Symptoms
- Alert: "Order API p99 latency > 2 seconds"
- Users reporting slow checkout
## Immediate Checks
1. **Check dashboard**: [Order Service Metrics](https://grafana.company.com/orders)
- Look for: request rate spike, error rate, database query time
2. **Check database**:
```sql
-- Find slow queries
SELECT query, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```
3. **Check Redis**: Is cache hit rate lower than normal? (normal: >80%)
redis-cli INFO stats | grep keyspace
4. **Check Payment Service**: Is payment API slow? (check #payment-alerts in Slack)
## Common Causes & Fixes
### Cause 1: Database Connection Pool Exhausted
Symptom: Logs show "connection pool timeout"
Fix: Restart the service (temporarily):
kubectl rollout restart deployment/order-service -n production
Follow-up: Increase connection pool size (requires deploy).
### Cause 2: Redis Cache Miss Storm
Symptom: Redis hit rate <50%, database queries spiking
Fix: Warm the cache:
kubectl exec -it order-service-pod -n production -- npm run cache:warm
### Cause 3: Payment Service Timeout
Symptom: Payment Service latency >5 seconds
Escalate: Page Payment team (#payment-alerts).
Mitigation: Consider enabling circuit breaker (requires code change).
## Escalation
- During business hours: Post in #order-service Slack channel
- After hours: Page on-call engineer: `@oncall-orders` in PagerDuty
## Deeper Investigation
Key principles:
- Start with what you see (alert, error), not "if you have problem X..."
- Checklists, not paragraphs
- Assume stress and low context: step-by-step commands, not "check the logs"
- Link to monitoring: dashboards, log queries, metrics
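The Redis check in the example runbook compares the cache hit rate against a normal value of 80%. A minimal sketch of that calculation, assuming the text output of `redis-cli INFO stats` is available as a string: `keyspace_hits` and `keyspace_misses` are the standard Redis counters, while the function name and the sample numbers are made up for illustration.

```python
def cache_hit_rate(info_stats: str) -> float:
    """Compute hits / (hits + misses) from `redis-cli INFO stats` output."""
    counters = {}
    for line in info_stats.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("keyspace_hits", "keyspace_misses"):
            counters[key] = int(value)
    hits = counters.get("keyspace_hits", 0)
    misses = counters.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0


# Sample input with made-up numbers, in the key:value format INFO prints.
sample = "keyspace_hits:900\r\nkeyspace_misses:100\r\n"
print(f"hit rate: {cache_hit_rate(sample):.0%}")  # hit rate: 90%
```

Both counters are cumulative since the server started, so comparing two samples taken a minute apart gives a better picture of the current hit rate than a single reading.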
4. Architecture Decision Records (ADRs)
Purpose: Document why key technical decisions were made, preserving context for future engineers.
Format: Short (1-2 pages), structured docs stored in the repo.
When to write an ADR:
- Choosing a database, message queue, or major technology
- Significant architecture changes (sharding, microservices split)
- Trade-offs that will affect the team for years
Structure:
```markdown
# ADR ###: [Title]
**Status**: Accepted | Rejected | Superseded
**Date**: YYYY-MM-DD
**Decision Owner**: Name
## Context
Why are we making this decision? What problem or constraint drove it?
## Decision
What did we decide?
## Alternatives Considered
What other options did we evaluate? Why did we reject them?
## Consequences
What are the benefits and trade-offs of this decision?
```
Example ADR:
# ADR 005: Use PostgreSQL JSON Columns for Order Metadata
**Status**: Accepted
**Date**: 2025-11-10
**Decision Owner**: Ruchit Suthar
## Context
Orders have flexible metadata (gift messages, custom engraving text, promo codes applied). Different order types have different metadata. Adding a column for each field bloats the schema and requires migrations for every new field.
## Decision
Store flexible metadata in a JSONB column: `orders.metadata`.
## Alternatives Considered
**Option 1: Separate `order_metadata` table**
Rejected: Requires JOIN on every order fetch. Complicates queries.
**Option 2: Add columns as needed**
Rejected: Frequent schema migrations. Unclear which fields are actually used.
**Option 3: NoSQL document store (MongoDB)**
Rejected: Rest of the system is PostgreSQL. Adds operational complexity. JSONB in Postgres gives us flexibility without leaving the RDBMS.
## Consequences
**Benefits**:
- No schema changes for new metadata fields.
- Can query JSON with PostgreSQL's JSON operators: `metadata->>'gift_message'`.
**Trade-offs**:
- Harder to enforce schema (no strong typing). Mitigated with application-level validation.
- Less efficient than indexed columns for high-frequency queries. Acceptable for metadata (queried infrequently).
**Migration plan**:
Existing metadata columns (`gift_message`, `promo_code`) will be migrated into `metadata` JSONB. Old columns deprecated in 3 months.
Benefits of ADRs:
- New engineers understand "why" instead of reverse-engineering from code
- Prevents re-litigating old decisions ("why didn't we use MongoDB?")
- Searchable history of technical choices
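Numbering ADRs and copying the template by hand is easy to get wrong, so it can help to script it. A minimal sketch, assuming ADRs live in one directory and are named `NNN-title.md`: that naming convention and the `new_adr` helper are assumptions for illustration, not something the ADR format requires.

```python
import re
from datetime import date
from pathlib import Path

TEMPLATE = """# ADR {number:03d}: {title}
**Status**: Accepted | Rejected | Superseded
**Date**: {today}
**Decision Owner**: {owner}
## Context
## Decision
## Alternatives Considered
## Consequences
"""


def new_adr(adr_dir: Path, title: str, owner: str) -> Path:
    """Create the next numbered ADR file from the template and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    # Existing files look like "005-some-title.md"; continue from the highest number.
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := re.match(r"(\d+)-", p.name))
    ]
    number = max(numbers, default=0) + 1
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = adr_dir / f"{number:03d}-{slug}.md"
    path.write_text(
        TEMPLATE.format(number=number, title=title, today=date.today().isoformat(), owner=owner),
        encoding="utf-8",
    )
    return path
```

Calling `new_adr(Path("docs/adr"), "Use JSONB for order metadata", "Your Name")` would create the next file in the sequence with empty sections to fill in; the `docs/adr` location is only an example.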
Designing READMEs That Unblock People in 10 Minutes
READMEs are your most important doc. They're the first thing anyone reads.
Template for a good README:
Section 1: What This Does (2-3 sentences)
# Service Name
One-sentence description of what this service does and who uses it.
Example:
Order Service handles order creation, payment processing, and fulfillment for all e-commerce purchases.
Section 2: Quick Start (How to Run Locally)
## Quick Start
1. Install dependencies: [exact command]
2. Set up database: [exact command]
3. Configure environment: [exact steps]
4. Run the service: [exact command]
Section 3: Testing
## Running Tests
[exact command to run tests]
## Running Specific Tests
[how to run one test file or test case]
Section 4: Common Issues
## Troubleshooting
**Error: "X"**
Cause: Y
Fix: Z
Section 5: Links to Deeper Docs
## Documentation
- [Architecture Overview](docs/architecture.md)
- [API Documentation](docs/api.md)
- [Deployment Guide](docs/deployment.md)
- [Runbook](docs/runbook.md)
Keep it short: If your README is >200 lines, split sections into separate docs and link them.
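The template and the 200-line limit are simple enough to check automatically, for example in CI. A rough sketch, assuming the README uses the section headings from the template above: `REQUIRED_SECTIONS` and `check_readme` are illustrative names, and the heading list should be adjusted to whatever a team actually standardizes on.

```python
REQUIRED_SECTIONS = [
    "## Quick Start",
    "## Running Tests",
    "## Troubleshooting",
    "## Documentation",
]
MAX_LINES = 200  # rule of thumb from the text above


def check_readme(text: str) -> list[str]:
    """Return a list of problems found in a README; an empty list means it passes."""
    lines = text.splitlines()
    headings = {line.strip() for line in lines if line.startswith("#")}
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in headings]
    if len(lines) > MAX_LINES:
        problems.append(f"too long: {len(lines)} lines (limit {MAX_LINES})")
    return problems
```

Running `check_readme(Path("README.md").read_text())` (with `pathlib.Path` imported) returns messages that can be printed or used to fail a build.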
Making Docs Discoverable and Trusted
Even great docs are useless if no one can find them.
1. Standard Locations and Naming
Put docs where people expect them:
- `README.md` in repo root
- `/docs` folder for deeper docs
- Link from dashboards, alerts, and wikis
Use consistent naming:
- `architecture.md`
- `runbook.md`
- `api.md`
- `deployment.md`
2. Link from the Tools People Use
- Dashboards: Link runbooks from Grafana/Datadog alerts
- Error messages: Link troubleshooting guides from log messages
- Repos: Link architecture docs from README
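Linking a troubleshooting guide from an error can be as simple as carrying the URL on the exception, so it shows up wherever the message is logged. A small sketch: the `DocumentedError` class and the URL are hypothetical and stand in for whatever error type and docs location a service already has.

```python
class DocumentedError(Exception):
    """An error whose message points the reader at a troubleshooting guide."""

    def __init__(self, message: str, docs_url: str):
        super().__init__(f"{message} (see {docs_url})")
        self.docs_url = docs_url


# Hypothetical URL; point this at the real runbook or README section.
POOL_DOCS = "https://docs.example.com/order-service/runbook#connection-pool"

try:
    raise DocumentedError("connection pool timeout", POOL_DOCS)
except DocumentedError as err:
    print(err)  # the logged message now includes the link
```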
3. Show Last-Updated Date and Owner
**Last Updated**: 2025-11-10
**Owner**: @ruchit (Slack: @ruchit)
This signals: "Is this doc current?" and "Who do I ask if it's wrong?"
4. Delete Outdated Docs
Zombie docs are worse than no docs. If a doc is outdated and no one will update it, delete it.
Better to have 5 accurate docs than 50 docs where 30 are wrong and no one knows which.
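A last-updated header also makes stale docs easy to find mechanically. A minimal sketch, assuming every doc carries a `**Last Updated**: YYYY-MM-DD` line as shown above: the 180-day threshold and the `find_stale_docs` name are arbitrary choices for illustration.

```python
import re
from datetime import date, timedelta
from pathlib import Path

LAST_UPDATED = re.compile(r"\*\*Last Updated\*\*:\s*(\d{4}-\d{2}-\d{2})")


def find_stale_docs(docs_dir: Path, today: date, max_age_days: int = 180) -> list[Path]:
    """Return Markdown files whose Last Updated date is missing or too old."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for path in sorted(docs_dir.rglob("*.md")):
        match = LAST_UPDATED.search(path.read_text(encoding="utf-8"))
        # A doc with no date is treated as stale: nobody can tell if it is current.
        if match is None or date.fromisoformat(match.group(1)) < cutoff:
            stale.append(path)
    return stale
```

Something like `find_stale_docs(Path("docs"), date.today())` produces a list of candidates to review, then update or delete.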
Closing: Documentation as Part of the Definition of Done
Documentation isn't bureaucracy. It's craftsmanship.
Good docs:
- Speed up onboarding
- Reduce support burden
- Preserve knowledge when people leave
- Prevent repeated mistakes
Bad docs (or no docs):
- Tribal knowledge stays locked in people's heads
- Same questions in Slack every week
- New engineers blocked for days
- Context lost forever when someone quits
Treat docs as part of shipping code. A feature isn't done until:
- README updated (if relevant)
- Runbook updated (if on-call needs to know)
- ADR written (if major decision)
Checklist: Upgrade Docs for One Service This Week
Pick one service/repo and improve its docs:
- README exists and covers: What it does, how to run locally, how to test, common issues
- Architecture doc exists (high-level system design, key components, data flows)
- Runbook exists (for production service): symptoms, immediate steps, escalation
- ADRs exist for major technical decisions (at least 1-2 key choices documented)
- Docs are discoverable: Linked from README, dashboards, and team wiki
- Docs show last-updated date and owner
- Outdated docs deleted (if any)
This takes 2-4 hours. It saves weeks of confusion.
The best docs are boring. They're short, focused, and answer one question clearly. They're updated when things change. They're linked from the places people actually look.
Write docs people will actually read. Your future self—and your teammates—will thank you.
