Database Design Mistakes That Will Haunt You at Scale (and How to Avoid Them Early)
That user_id column you added in 5 minutes now blocks sharding, mergers, and compliance. Database mistakes hide quietly, then surface when you're growing fast. Learn six common mistakes (God Tables, wrong IDs, missing indexes, mixed workloads, and more) and the pre-launch checklist that catches them.

TL;DR
Database design mistakes hide quietly and surface at scale—sequential IDs block sharding, god tables resist changes, missing indexes kill performance, and wrong data types prevent migrations. A 5-minute decision (like user_id as integer) can become a 3-month migration project. Use UUIDs, normalize properly, index strategically, and plan for immutability.
The Column You Can't Change
Three years ago, you added a user_id column to your orders table. Simple integer, auto-increment, done in 5 minutes.
Today, that column blocks everything. You need to shard your database by region for GDPR compliance, but user_id is sequential across regions. You want to merge two databases after an acquisition, but ID collisions are everywhere. You're leaking information through predictable IDs in URLs.
Changing it means:
- Migrating 500 million rows.
- Updating 47 foreign key relationships.
- Coordinating deploys across 12 services.
- Downtime or complex dual-write migration.
- 3 months of engineering time.
"It's just a field" became "it's the hardest thing to change in our system."
Database design mistakes don't fail fast. They hide quietly and surface at the worst possible time—when you're growing fast, when compliance deadlines loom, when you're trying to merge systems.
The good news: most painful mistakes are avoidable with small upfront decisions. Let's talk about the ones that will haunt you.
Mistake #1: Overloading One Table or Entity with Everything
The "God Table" Problem
You start with a simple users table. Then you add columns for preferences. Then billing info. Then analytics tracking. Then internal notes. Then feature flags. Then...
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255),
password_hash VARCHAR(255),
first_name VARCHAR(100),
last_name VARCHAR(100),
-- Preferences
theme VARCHAR(20),
language VARCHAR(10),
timezone VARCHAR(50),
email_notifications BOOLEAN,
push_notifications BOOLEAN,
-- Billing
stripe_customer_id VARCHAR(255),
billing_email VARCHAR(255),
plan VARCHAR(50),
mrr DECIMAL(10,2),
trial_ends_at TIMESTAMP,
-- Analytics
last_login_at TIMESTAMP,
login_count INT,
last_seen_ip VARCHAR(45),
utm_source VARCHAR(100),
utm_campaign VARCHAR(100),
-- Admin
internal_notes TEXT,
is_beta_tester BOOLEAN,
feature_flag_1 BOOLEAN,
feature_flag_2 BOOLEAN,
-- ...50 more columns
);
Symptoms
- Massive rows: Most queries only need 5 fields but fetch 70.
- Write contention: Analytics updates and billing updates block each other.
- Unclear ownership: Multiple teams touch the same table, schema changes cause conflicts.
- Index hell: Different access patterns need different indexes, and every extra index slows writes.
Better Pattern: Bounded Contexts
Split by domain concern:
-- Core identity
CREATE TABLE users (
id UUID PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL
);
-- User preferences (owned by frontend team)
CREATE TABLE user_preferences (
user_id UUID PRIMARY KEY REFERENCES users(id),
theme VARCHAR(20),
language VARCHAR(10),
timezone VARCHAR(50),
email_notifications BOOLEAN DEFAULT TRUE,
push_notifications BOOLEAN DEFAULT TRUE
);
-- Billing (owned by billing team)
CREATE TABLE customer_billing (
user_id UUID PRIMARY KEY REFERENCES users(id),
stripe_customer_id VARCHAR(255),
plan VARCHAR(50),
mrr DECIMAL(10,2),
trial_ends_at TIMESTAMP
);
-- Analytics (owned by data team, maybe different DB)
CREATE TABLE user_analytics (
user_id UUID PRIMARY KEY,
last_login_at TIMESTAMP,
login_count INT,
last_seen_ip VARCHAR(45)
);
Now:
- Each team owns their tables.
- Schema changes don't conflict.
- You can move analytics to a separate read replica or data warehouse.
- Queries fetch only what they need.
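For example, a profile page joins just the two tables it needs (a sketch against the schema above):
SELECT u.id, u.email, p.theme, p.language
FROM users u
LEFT JOIN user_preferences p ON p.user_id = u.id
WHERE u.id = $1;
-- Billing and analytics never enter the query plan.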
Guideline: If you catch yourself saying "this table just needs one more column..." stop. Ask: does this belong in a separate table?
Mistake #2: Wrong Primary Keys and Identifier Strategy
Auto-Increment IDs: Convenient, Limiting
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT,
total DECIMAL(10,2)
);
Problems:
- Sharding nightmare: Auto-increment doesn't work across shards. You need a global ID generator.
- Information leakage: URLs like /orders/12345 reveal order volume and growth rate.
- Merge conflicts: Merging databases (acquisitions, multi-region) causes ID collisions.
- Distributed systems: Multiple databases generating IDs independently will collide.
UUIDs: Better for Scale, with Trade-Offs
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID,
total DECIMAL(10,2)
);
Benefits:
- Globally unique, no coordination needed.
- Can generate client-side or server-side.
- Safe to merge databases.
- Non-sequential, no information leakage.
Trade-offs:
- Larger: 16 bytes vs 4 bytes (INT) or 8 bytes (BIGINT).
- Index performance: Random UUIDs cause index fragmentation (mitigated with UUIDv7 or ULID—time-ordered UUIDs).
Hybrid Approach: UUIDs with Sequential Ordering
Use UUIDv7 (timestamp-based) or ULIDs (lexicographically sortable):
-- UUID v7 (timestamp + random)
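-- NOTE: uuid_generate_v7() is not a built-in function; this assumes an extension such as pg_uuidv7
-- (newer PostgreSQL versions ship a native uuidv7() you can use instead).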
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
...
);
Time-ordered UUIDs give you:
- Global uniqueness.
- Better index performance (sequential writes).
- Sortable by creation time.
Recommendation: Default to UUIDs (preferably v7/ULID) for primary keys unless you have a specific reason not to. Save yourself future migration pain.
Mistake #3: Ignoring Indexes Until It's Too Late
The Slow Query You Didn't See Coming
Your app launches. 1,000 users. Queries are fast. No indexes except primary keys? No problem.
Two years later: 10 million users. The query that finds orders for a user takes 15 seconds. Your database is melting.
-- No index on user_id
SELECT * FROM orders WHERE user_id = 12345;
-- Full table scan of 50 million orders
Missing Indexes
Symptom: Queries on non-primary-key columns are slow.
Fix: Add indexes on columns you query frequently.
CREATE INDEX idx_orders_user_id ON orders(user_id);
Now the query uses the index and returns in 20ms.
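To confirm the planner actually uses it, check the query plan (output varies with data size and PostgreSQL version):
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Look for an "Index Scan using idx_orders_user_id" node instead of a "Seq Scan on orders".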
Wrong Indexes: Over-Indexing
Every index speeds up reads but slows down writes. Each write updates all indexes.
Anti-pattern: Index every column "just in case."
Result: Inserts and updates become slow. Indexes take up huge disk space. Half the indexes are never used.
Better: Index based on actual query patterns.
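One rough way to find over-indexing in PostgreSQL is the index usage statistics view; treat zero-scan indexes as candidates to review, not an automatic verdict (unique indexes that only enforce constraints also show zero scans):
-- Indexes never scanned since statistics were last reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;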
Composite Indexes
If you query by multiple columns together, use a composite index:
-- Query: WHERE user_id = X AND status = 'pending'
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
Order matters: Put the columns you filter with equality first (here user_id), usually the most selective ones.
Guideline:
- Index foreign keys (used in joins).
- Index columns in WHERE clauses of frequent queries.
- Use composite indexes for multi-column queries.
- Avoid indexing low-cardinality columns alone (e.g., status with 3 values); index them together with other columns.
Query-First Thinking
Don't design schema in isolation. Think:
- "How will I query this?"
- "Which columns will I filter by?"
- "What joins will I do?"
Then add indexes accordingly.
Mistake #4: Mixing Operational and Analytical Workloads on the Same DB
Reporting Queries Killing Production
Your CEO wants a dashboard: total revenue by month, top customers, order trends.
You write a query:
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY month;
-- Full table scan of 50 million orders
This runs for 45 seconds. Meanwhile, your production API timeouts spike because the database is thrashing.
Problem: OLTP (transactional) and OLAP (analytical) workloads have opposite needs.
- OLTP: Fast, small queries. Low latency. Indexed lookups.
- OLAP: Large scans, aggregations. High throughput. Don't care about latency.
Mixing them on the same database causes contention.
Better Patterns
1. Read Replica for Reports
Send analytical queries to a read replica (separate database instance, replicating from primary).
Production API → Primary DB → (replication) → Read Replica ← Reporting Dashboard
Keeps heavy queries off production.
2. Separate Reporting Database / Data Warehouse
Periodically sync data to a data warehouse (Snowflake, BigQuery, Redshift) or separate PostgreSQL instance optimized for analytics.
Production DB → ETL Pipeline → Data Warehouse → BI Tools
Now analytics doesn't touch production at all.
3. Materialized Views (for smaller scale)
Pre-aggregate data into materialized views:
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY month;
-- Refresh periodically
REFRESH MATERIALIZED VIEW monthly_revenue;
Queries hit the pre-aggregated view, not raw data.
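A plain refresh locks the view against reads while it rebuilds; if that's a problem, PostgreSQL can refresh concurrently, which requires a unique index on the view (a minimal sketch):
CREATE UNIQUE INDEX idx_monthly_revenue_month ON monthly_revenue (month);
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;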
Guideline: Separate read-heavy analytical queries from write-heavy transactional queries as early as possible.
Mistake #5: Loose Handling of Time, Time Zones, and History
"When Did This Happen?"
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID,
total DECIMAL(10,2)
-- No timestamps
);
Six months later, customer support asks: "When was this order placed?"
You have no idea. You add created_at now, but all existing orders are NULL or defaulted to today.
Missing Timestamps
Always include:
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID,
total DECIMAL(10,2),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
- created_at: When the record was first created.
- updated_at: When it was last modified.
Use TIMESTAMP WITH TIME ZONE (or TIMESTAMPTZ in PostgreSQL) to store UTC:
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
This avoids ambiguity. Store in UTC, convert to local time in application.
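Application code tends to forget to set updated_at; a small PostgreSQL trigger won't (a minimal sketch against the orders table above):
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := NOW();  -- overwrite on every update
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_set_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION set_updated_at();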
No Audit Trail or History
A customer complains: "I was charged $100, but it should have been $80."
You look at the order: total = 80. But you don't know if it was always 80 or changed from 100.
Solution: Audit tables or history tracking
Option 1: Separate audit log
CREATE TABLE order_history (
id UUID PRIMARY KEY,
order_id UUID REFERENCES orders(id),
changed_at TIMESTAMPTZ NOT NULL,
changed_by UUID,
field_name VARCHAR(50),
old_value TEXT,
new_value TEXT
);
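One way rows can get into this table is a trigger that logs changes to a field you care about, like total (a sketch; changed_by stays NULL because the database alone doesn't know which user acted—pass it from the application if you need it):
CREATE OR REPLACE FUNCTION log_order_total_change() RETURNS trigger AS $$
BEGIN
  IF NEW.total IS DISTINCT FROM OLD.total THEN
    INSERT INTO order_history (id, order_id, changed_at, changed_by, field_name, old_value, new_value)
    VALUES (gen_random_uuid(), OLD.id, NOW(), NULL, 'total', OLD.total::TEXT, NEW.total::TEXT);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_log_total_change
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION log_order_total_change();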
Option 2: Temporal tables (built-in versioning)
Some databases (PostgreSQL with extensions, SQL Server) support system-versioned temporal tables that automatically track history.
Option 3: Event sourcing
Store all changes as events:
CREATE TABLE order_events (
id UUID PRIMARY KEY,
order_id UUID,
event_type VARCHAR(50), -- 'created', 'updated', 'cancelled'
event_data JSONB,
created_at TIMESTAMPTZ NOT NULL
);
Rebuild current state by replaying events.
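The read side is just the events in order; the application folds them into the current state (a sketch):
SELECT event_type, event_data
FROM order_events
WHERE order_id = $1
ORDER BY created_at;
-- Start from an empty order and apply each event in sequence in application code.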
Guideline: If you care about "what changed when," design for history from the start. Adding it later is painful.
Mistake #6: No Plan for Growth and Partitioning
Tables That Grow Forever
CREATE TABLE logs (
id UUID PRIMARY KEY,
user_id UUID,
action VARCHAR(100),
created_at TIMESTAMPTZ
);
This table grows to 5 billion rows. Queries slow down. Indexes grow massive. Backups take 12 hours.
Strategies for Large Tables
1. Partitioning (by date)
Split the table into smaller chunks:
-- PostgreSQL partitioning
CREATE TABLE logs (
id UUID,
user_id UUID,
action VARCHAR(100),
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE logs_2024_11 PARTITION OF logs
FOR VALUES FROM ('2024-11-01') TO ('2024-12-01');
CREATE TABLE logs_2024_12 PARTITION OF logs
FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');
Queries only scan relevant partitions. Old partitions can be archived or dropped.
2. Archiving old data
Move old rows to an archive table or separate storage (S3, data warehouse):
-- Move logs older than 1 year to archive
INSERT INTO logs_archive SELECT * FROM logs WHERE created_at < NOW() - INTERVAL '1 year';
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '1 year';
Keeps active table small.
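In PostgreSQL, a data-modifying CTE does the move in one statement, so rows can't be lost or duplicated between the insert and the delete (a sketch; batch it by date range for very large tables):
WITH moved AS (
  DELETE FROM logs
  WHERE created_at < NOW() - INTERVAL '1 year'
  RETURNING *
)
INSERT INTO logs_archive SELECT * FROM moved;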
3. Shard-aware design (don't shard prematurely, but be ready)
If you'll eventually shard (split database across multiple servers), design with that in mind:
- Use shard keys that distribute data evenly (e.g., user_id, tenant_id).
- Avoid queries that need to scan all shards (cross-shard joins are expensive).
Guideline: Don't shard on day 1. But design tables so sharding is possible when you need it. Use UUIDs, avoid cross-entity joins where possible, think about which queries need to be fast.
Design Principles That Save You Later
Beyond avoiding mistakes, here are positive principles:
1. Model Real-World Concepts Clearly
Your schema should reflect domain concepts, not technical abstractions.
Bad:
CREATE TABLE entities (
id INT,
type VARCHAR(20), -- 'user', 'order', 'product'
data JSONB
);
Everything is an "entity." Good luck writing clear queries.
Good:
CREATE TABLE users (...);
CREATE TABLE orders (...);
CREATE TABLE products (...);
Clear tables for clear concepts.
2. Capture History Explicitly When Needed
If you need to answer "what did this look like yesterday?" design for it upfront with created_at, updated_at, or audit tables.
3. Design with Read Patterns in Mind
Don't just model entities—model how you'll query them.
If you always fetch orders with their line items, consider whether denormalization or eager loading makes sense. If you filter by status and created_at together, create a composite index.
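For that second case, the index simply mirrors the filter (a sketch):
CREATE INDEX idx_orders_status_created_at ON orders (status, created_at);
-- Serves queries like: WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days'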
4. Document the Intent
Your schema is code. It deserves comments.
-- Orders table: user purchases
-- Sharded by user_id (for future partitioning)
CREATE TABLE orders (
id UUID PRIMARY KEY, -- UUIDv7 for global uniqueness and sequential indexing
user_id UUID NOT NULL, -- FK to users, shard key
total DECIMAL(10,2) NOT NULL, -- Order total in USD
status VARCHAR(20) NOT NULL, -- 'pending', 'paid', 'shipped', 'cancelled'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Index for common query: user's orders by status
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
Future you (and future teammates) will thank you.
Closing: Schemas Are Architecture, Not Plumbing
Schema design is not just "where we put the data." It's core architectural work.
Bad schemas:
- Slow down every feature.
- Block scaling.
- Cost months of migration time.
Good schemas:
- Fade into the background.
- Support growth without drama.
- Make new features easy to build.
The difference is small, intentional decisions made early.
Pre-Launch Database Design Checklist
Before you launch (or before you add a major new table), review:
- Primary keys: Using UUIDs (or ULIDs/UUIDv7) for global uniqueness and scale?
- Timestamps: Every table has created_at and updated_at (with time zone)?
- Indexes: Indexed foreign keys and columns in frequent WHERE clauses?
- Normalization: Avoiding God Tables—each table has clear, focused purpose?
- Audit trail: Do you need history/audit logs for compliance or debugging?
- Growth plan: How will this table grow? Do you need partitioning, archiving, or sharding awareness?
- Read patterns: Have you thought about how you'll query this? Composite indexes where needed?
- Documentation: Comments explaining key design choices and constraints?
Run this checklist. It takes 15 minutes. It saves months of pain.
Database design mistakes are quiet. They don't fail tests. They don't throw exceptions. They wait.
They wait until you have 100 million rows and need to add a column. They wait until you need to shard. They wait until you're merging two companies' databases.
Then they become the hardest, most expensive problems you have.
Design your schemas like they'll last 10 years. Because they will.
