Caching, Idempotency, and Retries: The Three Things That Break at Scale

Three patterns separate systems that survive scale from systems that get paged at 3am. This guide covers cache invalidation and the stampede problem, idempotency keys done right, and retries with exponential backoff, jitter, and circuit breakers, plus how the three fit together into one reliability story. Get them right and most of your 3am pages quietly disappear.

May 27, 2026 · 12 min read

Key Takeaway

Three patterns separate systems that survive scale from systems that get paged at 3am: caching (do less work), idempotency (make repeats safe), and retries (recover from transient failure). Each is "simple" until traffic, concurrency, and partial failure turn it into a foot-gun — stale caches and stampedes, double-charges from naive retries, retry storms that turn a blip into an outage. This is the field guide: cache invalidation and the stampede problem, idempotency keys done right, and retries with backoff, jitter, and circuit breakers. Get these three right and most of your reliability problems quietly disappear.

There's a moment every system has on its way up. It works perfectly at 100 requests a minute. It works fine at 1,000. Then somewhere around 10,000 it starts doing things that make no sense: a dashboard that's randomly stale, a customer charged twice for one click, a brief downstream blip that somehow becomes a 40-minute full outage.

Almost every time I've been pulled into one of these, the root cause traces back to one of three patterns done naively: caching, idempotency, or retries. They're the load-bearing walls of a scalable system. Each looks trivial in a code review. Each has a failure mode that only appears under real concurrency and real partial failure — exactly the conditions you can't reproduce on your laptop.

Here's what I've learned about all three, in the order they tend to bite.

1. Caching: doing less work, correctly

Caching is storing the result of expensive work so you don't redo it. It's the highest-leverage performance tool you have — and the source of one of the two genuinely hard problems in computer science (naming, cache invalidation, and off-by-one errors).

The performance win is obvious. The traps are not.

The hard part is invalidation

A cache is a copy of the truth, and copies go stale. The entire difficulty of caching is deciding when a cached value is no longer trustworthy. Three broad strategies, in increasing order of correctness and effort:

TTL (time-to-live): the value expires after N seconds. Simple, and good enough for data that can be slightly stale (a product listing, a leaderboard). The cost: a window where you're serving old data. Pick the TTL by asking "how stale can this be before someone's upset?"
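Write-through / write-around: on every write, either update the cached copy along with the store (write-through) or write to the store only and evict the cached copy (write-around with invalidation). More correct, more coupling: every writer must know to touch the cache.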
Event-based invalidation: publish a "this changed" event and let cache holders evict. Scales to many caches but adds the event-driven machinery.
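
To make the first two strategies concrete, here is a minimal sketch of an in-process TTL cache with an explicit invalidation hook. The names (TTLCache, get_or_load, invalidate) are mine, not from any particular library, and the class is deliberately not thread-safe. The clock is injectable only so expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process TTL cache (illustrative sketch; not thread-safe)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the cached copy
        value = loader()  # miss or expired: redo the expensive work
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Evict early; call this from the write path or an event handler."""
        self._entries.pop(key, None)
```

TTL alone means get_or_load may return a value up to ttl_seconds old. Calling invalidate from every writer (or from a "this changed" event consumer) closes that window, at the price of the coupling described above.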

The mistake I see most: caching data that must be fresh (a user's account balance, permissions) with a TTL, then being surprised when someone sees the wrong number. Match the strategy to how much staleness the data can tolerate. Some data shouldn't be cached at all.

The stampede (the failure mode nobody tests)

Here's the one that takes systems down. A popular cache key expires. At that exact instant, 5,000 in-flight requests all miss the cache simultaneously, and all 5,000 hammer the database to recompute the same value. The database, sized for a trickle of cache misses, falls over. The outage you just caused is worse than having no cache at all.

Defenses:

Request coalescing / single-flight: only the first request recomputes; the rest wait for that result. One database hit instead of 5,000.
Staggered/jittered TTLs: don't let a whole class of keys expire at the same second.
Stale-while-revalidate: serve the slightly-stale value while one background task refreshes it. Nobody waits, the database sees one recompute.
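
Request coalescing is the defense people most often describe without ever writing, so here is a rough single-process sketch using threads. SingleFlight and its do method are hypothetical names, loosely modeled on the "single-flight" idea rather than on any specific library. It only coalesces callers inside one process; across many app servers you would need a shared lock or lease, which this does not attempt.

```python
import threading
from typing import Any, Callable, Dict, Hashable, Optional


class _Call:
    """Shared state for one in-flight recompute."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesce concurrent calls for the same key into one execution."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[Hashable, _Call] = {}

    def do(self, key: Hashable, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if not leader:
            call.done.wait()  # follower: wait for the leader's result
            if call.error is not None:
                raise call.error
            return call.value
        try:
            call.value = fn()  # leader: the only caller that does the work
        except BaseException as exc:
            call.error = exc  # followers see the same failure
            raise
        finally:
            with self._lock:
                del self._in_flight[key]
            call.done.set()
        return call.value
```

Wrapped around the cache-miss path, something like flight.do(cache_key, recompute) turns 5,000 simultaneous misses into one database query and 4,999 waiters.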

A cache without stampede protection is a loaded gun pointed at your database, and the trigger is "a popular item." Design for the miss, not just the hit.
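
Designing for the miss can look something like the sketch below, which combines stale-while-revalidate with a jittered TTL. Treat it as an illustration under assumptions: SWRCache, jittered_ttl, and the 10% spread are names and numbers I made up, refreshes run on a plain daemon thread, and a completely cold key is still loaded inline without coalescing.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Hashable, Set, Tuple


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Base TTL plus or minus `spread`, so keys written together expire apart."""
    return base_seconds * (1 + random.uniform(-spread, spread))


class SWRCache:
    """Serve stale values immediately while one background thread refreshes."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, fresh_until)
        self._refreshing: Set[Hashable] = set()

    def get(self, key: Hashable) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, fresh_until = entry
                stale = self._clock() >= fresh_until
                if stale and key not in self._refreshing:
                    # First caller to notice staleness starts the only refresh.
                    self._refreshing.add(key)
                    threading.Thread(target=self._refresh, args=(key,),
                                     daemon=True).start()
                return value  # fresh or stale, nobody waits
        # Cold miss: there is nothing stale to serve, so load inline.
        return self._refresh(key)

    def _refresh(self, key: Hashable) -> Any:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock() + jittered_ttl(self._ttl))
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

The trade is explicit: readers may see a value slightly older than the TTL, and in exchange an expiring hot key costs the database one recompute instead of thousands.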

2. Idempotency: making repeats safe

An operation is idempotent if doing it twice has the same effect as doing it once. At scale — with retries, at-least-once message delivery, and impatient users double-clicking "Pay" — repeats are not an edge case. They are guaranteed. Idempotency is what makes them harmless.

The canonical disaster: a user clicks "Pay," the request is slow, they click again (or the client retries on timeout), and the card is charged twice. The fix is an idempotency key.

How it works in practice (this is roughly how Stripe and many other payments APIs do it):

The client generates a unique key per logical operation and sends it (e.g., an Idempotency-Key header).
The server checks if it's seen that key. If yes, it returns the stored result of the first attempt — without doing the work again.
If no, it does the work, stores the result against the key, and returns it.
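
Those three steps fit in a few lines. The sketch below keeps results in a plain dict and ignores concurrency, so it shows the shape of the flow rather than a production design; IdempotentEndpoint, handle, and new_idempotency_key are hypothetical names.

```python
import uuid
from typing import Any, Callable, Dict


def new_idempotency_key() -> str:
    """Client side: generate once per logical operation, reuse on every retry."""
    return str(uuid.uuid4())


class IdempotentEndpoint:
    """Server side: replay the stored result when a key is seen again."""

    def __init__(self, do_work: Callable[[Any], Any]) -> None:
        self._do_work = do_work
        self._results: Dict[str, Any] = {}  # idempotency key -> first result

    def handle(self, idempotency_key: str, request: Any) -> Any:
        if idempotency_key in self._results:
            # Seen before: return the first attempt's result, do no work.
            return self._results[idempotency_key]
        result = self._do_work(request)
        self._results[idempotency_key] = result
        return result
```

The important client-side habit is that the key belongs to the logical operation, not to the HTTP request: a retry after a timeout must send the same key, and a genuinely new payment must send a new one.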

The details that matter:

Reserve the key before doing the work, atomically, to handle two concurrent requests with the same key (a real race under double-clicks). A unique constraint in the database is your friend here.
Store the response, not just "done" — so the retry gets the same answer, not a confusing "already processed" error.
Scope the key's lifetime. Keys don't need to live forever; a 24-hour window covers retries without unbounded storage.
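
Here is one way those three details could look with a unique constraint doing the atomic reservation, sketched against SQLite from the Python standard library. The table layout, the handle function, and the "in_progress" reply for a concurrent duplicate are all assumptions of mine, as is the choice to release the key when the work raises (which is only right if a failed attempt left no side effects).

```python
import json
import sqlite3
import time
from typing import Any, Callable, Dict

SCHEMA = """
CREATE TABLE IF NOT EXISTS idempotency_keys (
    idem_key   TEXT PRIMARY KEY,  -- uniqueness makes the reservation atomic
    response   TEXT,              -- NULL while the first attempt is running
    created_at REAL NOT NULL
)
"""


def handle(conn: sqlite3.Connection, key: str,
           do_work: Callable[[], Dict[str, Any]],
           now: Callable[[], float] = time.time,
           ttl_seconds: float = 24 * 3600.0) -> Dict[str, Any]:
    # Scope the key's lifetime: forget reservations older than the window.
    conn.execute("DELETE FROM idempotency_keys WHERE created_at < ?",
                 (now() - ttl_seconds,))
    conn.commit()
    try:
        # Reserve BEFORE doing the work. Of two concurrent inserts with the
        # same key, the primary key lets exactly one succeed.
        conn.execute(
            "INSERT INTO idempotency_keys (idem_key, response, created_at) "
            "VALUES (?, NULL, ?)", (key, now()))
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()
        row = conn.execute(
            "SELECT response FROM idempotency_keys WHERE idem_key = ?",
            (key,)).fetchone()
        if row is None or row[0] is None:
            return {"status": "in_progress"}  # hypothetical reply
        return json.loads(row[0])  # replay the stored response
    try:
        response = do_work()
    except Exception:
        # Assumption: the failed attempt had no side effects, so free the key.
        conn.execute("DELETE FROM idempotency_keys WHERE idem_key = ?", (key,))
        conn.commit()
        raise
    # Store the response itself, not just a "done" flag.
    conn.execute("UPDATE idempotency_keys SET response = ? WHERE idem_key = ?",
                 (json.dumps(response), key))
    conn.commit()
    return response
```

In a real payments path the reservation, the side effect, and the stored response need more careful failure handling than this (a crash between the charge and the UPDATE leaves a key stuck in progress), but the ordering is the point: reserve, work, record.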

Idempotency is the single highest-value reliability pattern for any system that handles money, sends messages, or mutates important state. If you take one thing from this article, make your write endpoints idempotent.

3. Retries: recovering from transient failure — without making it worse

Networks blip. A downstream is briefly overloaded. A node restarts. These transient failures are recoverable — if you retry. But naive retries are how a small problem becomes a large one.

The retry storm

A downstream service slows down. Every caller's request times out. Every caller immediately retries. The downstream — already struggling — now gets double the traffic, slows further, times out more, triggers more retries. This positive feedback loop is a retry storm, and it turns a 5-second blip into a sustained outage. The retries don't help recovery; they prevent it.
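
A back-of-envelope model shows how quickly this compounds. It assumes every failed attempt is retried immediately and that attempts fail independently with the same probability, which understates real storms (failures are correlated, and added load raises the failure rate). The function name is mine.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Average attempts per logical request under immediate retries.

    Attempt i only happens if the previous i attempts all failed,
    which has probability failure_rate ** i.
    """
    return sum(failure_rate ** i for i in range(max_retries + 1))


print(expected_attempts(0.0, 3))  # 1.0: healthy downstream, no extra load
print(expected_attempts(0.5, 2))  # 1.75
print(expected_attempts(1.0, 1))  # 2.0: one retry on total failure doubles traffic
print(expected_attempts(1.0, 3))  # 4.0: three retries quadruple it
```

The downstream gets the most extra traffic exactly when it is least able to serve it.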

Doing retries right

Three rules, each one defusing part of the storm:

Exponential backoff: wait longer between each attempt (1s, 2s, 4s…). Give the downstream room to recover instead of piling on.
Jitter: add randomness to the backoff. Without it, all callers retry at the same moment (the "thundering herd"), recreating the stampede. Jitter spreads them out. Backoff without jitter is only half a fix.
A retry budget / cap: limit total attempts. Infinite retries on a permanently-broken downstream is just a slow-motion DoS on yourself.
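
All three rules fit in one small helper. This is a sketch, not a library recommendation: the retry function, its defaults, and the choice of which exceptions count as transient are assumptions. The jitter variant used here picks a uniformly random delay between zero and the exponential window (often called "full jitter"); other variants exist.

```python
import random
import time
from typing import Any, Callable, Tuple, Type


def retry(fn: Callable[[], Any],
          max_attempts: int = 4,
          base: float = 1.0,
          cap: float = 30.0,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fn, retrying transient errors with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # retry budget spent: surface the last error
            # Exponential window: base, 2*base, 4*base, ... up to cap.
            window = min(cap, base * 2 ** attempt)
            # Jitter: a random point in the window so callers spread out.
            sleep(random.uniform(0, window))
```

Anything not listed in retry_on (a validation error, say) propagates on the first failure, since retrying it would only repeat the same result.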

Only retry what's safe to retry

This is where the three patterns connect: you can only safely retry an operation that's idempotent. Retrying a non-idempotent "charge the card" gives you the double-charge from section 2. Retrying an idempotent operation (or one protected by an idempotency key) is safe, although it still adds load, which is why backoff matters. Retries and idempotency are a matched pair: retries are the recovery mechanism, idempotency is what makes recovery safe.
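
For an HTTP client, that rule can be written down as a small guard in front of the retry loop. The sketch below leans on the HTTP specification's list of methods defined as idempotent and treats anything else as retryable only when it carries an Idempotency-Key header. The function name is hypothetical, and both branches assume the server actually honors those semantics, which is worth verifying per endpoint.

```python
from typing import Mapping

# Methods the HTTP specification defines as idempotent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})


def is_safe_to_retry(method: str, headers: Mapping[str, str]) -> bool:
    """Decide whether a failed request may be re-sent automatically."""
    if method.upper() in IDEMPOTENT_METHODS:
        return True
    # POST, PATCH, etc.: only retry when the server can deduplicate the repeat.
    return any(name.lower() == "idempotency-key" and bool(value)
               for name, value in headers.items())
```

A bare POST to a charge endpoint fails this check, so a timeout there should surface to the caller instead of being retried blindly.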

Circuit breakers: stop hitting the thing that's down

When a downstream is clearly failing (not just one blip), continuing to call it wastes resources and prolongs its pain. A circuit breaker watches the failure rate and, when it crosses a threshold, "opens": it fails fast for a cooldown period instead of attempting calls that will fail. Then it lets a trickle of test calls through (the half-open state) and closes again only if they succeed.

The breaker is what stops a single sick dependency from consuming all your threads and taking your whole service down with it. Combined with backoff + jitter, it's the difference between "a downstream had a bad five minutes" and "we had an incident."
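
A minimal breaker is a small state machine. The sketch below simplifies in two ways that I want to name plainly: it trips on a count of consecutive failures rather than a true failure rate, and it is not thread-safe, so the half-open state does not limit probes to a strict trickle. CircuitBreaker, CircuitOpenError, and the default thresholds are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is cooling down."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0  # consecutive failures while closed
        self._opened_at: Optional[float] = None  # None means closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._cooldown:
            return "half_open"  # cooldown over: allow a probe
        return "open"

    def call(self, fn: Callable[[], Any]) -> Any:
        state = self.state
        if state == "open":
            raise CircuitOpenError("failing fast; dependency is cooling down")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if state == "half_open" or self._failures >= self._threshold:
                self._opened_at = self._clock()  # (re)open, restart cooldown
            raise
        # Success: close the circuit and forget past failures.
        self._failures = 0
        self._opened_at = None
        return result
```

Callers treat CircuitOpenError as an immediate failure, which usually means serving a fallback or a fast error rather than tying up a thread on a timeout.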

How the three fit together

These aren't three separate topics — they're one reliability story:

Caching reduces load so you fail less often in the first place.
Idempotency makes the repeats (from retries, redeliveries, double-clicks) safe.
Retries + circuit breakers recover from the failures that do happen, without amplifying them.

A system missing any one of them has a predictable failure: no stampede protection → cache-expiry outages; no idempotency → duplicate-side-effect corruption; no backoff/breakers → retry-storm cascades. Get all three right and the 3am pages mostly stop.

What to do Monday morning

Find your hottest cache key and ask what happens when it expires. If the answer is "thousands of requests hit the database at once," add single-flight or stale-while-revalidate before it bites you.
Audit your money/message/state-mutating endpoints for idempotency keys. Any write that could be retried or double-submitted needs one. This is the highest-ROI fix on the list.
Check your retry logic for backoff AND jitter. Immediate retries and backoff-without-jitter both cause thundering herds. And confirm you only retry idempotent operations.
Add a circuit breaker to your most critical downstream dependency. Test it: when that dependency is down, does your service fail fast and stay up, or does it hang and take you with it?

Key takeaways

Caching's hard part is invalidation, and its deadly failure is the stampede. Match the invalidation strategy (TTL, write-through, event-based) to how stale the data can tolerably be, and protect hot keys with single-flight, jittered TTLs, or stale-while-revalidate. Some data shouldn't be cached at all.
Idempotency turns guaranteed repeats into non-events. Use idempotency keys: reserve the key atomically before the work, store the response, and return the original result on replay. It's the top reliability pattern for anything touching money, messages, or important state.
Naive retries cause outages; correct retries prevent them. Exponential backoff with jitter and a retry cap defuse the retry storm. Only ever retry operations that are idempotent.
Circuit breakers stop one sick dependency from killing your whole service. Fail fast when a downstream is clearly down, then test for recovery — instead of hanging and exhausting your resources.
The three are one story. Caching reduces failures, idempotency makes repeats safe, retries + breakers recover without amplifying. Missing any one produces a specific, predictable outage.

Your next step

Pick the single most important write operation in your system — the one where a duplicate would be most expensive (a charge, an order, a transfer). Trace exactly what happens if the client submits it twice. If you can't prove it's safe, you've found your highest-priority fix, and it's the same idempotency-key pattern every payments API relies on. Reliability at scale isn't one big thing; it's these three small things, done right, everywhere they matter.

Frequently asked questions

What is a cache stampede and how do I prevent it?

A cache stampede (or "dogpile") happens when a popular cached value expires and many concurrent requests all miss the cache at once, then all recompute the same value by hitting the database simultaneously — often overwhelming it and causing an outage worse than having no cache. Prevent it with request coalescing / single-flight (only the first request recomputes, the rest wait for its result), jittered or staggered TTLs so a class of keys doesn't expire at the same instant, and stale-while-revalidate (serve the slightly stale value while one background task refreshes it).

How do idempotency keys work?

An idempotency key is a unique identifier the client attaches to a logical operation (commonly via an Idempotency-Key header). The server checks whether it has seen that key: if so, it returns the stored result of the first attempt without redoing the work; if not, it performs the work once, stores the result against the key, and returns it. To handle concurrent duplicates safely, reserve the key atomically (e.g., with a unique database constraint) before doing the work, and store the actual response so retries get the same answer. This is how payment APIs prevent double-charges.

What's the right way to implement retries?

Use exponential backoff (increasing the wait between attempts, e.g., 1s, 2s, 4s) combined with jitter (randomness added to the delay so callers don't all retry at the same moment and create a thundering herd), plus a cap on total attempts. Critically, only retry operations that are idempotent or protected by an idempotency key — retrying a non-idempotent operation like a card charge can cause duplicates. Pair retries with a circuit breaker so you stop calling a dependency that is clearly down.

What is a circuit breaker?

A circuit breaker is a resilience pattern that monitors the failure rate of calls to a dependency and, when failures exceed a threshold, "opens" to fail fast for a cooldown period instead of making calls that are likely to fail. After the cooldown it enters a half-open state, allowing a few test calls; if they succeed it closes and resumes normal traffic, and if they fail it reopens. This prevents a single failing dependency from consuming all your threads or connections and cascading into a full outage of your own service.

Can I retry any failed operation safely?

No — you can only safely retry operations that are idempotent, meaning performing them more than once has the same effect as performing them once. Retrying a non-idempotent operation, such as charging a payment or sending a message, can produce duplicates. Either design the operation to be naturally idempotent or protect it with an idempotency key so the server deduplicates repeats. This is why idempotency and retries are a matched pair: retries provide recovery, and idempotency makes that recovery safe.

#software-architecture #caching #idempotency #retries #resilience #scalability #circuit-breaker #distributed-systems #reliability

Ruchit Suthar

15+ years scaling teams from startup to enterprise. 1,000+ technical interviews, 25+ engineers led. Real patterns, zero theory.
