Architecture Patterns in Production

The 30-Minute Architecture Review Checklist

A repeatable checklist to pressure-test any design in half an hour — yours or someone else's.

Article 7 of 7 · 10 min read · Intermediate
Key Takeaway

Most architecture reviews fail not because the design is bad but because the meeting has no structure. Someone presents, others ask questions, nothing gets decided. This checklist turns 30 minutes into a repeatable process with a documented output — Approved, Approved with conditions, or Needs rework. "We'll think about it" is not an output.


Why Reviews Sprawl

The typical architecture review goes like this: an engineer presents a design they have been thinking about for two weeks. The first question opens a thread. The thread opens three more. Ninety minutes in, the engineer has written down a lot of notes and the group has reached no decision. The engineer leaves to "go think about it." Two weeks later, the design is unchanged and the meeting is scheduled again.

This is not a people problem. It is a process failure with two specific causes.

The first: no structure. Without a defined sequence of concerns to work through, the review follows the energy in the room — whoever asks the loudest question sets the agenda. The failure modes nobody asked about stay unexamined.

The second: no defined output. A meeting without a defined outcome produces a conversation, not a decision. Engineers are accustomed to treating conversation as progress. It is not. A design that leaves a review without a recorded decision is a design in limbo — the team can't build against it and can't reject it.

The 30-minute constraint is not an optimization. It is the forcing function that eliminates both failure modes.

The 30-Minute Structure

The session has four segments. The times are not suggestions.

0–5 minutes: Context. The presenter explains the problem, the constraints, and the alternatives already considered. Reviewer role in this segment: silent. No questions. The presenter speaks without interruption. If the presenter runs past five minutes, the review has already surfaced a problem — the design has not been scoped tightly enough to explain in five minutes.

5–20 minutes: Checklist walk. One dimension at a time, in order. The reviewer drives. Freeform questions are parked. Each checklist item gets a status: addressed, explicitly accepted as a risk, or flagged for follow-up. This is not a grilling — it is a structured scan.

20–28 minutes: Decisions logged. The reviewer states the outcome: Approved, Approved with conditions, or Needs rework. Conditions and rework items are specific — "the data migration plan needs to address in-flight records during cutover" is a condition. "This needs more thought" is not. The engineer who owns each follow-up item is named in this segment.

28–30 minutes: Next steps. Who does what by when. If the outcome is Approved with conditions, the conditions get a due date before the next milestone. If the outcome is Needs rework, a return date is set before anyone leaves the room.
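Because the segment boundaries are fixed, they can be written down and checked against the clock. The following is a minimal sketch, assuming a facilitator wants a small helper running next to a timer. The names `SEGMENTS`, `segment_at`, and `is_overrun` are hypothetical. The helper maps elapsed minutes to the segment the room should be in, which makes an overrun visible instead of debatable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start: int  # minute the segment opens (inclusive)
    end: int    # minute the segment closes (exclusive)


# The four fixed segments of the 30-minute review.
SEGMENTS = (
    Segment("Context", 0, 5),
    Segment("Checklist walk", 5, 20),
    Segment("Decisions logged", 20, 28),
    Segment("Next steps", 28, 30),
)


def segment_at(minute: float) -> str:
    """Return the segment the session should be in after `minute` elapsed minutes."""
    if minute < 0:
        raise ValueError("elapsed time cannot be negative")
    for seg in SEGMENTS:
        if seg.start <= minute < seg.end:
            return seg.name
    return "Over time"


def is_overrun(current_segment: str, minute: float) -> bool:
    """True if the room is still in `current_segment` after its window has closed."""
    for seg in SEGMENTS:
        if seg.name == current_segment:
            return minute >= seg.end
    raise KeyError(current_segment)
```

For example, `segment_at(7)` returns "Checklist walk", and `is_overrun("Context", 7)` returns True. At that point the facilitator should interrupt.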

The Checklist

Use this list in order during the 5–20 minute segment. Do not skip items because they feel obvious — the most expensive failures are the ones that felt obvious.

1. Failure modes. What are the top three ways this design fails under production conditions? Name them explicitly: latency spike under load, data loss on partial failure, cascade from upstream service, schema migration window longer than the allowable downtime. Has each been addressed in the design, or explicitly accepted as a known risk?

2. Data. Where does the data live? Who owns the schema? What is the migration path if the schema needs to change after the service is in production? What happens to in-flight requests during a deployment? What happens to in-flight data during a rollback?

3. Scaling ceiling. What is the scaling ceiling of this design? At what load does the first bottleneck appear, and what is it: the database, the message queue, the network boundary, or the memory footprint of a single node? "It'll scale" is not an answer. A named bottleneck at a named load point is.

4. Security. What are the trust boundaries, and what data crosses them? Is personally identifiable information protected at rest and in transit (encrypted, and exposed only to the services that need it)? Who has access to the data store in production, and is that access logged?

5. Reversibility. Can this decision be reversed in six months if it turns out to be wrong? If the answer is no, because it involves a shared data model, a public API contract, or a migration that cannot be rolled back, why is the team confident enough to make it irreversible? What evidence supports that confidence?

6. Cost. What does this cost to run at current scale? At 10x scale? Compute is cheap until it isn't. The time to discover that an architectural choice costs $40k/month at scale is before it reaches production, not during a budget review six months later.
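During the 5–20 minute walk, each of the six items leaves with one of three statuses: addressed, explicitly accepted as a risk, or flagged for follow-up. The following is a minimal sketch of that record, assuming a team wants to capture the walk in code rather than in a shared document. The `Status` enum and the `missing_items` and `follow_ups` helpers are hypothetical names. Because the items are checked in order, a skipped item shows up immediately.

```python
from enum import Enum


class Status(Enum):
    ADDRESSED = "addressed"
    ACCEPTED_RISK = "explicitly accepted as a risk"
    FOLLOW_UP = "flagged for follow-up"


# The six checklist dimensions, walked in this order.
CHECKLIST = (
    "Failure modes",
    "Data",
    "Scaling ceiling",
    "Security",
    "Reversibility",
    "Cost",
)


def missing_items(statuses: dict[str, Status]) -> list[str]:
    """Return checklist items, in order, that have no status yet."""
    unknown = set(statuses) - set(CHECKLIST)
    if unknown:
        raise KeyError(f"not on the checklist: {sorted(unknown)}")
    return [item for item in CHECKLIST if item not in statuses]


def follow_ups(statuses: dict[str, Status]) -> list[str]:
    """Return items flagged for follow-up; each one needs a named owner."""
    return [item for item in CHECKLIST if statuses.get(item) is Status.FOLLOW_UP]
```

An empty list from `missing_items` means the walk is complete and the session can move on to logging decisions.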

How to Run It as a Meeting

The reviewer is not a judge and not an approver. The reviewer is a pattern recognizer. The job is to surface failure modes the proposer has not considered — not to redesign the system in the room.

This distinction matters because reviewers who behave as designers slow everything down and demoralize the engineer whose work is being reviewed. "What if you used X instead of Y?" is a design question, not a review question. The checklist keeps the reviewer in pattern-recognition mode.

When a question opens a rabbit hole — and one always does — park it. "That's a good question. Let's add it as a follow-up item and keep moving." The parking lot is real: it gets written down with an owner and a date. It does not mean the question goes away. It means the question gets resolved after the meeting, not inside it.
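A parking lot works only if nobody can park a question without an owner and a date. The following is a hypothetical sketch of such an entry. The `ParkedQuestion` class, its `resolved` flag, and the `overdue` helper are illustrative, not a prescribed tool. Validation at construction time means a question cannot quietly go away.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ParkedQuestion:
    """A rabbit-hole question moved out of the meeting, with an owner and a due date."""

    question: str
    owner: str
    due: date
    resolved: bool = False

    def __post_init__(self) -> None:
        # Parking a question without an owner and a date is the same as dropping it.
        if not self.question.strip():
            raise ValueError("question must not be empty")
        if not self.owner.strip():
            raise ValueError("every parked question needs a named owner")
        if not isinstance(self.due, date):
            raise TypeError("every parked question needs a due date")


def overdue(lot: list[ParkedQuestion], today: date) -> list[ParkedQuestion]:
    """Return unresolved parked questions whose due date has already passed."""
    return [item for item in lot if not item.resolved and item.due < today]
```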

Timebox discipline requires active enforcement. If the presenter is still in the context segment at minute seven, stop them. "We are past the context window. What is the decision you need from this session?" That question resets the room.

The Three Outputs

The meeting produces exactly one of three outcomes.

Approved. The design is sound. The risks are named and accepted. Build it.

Approved with conditions. The design is directionally right but has specific open items that must be resolved before a named milestone: not "before production" in the abstract, but before the service takes production traffic on a specific date. Each condition has an owner and a date.

Needs rework. Specific items are blocking. Named. Assigned. The presenter brings a revised design to a follow-up review within a defined window — typically one to two weeks.

"We'll think about it" is not an output. If the meeting ends without one of these three, the review failed. Run it again with stricter timekeeping.

If reviewers are reluctant to give a clear output, the problem is usually that they feel responsible for the design's success. They are not. The reviewer surfaces what they see. The engineer makes the call. The output records that the review happened and what was found. That distinction — between the reviewer's responsibility and the engineer's responsibility — is what makes the process sustainable.
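Because there are exactly three outcomes, the output can be modeled as a closed type with no fourth option. The following sketch shows one way to encode the rules from this section. Every condition or rework item carries an owner and a date, and a Needs rework result must set a return date inside the follow-up window. The 14-day default reflects the "one to two weeks" guideline and is an assumption a team would tune. `Outcome`, `Item`, and `ReviewResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    APPROVED = "Approved"
    APPROVED_WITH_CONDITIONS = "Approved with conditions"
    NEEDS_REWORK = "Needs rework"


@dataclass(frozen=True)
class Item:
    """A condition or blocking rework item: specific, owned, and dated."""

    description: str
    owner: str
    due: date


@dataclass
class ReviewResult:
    outcome: Outcome
    review_date: date
    items: list[Item] = field(default_factory=list)
    return_date: Optional[date] = None  # required only for Needs rework

    def validate(self, max_rework_window: timedelta = timedelta(days=14)) -> None:
        """Raise ValueError if the result breaks the rules for its outcome."""
        for item in self.items:
            if not item.owner.strip():
                raise ValueError(f"no owner for: {item.description}")
        if self.outcome is Outcome.APPROVED_WITH_CONDITIONS and not self.items:
            raise ValueError("Approved with conditions needs at least one condition")
        if self.outcome is Outcome.NEEDS_REWORK:
            if not self.items:
                raise ValueError("Needs rework must name the blocking items")
            if self.return_date is None:
                raise ValueError("set a return date before anyone leaves the room")
            latest = self.review_date + max_rework_window
            if not self.review_date < self.return_date <= latest:
                raise ValueError("return date falls outside the follow-up window")
```

`Outcome` deliberately has no member for "We'll think about it". A result that fits none of the three outcomes cannot be recorded.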

The ADR Is the Output of the Approved Review

Every review that reaches Approved or Approved with conditions produces an architecture decision record (ADR). The review is the evidence-gathering process. The ADR is the record.

This is not double work. The checklist items that were explicitly accepted as risks go into the ADR's Consequences → Risks section. The alternatives that were dismissed during the checklist walk go into the ADR's Alternatives Considered table. The conditions from an Approved with conditions outcome go into the ADR's Consequences → Negative section until they are resolved.

The ADR that comes out of a structured review is better than an ADR written in isolation — because the review forced the alternatives to be named out loud and the risks to be acknowledged explicitly. It is harder to write a vague ADR after a structured review than before one.
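The routing from review findings to ADR sections is mechanical enough to sketch. The sketch assumes the findings were captured as plain lists of strings and applies only to the two approval outcomes. The section headings follow the ADR section names used in this section. `adr_sections` and `render` are hypothetical helpers, and the plain-text output stands in for whatever ADR template the team actually uses.

```python
def adr_sections(
    accepted_risks: list[str],
    dismissed_alternatives: list[tuple[str, str]],
    open_conditions: list[str],
) -> dict[str, list[str]]:
    """Route review findings into the ADR sections they belong to.

    `dismissed_alternatives` holds (alternative, reason dismissed) pairs.
    """
    return {
        "Consequences → Risks": list(accepted_risks),
        "Alternatives Considered": [
            f"{alt}: {reason}" for alt, reason in dismissed_alternatives
        ],
        # Conditions stay here only until they are resolved.
        "Consequences → Negative": list(open_conditions),
    }


def render(sections: dict[str, list[str]]) -> str:
    """Render the sections as a plain-text fragment to paste into the ADR."""
    lines: list[str] = []
    for heading, entries in sections.items():
        lines.append(heading)
        lines.extend(f"- {entry}" for entry in entries or ["(none)"])
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

An empty section renders as "(none)" instead of disappearing, so the ADR shows that the question was asked.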

Review Flow

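A review runs from Context through the checklist walk to a logged decision, and ends in one of the three outputs. The loop back from Needs rework to Context is intentional. The follow-up review is a full session, not a five-minute check-in. The design may have changed enough that the checklist will catch different things the second time.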
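The flow can be written as a small state machine. The states and transitions below are reconstructed from the segment, output, and ADR descriptions in this article, and the names are hypothetical. Next steps is left out as a separate state because it records follow-through on whichever outcome was chosen rather than changing the path. The single loop runs from Needs rework back to Context.

```python
from enum import Enum


class Stage(Enum):
    CONTEXT = "Context"
    CHECKLIST_WALK = "Checklist walk"
    DECISIONS_LOGGED = "Decisions logged"
    APPROVED = "Approved"
    APPROVED_WITH_CONDITIONS = "Approved with conditions"
    NEEDS_REWORK = "Needs rework"
    ADR = "ADR recorded"


# Allowed transitions; any step not listed here is a process violation.
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.CONTEXT: {Stage.CHECKLIST_WALK},
    Stage.CHECKLIST_WALK: {Stage.DECISIONS_LOGGED},
    Stage.DECISIONS_LOGGED: {
        Stage.APPROVED,
        Stage.APPROVED_WITH_CONDITIONS,
        Stage.NEEDS_REWORK,
    },
    # Both approval outcomes end in an ADR.
    Stage.APPROVED: {Stage.ADR},
    Stage.APPROVED_WITH_CONDITIONS: {Stage.ADR},
    # The follow-up review restarts at Context: a full session, not a check-in.
    Stage.NEEDS_REWORK: {Stage.CONTEXT},
    Stage.ADR: set(),
}


def run(path: list[Stage]) -> Stage:
    """Check that `path` follows the review flow from Context; return where it ends."""
    if not path or path[0] is not Stage.CONTEXT:
        raise ValueError("every review starts at Context")
    for current, nxt in zip(path, path[1:]):
        if nxt not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition: {current.value} -> {nxt.value}")
    return path[-1]
```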

Going Deeper

The checklist above gives you the structure for the room. The harder work — building a culture where engineers bring designs to review before they are half-built, and where reviewers give honest outputs without feeling like they are judging their colleagues — is an organizational problem, not a process problem.

For that layer, the post on architecture review meetings that actually finish in 30 minutes covers the facilitation patterns, the organizational dynamics, and the failure modes that show up when teams first try to impose structure on what used to be informal design conversations.