AI & Developer Productivity

Prompt Engineering for Architects: Beyond 'Write Me a Function'

Most developers ask AI to write functions. Architects use AI for systems thinking: trade-off analysis, pattern matching, constraint solving. Learn 5 advanced prompt patterns with real examples: architecture context injection, trade-off analysis, pattern matching, constraint-based generation, and refactoring roadmaps. Includes the 10 prompts I use weekly and a template library.

Ruchit Suthar
December 11, 2025 · 8 min read

TL;DR

Architects waste AI by asking it to write code. High-value use: architecture context injection, trade-off analysis, pattern matching, constraint-based generation, and refactoring roadmaps. These 5 advanced prompt patterns save 15-20 hours per week on architecture decisions. Includes 10 prompts used weekly by senior architects for systems thinking, not code generation.

Prompt Engineering for Architects: Beyond "Write Me a Function"

Most developers waste AI by asking it to write functions. Architects waste it by not using it at all.

I've spent 14 months experimenting with AI as an architecture tool. Not for code generation. For systems thinking. Architecture evaluation. Trade-off analysis. Pattern matching across decades of experience I don't have time to manually recall.

Here's the reality: "Write me a function" is the lowest-value use of AI for an architect. The high-value prompts aren't about code. They're about architecture context, constraint analysis, and trade-off evaluation.

This isn't theory. These are 10 prompts I use weekly. They save me 15-20 hours per week and improve architecture decisions measurably.

Why Most Architecture Prompts Fail

Before we get to what works, here's why 90% of architecture prompts produce garbage:

Bad Prompt: "Design a microservices architecture for an e-commerce platform"

AI Response: Generic blog post content. Mentions API Gateway, service mesh, event-driven patterns. Zero specificity. Useless.

The Problem: No context. No constraints. No trade-offs to evaluate.

AI excels at pattern matching and trade-off analysis. It fails at making decisions without context. Give it context, constraints, and specific questions—it becomes an exceptional architecture thinking partner.

The 5 Advanced Patterns

Pattern 1: Architecture Context Injection

The most powerful technique I've discovered: Give AI your entire architecture context upfront. Not just the problem. The constraints, trade-offs, team capabilities, and past decisions.

The Template:

# System Context
- Domain: [Your domain]
- Scale: [Users, requests, data volume]
- Team: [Size, experience level, tech stack familiarity]
- Current Architecture: [High-level description]

# Constraints
- Technical: [Language, cloud provider, existing systems]
- Business: [Budget, timeline, regulatory]
- Team: [Skill gaps, hiring constraints]

# Question
[Your specific architecture question]

# Evaluation Criteria
[How you'll measure success]

Real Example:

Last month I was evaluating whether to split a monolith or optimize it. Here's the prompt:

# System Context
- Domain: Healthcare billing platform
- Scale: 50K transactions/day, 2TB database, 15K concurrent users peak
- Team: 12 engineers (8 backend, 4 frontend), mostly Java/Spring, minimal DevOps experience
- Current Architecture: Java monolith (Spring Boot), PostgreSQL, Redis cache, deployed on AWS ECS

# Constraints
- Technical: Must support HIPAA compliance, can't change database (legacy constraints), limited to AWS
- Business: $150K budget for infrastructure changes, 6-month timeline, zero downtime requirement
- Team: No Kubernetes experience, minimal microservices experience, can't hire senior DevOps

# Question
Should we:
A) Split into microservices (which services? what boundaries?)
B) Optimize monolith with better modularization
C) Hybrid: Extract specific services, keep core monolith

# Evaluation Criteria
- Development velocity (must not decrease)
- Operational complexity (team can manage)
- Cost (within budget)
- Risk (zero downtime)
- Time to production improvement

Analyze trade-offs for each option with specific recommendations.

AI Response (Summarized):

The response was exceptional. It recommended Option C (Hybrid) with specific services to extract:

  1. Extract Billing Engine (async, high CPU, independent scaling needs)
  2. Extract Notification Service (different failure tolerance, high I/O)
  3. Keep core monolith (complex transaction boundaries, shared domain logic)

Why it recommended this:

  • Team experience: Monolith → Hybrid is lower risk than full microservices
  • Budget: 2 services manageable within $150K (gave cost breakdown)
  • Timeline: 6 months realistic for 2 extractions, not full split
  • Zero downtime: Strangler pattern possible with these boundaries

The kicker: It identified that the billing engine had different scaling characteristics (CPU-bound batch processing) that would benefit from separate infrastructure—something I knew but hadn't explicitly stated. Pattern matching across thousands of similar architectures.

Outcome: We implemented Option C. Six months later:

  • Development velocity: +15% (fewer merge conflicts in extracted services)
  • Operational complexity: Manageable (ECS, not Kubernetes)
  • Cost: $135K spent (under budget)
  • Downtime: Zero
  • Production deployments: From 2/week to 8/week

Time Saved: This level of analysis would take me 8-10 hours of research, documentation review, and trade-off modeling. AI did it in 4 minutes. I spent 2 hours validating and refining.
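The extraction itself leaned on the strangler pattern the analysis called out: a thin routing layer shifts a configurable slice of billing traffic to the new service while everything else keeps hitting the monolith. A minimal Go sketch of that idea (hostnames, paths, and the traffic split are illustrative, not our production setup):

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Hypothetical upstreams: the legacy monolith and the extracted billing engine.
	monolith, _ := url.Parse("http://monolith.internal:8080")
	billing, _ := url.Parse("http://billing-engine.internal:8080")

	toMonolith := httputil.NewSingleHostReverseProxy(monolith)
	toBilling := httputil.NewSingleHostReverseProxy(billing)

	// Send a configurable fraction of billing requests to the new service;
	// everything else keeps hitting the monolith unchanged.
	const billingSplit = 0.10

	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/billing/") && rand.Float64() < billingSplit {
			toBilling.ServeHTTP(w, r)
			return
		}
		toMonolith.ServeHTTP(w, r)
	})))
}
```

Because the split is just a constant (or a config value) at the routing layer, ramping traffic up or rolling it back to zero is a deploy-free change, which is what makes the zero-downtime requirement workable.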

Pattern 2: Trade-Off Analysis

The second most valuable pattern: Asking AI to explicitly analyze trade-offs across multiple dimensions.

The Template:

Analyze trade-offs for [architectural decision] across these dimensions:

1. Performance: [Specific metrics that matter]
2. Scalability: [Growth projections]
3. Maintainability: [Team skills, codebase size]
4. Cost: [Infrastructure, development time]
5. Risk: [Failure modes, blast radius]

For each option, provide:
- Quantitative impact where possible
- Failure scenarios
- Hidden costs
- Reversibility

Current context: [Your situation]
Options: [List 2-4 options]

Real Example: Database Selection

Analyze trade-offs for database selection for a real-time analytics platform:

Context:
- 500M events/day write load
- 10-50 concurrent dashboard queries (complex aggregations)
- 90-day data retention
- Current: PostgreSQL with TimescaleDB (struggling at scale)

Options:
1. ClickHouse
2. Apache Druid
3. Elasticsearch
4. Stay with TimescaleDB + optimization

Dimensions:
1. Write throughput: Need 6K writes/sec sustained
2. Query performance: P95 < 2 seconds for dashboard loads
3. Operational complexity: 2-person DevOps team
4. Cost: Current infrastructure $8K/month, can go to $15K/month
5. Risk: 30-day migration window, can't lose data

For each option, be specific about:
- Infrastructure requirements
- Team learning curve
- Hidden costs (licensing, support, development time)
- What could go wrong

AI Response Highlights:

Option 1: ClickHouse

  • Write throughput: Excellent (proven at 100K+ writes/sec)
  • Query performance: Excellent for OLAP (P95 < 500ms for most queries)
  • Operational complexity: Medium (simpler than Druid, more complex than managed PostgreSQL)
  • Cost: ~$12K/month (3-node cluster on AWS)
  • Risk: Migration concern—ClickHouse SQL dialect differs from PostgreSQL, expect 40-60 hours rewriting queries
  • Hidden cost: No managed service on AWS (must use Altinity or self-manage)

Option 2: Apache Druid

  • Write throughput: Excellent (real-time ingestion designed for this)
  • Query performance: Good (P95 ~1-2 seconds, depends on data model)
  • Operational complexity: High (ZooKeeper, multiple node types, complex tuning)
  • Cost: ~$18K/month (exceeds budget)
  • Risk: 2-person team will struggle; typical Druid deployments need dedicated expertise
  • Hidden cost: Imply.io managed service $2K/month, or 6 months learning curve

Option 3: Elasticsearch

  • Write throughput: Good (can handle load with proper sharding)
  • Query performance: Medium (not optimized for aggregations at this scale)
  • Cost: $14K/month (managed AWS Elasticsearch)
  • Risk: Query performance may not meet P95 < 2s requirement under load
  • Hidden cost: Aggregations are resource-intensive, may need larger instance types

Option 4: TimescaleDB + Optimization

  • Write throughput: Can meet the target with proper batching and compression
  • Query performance: Will improve but likely can't reach P95 < 2s consistently
  • Operational complexity: Low (team knows it)
  • Cost: $10K/month (optimize existing)
  • Risk: Diminishing returns—may hit limits in 6-12 months

Recommendation: ClickHouse, but with 2-week prototype to validate query rewrite effort.

The Analysis Included:

  • Specific cost breakdowns per option
  • Infrastructure requirements (instance types, replication)
  • Team learning curve estimates (hours)
  • Failure scenarios (what happens when nodes fail)
  • Reversibility (can we go back if it fails?)

Outcome: We chose ClickHouse. The 40-60 hour query rewrite estimate was accurate (we spent 52 hours). P95 query performance is now 380ms. Write throughput hasn't broken a sweat. Cost is $11.5K/month.

Time Saved: This trade-off analysis normally takes me 12-15 hours: researching each database, reading case studies, modeling costs, evaluating risks. AI did 80% in 5 minutes. I spent 3 hours validating and building the prototype plan.

Pattern 3: Pattern Matching Across Domains

AI has seen thousands of architectures. Use it to pattern-match your problem against similar solved problems.

The Template:

I'm facing [architectural challenge].

My situation:
[Describe your context]

Find 3-5 similar architectural challenges from different domains and describe:
1. The domain and problem
2. How they solved it
3. What patterns they used
4. What I can apply to my situation
5. What won't translate (and why)

Focus on production systems at scale, not theoretical architectures.

Real Example: State Synchronization

I'm facing a state synchronization challenge.

My situation:
- Mobile app (iOS/Android) with offline-first architecture
- Users edit documents offline (text, images, annotations)
- Multiple devices per user
- Need to sync when online without losing edits
- Current: Last-write-wins (users lose data)

Find similar challenges and solutions from different domains.

AI Response (Summarized):

1. Figma's Multiplayer Editing

  • Domain: Collaborative design tool
  • Problem: Multiple users editing same canvas, real-time + offline
  • Solution: Operational Transformation (OT) for text, CRDTs for canvas objects
  • Pattern: Per-object conflict resolution, not document-level
  • Applicable: CRDT approach for document structure (headings, paragraphs)
  • Won't translate: Real-time requirements (we're offline-first, not real-time)

2. Apple Notes' Sync

  • Domain: Note-taking app, offline-first
  • Problem: Same notes edited on multiple devices while offline
  • Solution: Conflict-free Replicated Data Types (CRDTs), specifically Yjs-style approach
  • Pattern: Each device maintains full document history, merges on sync
  • Applicable: CRDT for text editing, tombstones for deletions
  • Won't translate: Apple controls the full stack (client + server), we integrate with third-party backends

3. CouchDB's Replication

  • Domain: Distributed database
  • Problem: Multi-master replication with offline nodes
  • Solution: Revision trees, deterministic conflict resolution
  • Pattern: Keep multiple document versions, application-level conflict resolution
  • Applicable: Revision tree concept for tracking document lineage
  • Won't translate: Database-level, we need application-level with specific UI for conflicts

4. Git's Merge Strategy

  • Domain: Source control
  • Problem: Distributed editing, offline work, merge conflicts
  • Solution: Content-based three-way merge, explicit conflict markers
  • Pattern: Base-mine-theirs comparison, user resolves conflicts
  • Applicable: Three-way merge for text content, visual diff for users
  • Won't translate: Git assumes technical users who understand diffs

5. Google Docs' Conflict Resolution

  • Domain: Collaborative docs
  • Problem: Multiple users, but online-required (not offline)
  • Solution: Operational Transformation with server as source of truth
  • Applicable: Intention preservation concept
  • Won't translate: Requires online connection, we need offline-first

Synthesis for Your Use Case:

  • Use CRDTs for text content (Yjs library for JavaScript)
  • Use three-way merge for non-text (images, annotations)
  • Implement revision trees to track document lineage
  • Build visual conflict resolution UI (show base-mine-theirs)
  • Use last-write-wins only for metadata (user preferences, settings)

Outcome: We implemented a hybrid approach:

  • Yjs CRDT for text editing (worked beautifully)
  • Three-way merge for annotations (users see conflicts, choose resolution)
  • Last-write-wins for images (acceptable for our use case)
  • Conflict rate dropped from 12% of syncs to 0.3%
  • User data loss: Zero in 8 months

Time Saved: Researching these patterns across domains would take 20+ hours. I'd need to read Figma's engineering blog, Apple's WWDC sessions, CouchDB docs, Git internals. AI synthesized it in 6 minutes. I spent 4 hours validating and prototyping.

Key Insight: The pattern matching revealed Yjs (a JavaScript CRDT library) which I didn't know existed. AI's cross-domain knowledge is exceptional.
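For the annotations, the three-way merge the synthesis recommends reduces to a small decision rule per field: if only one side changed a value relative to the common ancestor, take that change; if both sides changed it differently, surface a conflict for the user. A minimal Go sketch of that rule (the types and per-field granularity are illustrative):

```go
package main

import "fmt"

// MergeResult describes the outcome of merging one annotation field.
type MergeResult struct {
	Value    string
	Conflict bool // true when both sides changed the field differently
}

// threeWayMerge applies the classic base-mine-theirs rule to a single value.
func threeWayMerge(base, mine, theirs string) MergeResult {
	switch {
	case mine == theirs: // both sides agree (or neither changed anything)
		return MergeResult{Value: mine}
	case mine == base: // only the other device changed it
		return MergeResult{Value: theirs}
	case theirs == base: // only this device changed it
		return MergeResult{Value: mine}
	default: // both changed it differently: show base-mine-theirs in the UI
		return MergeResult{Value: base, Conflict: true}
	}
}

func main() {
	fmt.Println(threeWayMerge("draft", "draft", "reviewed")) // auto-resolves to "reviewed"
	fmt.Println(threeWayMerge("draft", "final", "reviewed")) // conflict: user resolves
}
```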

Pattern 4: Constraint-Based Architecture

Give AI your constraints and have it design within them. Not "design the perfect architecture" but "design the best architecture given these constraints."

The Template:

Design [system/component] within these hard constraints:

# Non-Negotiable Constraints
- [Constraint 1]
- [Constraint 2]
- [Constraint 3]

# Soft Constraints (preferably met, but negotiable)
- [Constraint 4]
- [Constraint 5]

# Anti-Requirements (explicitly don't do this)
- [Anti-requirement 1]
- [Anti-requirement 2]

Requirements: [What the system must do]

Provide:
1. Architecture that respects all hard constraints
2. Trade-offs you made for soft constraints
3. Why each anti-requirement is avoided
4. What you'd change if constraints were relaxed

Real Example: API Gateway

Design an API gateway for a B2B SaaS platform within these constraints:

# Non-Negotiable Constraints
- Must run on-premises (customer requirement, non-negotiable)
- Must support SAML 2.0 (customer's enterprise SSO)
- Must handle 5K requests/sec peak
- Must deploy to customer's existing Kubernetes cluster (1.23+)
- Zero cloud dependencies (no AWS/GCP/Azure services)

# Soft Constraints
- Prefer open-source components (licensing concerns)
- Minimize memory footprint (customers have limited resources)
- Avoid JVM-based solutions if possible (customer has JVM allergy)

# Anti-Requirements
- No API management platforms (Kong, Apigee) - customer won't pay licensing
- No service mesh (Istio, Linkerd) - too complex for customer's team
- No custom-built gateway - we don't have time to build/maintain

Requirements:
- Authentication and authorization
- Rate limiting per customer
- Request/response transformation
- API versioning
- Observability (metrics, traces)

Provide an architecture that works within these constraints.

AI Response (Summarized):

Recommended Architecture: Envoy Proxy + Custom Control Plane

Core Components:

  1. Envoy Proxy (data plane)

    • Written in C++, not JVM
    • Proven at 5K+ req/sec
    • Runs in Kubernetes as sidecar or standalone
    • Memory footprint: ~50-100MB per instance
  2. Custom Control Plane (Go)

    • Generates Envoy config for SAML, rate limiting, routing
    • Stores config in customer's existing PostgreSQL
    • ~30MB memory footprint
    • Open-source Go (MIT license)
  3. SAML Integration

    • Envoy ext_authz filter → custom auth service (Go)
    • Auth service handles SAML 2.0 flow
    • Session stored in Redis (customer already has it)

Why This Works:

  • ✅ On-premises: No cloud dependencies
  • ✅ SAML 2.0: Custom auth service handles it
  • ✅ 5K req/sec: Envoy proven at much higher scale
  • ✅ Kubernetes: Envoy designed for K8s
  • ✅ Zero cloud: All components self-hosted
  • ✅ Open-source: Envoy (Apache 2.0), control plane (we write, MIT)
  • ✅ Memory: ~150MB total (Envoy + control plane + auth service)
  • ✅ Not JVM: C++ and Go

Why Not Other Options:

  • Kong: legacy Cassandra datastore is JVM-based, plus licensing costs for the needed features
  • Envoy + Istio: Too complex (service mesh overhead)
  • Custom gateway: 6+ months development time
  • NGINX: Can't handle SAML without commercial Plus version
  • Traefik: Limited SAML support, less proven at scale

Trade-Offs Made:

  • Built custom control plane (3-4 weeks dev time) to avoid commercial solutions
  • SAML integration requires custom code (no off-the-shelf that meets constraints)
  • Observability requires Prometheus/Grafana (customer must provide)

If Constraints Relaxed:

  • If cloud was allowed: AWS API Gateway (managed, simpler)
  • If JVM okay: Kong + Cassandra (richer features)
  • If service mesh okay: Istio (more features, higher complexity)

Outcome: We implemented this architecture. It's been running for 11 months across 8 customer deployments:

  • Performance: 4.2K req/sec average, 7.8K peak (exceeded requirement)
  • Memory: 140MB average (within constraint)
  • Customer satisfaction: High (meets their security/compliance needs)
  • Development time: 5 weeks (3 weeks control plane, 2 weeks SAML)

Time Saved: Evaluating all gateway options against these constraints would take 15-20 hours. AI did it in 4 minutes. I spent 6 hours validating Envoy's SAML capabilities and prototyping.
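The piece that ties this together is Envoy's ext_authz filter: in HTTP mode, Envoy forwards each incoming request to an external check service and only lets it through on a 2xx response. A minimal Go sketch of that check service (the cookie name and session lookup are placeholders; the real service also drives the SAML 2.0 login flow and stores sessions in Redis):

```go
package main

import (
	"log"
	"net/http"
)

// hasValidSession stands in for the real lookup against Redis,
// where the SAML-established session would be stored.
func hasValidSession(sessionID string) bool {
	return sessionID != "" // illustrative only
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Envoy's ext_authz HTTP service receives the original request's headers.
		cookie, err := r.Cookie("session_id") // cookie name is an assumption
		if err != nil || !hasValidSession(cookie.Value) {
			// A non-2xx response tells Envoy to deny the request.
			http.Error(w, "authentication required", http.StatusUnauthorized)
			return
		}
		// 200 tells Envoy to forward the request to the upstream API.
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":9001", nil))
}
```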

Pattern 5: Refactoring Roadmaps

The most underutilized pattern: Ask AI to generate multi-phase refactoring roadmaps with risk mitigation.

The Template:

Create a refactoring roadmap for [system/component].

# Current State
[Describe what exists now]

# Desired State
[Describe end goal]

# Constraints
- Timeline: [Your timeline]
- Team: [Size and availability]
- Risk tolerance: [High/Medium/Low]
- Deployment frequency: [How often can you deploy]

Requirements:
1. Break into 2-week increments
2. Each increment must be independently deployable
3. Include rollback strategy for each phase
4. Identify risks and mitigation
5. Define success metrics per phase

Provide a week-by-week roadmap.

Real Example: Legacy Payment System

Create a refactoring roadmap for our payment processing system.

# Current State
- PHP 5.6 monolith processing 50K payments/day
- Direct database writes (no events, no audit log)
- No idempotency (retries can double-charge)
- Tightly coupled to legacy ERP system
- Payment provider calls synchronous (blocks user)
- No automated testing (manual QA only)
- Deployments: Monthly, with 2-hour downtime

# Desired State
- Modern stack (Go or Node.js)
- Event-sourced (full audit trail)
- Idempotent operations
- Asynchronous payment processing
- Decoupled from ERP (events for integration)
- 95%+ test coverage
- Zero-downtime deployments

# Constraints
- Timeline: 6 months
- Team: 4 engineers (2 senior, 2 mid-level)
- Risk tolerance: Low (payment system, can't break)
- Deployment frequency: Weekly (after month 2)
- Business constraint: Zero payment downtime, zero data loss

Create a roadmap that doesn't break payments.

AI Response (Summarized - First 8 Weeks):

Phase 1: Weeks 1-2 - Observability & Testing Foundation

  • Add structured logging to existing PHP system
  • Implement payment event logging (append-only table)
  • Create test harness for payment flows
  • Deployable: Yes (logging is additive)
  • Rollback: Disable logging if issues
  • Risk: Low (read-only changes)
  • Success Metric: 100% of payments logged with full context

Phase 2: Weeks 3-4 - Idempotency Layer

  • Add idempotency keys to payment API
  • Store idempotency keys in Redis (6-hour TTL)
  • Modify payment endpoints to check keys
  • Deployable: Yes (new endpoints, old ones unchanged)
  • Rollback: Route traffic to old endpoints
  • Risk: Medium (requires Redis)
  • Mitigation: If Redis fails, fall back to processing without the idempotency check (log a warning)
  • Success Metric: Zero duplicate charges from retries

Phase 3: Weeks 5-6 - Async Processing (Step 1)

  • Build payment job queue (RabbitMQ)
  • Create worker service (PHP, same logic as monolith)
  • Route 10% of traffic to async flow (A/B test)
  • Deployable: Yes (parallel system)
  • Rollback: Route 100% to synchronous flow
  • Risk: Medium (new infrastructure)
  • Mitigation: Monitor queue depth, failure rates; rollback if queue backs up
  • Success Metric: 10% of payments async, same success rate as sync

Phase 4: Weeks 7-8 - Async Processing (Step 2)

  • Increase async traffic to 50%
  • Add retry logic with exponential backoff
  • Implement dead-letter queue for failures
  • Deployable: Yes (configuration change)
  • Rollback: Reduce to 10% if issues
  • Risk: Medium-High (majority of traffic)
  • Mitigation: Real-time monitoring, automatic rollback if error rate > 0.5%
  • Success Metric: 50% async, P95 processing time < 5 seconds

[Continues for 24 weeks total...]

The roadmap continued through:

  • Weeks 9-12: Strangler pattern migration to Go service
  • Weeks 13-16: Event sourcing implementation
  • Weeks 17-20: ERP decoupling via events
  • Weeks 21-24: Deprecate PHP monolith, 100% on new system

Each Phase Included:

  • Specific technical tasks
  • Deployment strategy
  • Rollback plan
  • Risk assessment
  • Mitigation strategy
  • Success metrics

Outcome: We followed this roadmap with minor adjustments:

  • Completed in 26 weeks (2 weeks over due to holiday delays)
  • Zero payment downtime
  • Zero data loss
  • Zero duplicate charges (idempotency worked perfectly)
  • Payment processing time: 350ms → 45ms (P95)
  • Deployment frequency: Monthly → 2x/week

Time Saved: Creating a detailed, phase-by-phase refactoring roadmap normally takes me 12-16 hours. AI did 70% in 8 minutes. I spent 4 hours refining rollback strategies and adding team-specific context.

Key Insight: The AI roadmap identified the idempotency layer as phase 2 (weeks 3-4), which I would have delayed until later. Moving it early prevented production issues during the migration. This alone saved us 2 weeks of firefighting.
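The idempotency layer itself is small: the client sends an idempotency key, the service claims it atomically in Redis with a TTL, and a retry that finds the key already claimed returns the stored result instead of charging again. A minimal Go sketch using go-redis (key format and result handling are illustrative; the 6-hour TTL matches the roadmap above):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	idempotencyKey := "payments:idem:" + "client-supplied-key-123" // illustrative key format

	// SETNX claims the key only if it does not already exist;
	// the 6-hour TTL bounds the retry window.
	claimed, err := rdb.SetNX(ctx, idempotencyKey, "in-progress", 6*time.Hour).Result()
	switch {
	case err != nil:
		// Redis down: fall back to processing and log a warning (the roadmap's mitigation).
		fmt.Println("redis unavailable, processing without idempotency guard:", err)
	case claimed:
		fmt.Println("first attempt: process the payment, then store its result under the key")
	default:
		fmt.Println("duplicate attempt: return the previously stored result, do not charge again")
	}
}
```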

The 10 Prompts I Use Weekly

Here are the exact prompts I use most often, refined over 14 months:

1. Architecture Review Prompt

Review this architecture decision:

# Decision
[Your proposed architecture]

# Context
- Domain: [Your domain]
- Scale: [Current and projected]
- Team: [Size, skills]

# Constraints
[Your constraints]

Evaluate:
1. What could go wrong? (failure modes)
2. What will be hard to change later? (irreversible decisions)
3. What am I not considering? (blind spots)
4. What's the cheapest way to validate this? (experiments)

Be specific. Use examples from similar systems.

Use Case: Before committing to major architecture decisions. Time Saved: 2-3 hours per decision.

2. Incident Architecture Analysis

We had a production incident:

# What Happened
[Incident description]

# Root Cause
[Technical root cause]

# Current Architecture
[Relevant architecture]

Questions:
1. What architectural changes would prevent this?
2. What are the trade-offs of each change?
3. What's the minimum viable fix?
4. What should we do long-term?

Prioritize by effort vs. impact.

Use Case: Post-incident architecture improvements. Time Saved: 3-4 hours per incident.

3. Technology Evaluation

Evaluate [Technology X] for [Use Case Y]:

# Our Context
- Current stack: [Your stack]
- Team experience: [Skills]
- Scale: [Your scale]

# Requirements
[What you need]

# Concerns
[What worries you]

Provide:
1. Where it excels vs. where it struggles
2. Hidden costs (learning curve, ops burden, licensing)
3. Alternatives worth considering
4. Red flags (deal-breakers)
5. Validation experiment (cheapest way to test)

Use real production examples.

Use Case: Evaluating new technologies or frameworks. Time Saved: 4-6 hours per technology.

4. Migration Strategy

Plan migration from [System A] to [System B]:

# System A (Current)
[Description]

# System B (Target)
[Description]

# Constraints
- Data volume: [Amount]
- Downtime allowed: [Time]
- Team: [Size]
- Timeline: [Duration]

Provide:
1. Step-by-step migration plan
2. Rollback strategy for each step
3. Data validation approach
4. Risk mitigation
5. What to monitor during migration

Break into independently deployable phases.

Use Case: Planning system migrations. Time Saved: 6-8 hours per migration.

5. Performance Optimization

Our [component] has performance issues:

# Current Performance
- Metric: [Current value]
- Target: [Desired value]
- Scale: [Traffic/data volume]

# Current Architecture
[Relevant architecture]

# Constraints
- Can't change: [What's fixed]
- Budget: [Time/money]

Provide:
1. Top 3 optimization opportunities (biggest impact)
2. Expected improvement per optimization
3. Effort estimate (hours/days)
4. Risks and mitigations
5. How to measure success

Prioritize quick wins vs. long-term solutions.

Use Case: Performance optimization planning. Time Saved: 3-5 hours per optimization effort.

6. API Design Review

Review this API design:

[Your API spec or description]

# Context
- Clients: [Who uses it]
- Scale: [Request volume]
- Versioning strategy: [Your approach]

Evaluate:
1. What's inconsistent or surprising?
2. What will be hard to evolve?
3. What's missing for production use?
4. What are better alternatives for [specific endpoint]?

Use REST/GraphQL best practices. Be opinionated.

Use Case: API design reviews. Time Saved: 1-2 hours per API.

7. System Complexity Analysis

Analyze complexity of [system/component]:

# Current Architecture
[Description]

Questions:
1. What's unnecessarily complex? (over-engineering)
2. What's deceptively simple? (hidden complexity)
3. What will become a problem at 10x scale?
4. How would you simplify this?

Provide specific refactoring recommendations.

Use Case: Identifying complexity issues. Time Saved: 2-3 hours per analysis.

8. Failure Mode Analysis

Analyze failure modes for [system]:

# Architecture
[Your architecture]

# SLAs
- Availability: [Target]
- Latency: [Target]
- Data loss: [Tolerance]

For each component, identify:
1. How it can fail
2. Blast radius (what breaks)
3. Detection time
4. Recovery time
5. Mitigation strategy

Prioritize by likelihood × impact.

Use Case: Resilience planning. Time Saved: 4-5 hours per system.

9. Technical Debt Assessment

Assess technical debt in [system/component]:

# Current State
[Description]

# Pain Points
[What's causing problems]

Evaluate:
1. What debt is slowing us down most? (quantify impact)
2. What debt is low-effort to fix? (quick wins)
3. What debt can we live with? (acceptable debt)
4. What debt is getting worse? (urgent)

Provide a prioritized backlog with effort estimates.

Use Case: Technical debt prioritization. Time Saved: 3-4 hours per assessment.

10. Architecture Documentation

Generate architecture documentation for [system]:

# System Description
[Your description]

# Architecture
[Components and interactions]

Create:
1. System overview (2-3 paragraphs)
2. Component diagram (text description)
3. Data flow (request/response lifecycle)
4. Key decisions and trade-offs
5. Operational concerns (deployment, monitoring, failure modes)

Use C4 model concepts. Target audience: new engineers.

Use Case: Architecture documentation. Time Saved: 3-5 hours per system.

Implementation Guide

Week 1: Start with Pattern 1 (Context Injection)

Pick your next architecture decision. Use the architecture context injection template. Compare AI recommendation to your intuition.

Success Criteria: AI identifies at least one consideration you hadn't thought of.

Week 2: Add Pattern 2 (Trade-Off Analysis)

Use the trade-off analysis template for a technology decision. Create a spreadsheet comparing AI analysis to your research.

Success Criteria: AI analysis saves you at least 4 hours of research.

Week 3: Experiment with Pattern 3 (Pattern Matching)

Take a challenging problem. Ask AI to find similar problems in different domains. Validate the patterns it suggests.

Success Criteria: Discover at least one pattern you can apply.

Week 4: Build Your Prompt Library

Take the 10 prompts above. Customize them for your domain, constraints, and team. Save them in a shared document.

Success Criteria: Team members start using them.

Month 2: Establish Architecture AI Practices

  • Use AI for all architecture reviews
  • Document decisions made with AI assistance
  • Track time saved
  • Refine prompts based on what works

Success Metrics

Track these metrics to measure impact:

Time Savings:

  • Hours spent on architecture research (before/after)
  • Time to make architecture decisions
  • Documentation creation time

Decision Quality:

  • Number of architectural changes due to issues found by AI
  • Incidents caused by architecture decisions
  • Architecture decisions that needed reversal

Team Impact:

  • Architecture review cycle time
  • Number of architecture perspectives considered
  • Team satisfaction with architecture process

Common Mistakes

Mistake 1: Using AI Without Context

Bad: "Design a caching layer"
Good: [Full context with constraints, scale, team, requirements]

Context is everything. AI without context produces generic advice.

Mistake 2: Accepting First Response

AI's first response is rarely optimal. Ask follow-up questions:

  • "What are the risks with this approach?"
  • "What alternatives did you not mention?"
  • "What could go wrong?"

Iterate. Challenge. Refine.

Mistake 3: Not Validating Recommendations

AI makes mistakes. Always validate:

  • Check claimed performance numbers
  • Verify compatibility claims
  • Test recommended approaches in small experiments
  • Consult official documentation

Use AI as a thinking partner, not a source of truth.

Mistake 4: Ignoring Team Constraints

AI might recommend Kubernetes when your team has zero K8s experience. Always include team skills in your context.

Add to every prompt:

Team constraints:
- Experience level: [Junior/Mid/Senior mix]
- Skills: [Known technologies]
- Skill gaps: [What they don't know]
- Learning capacity: [Time available for learning]

Mistake 5: Using AI for Implementation Details

AI is exceptional for architecture thinking. It's mediocre for implementation details.

Good AI Use:

  • "Should we use event sourcing for this use case?"
  • "What are trade-offs of microservices vs. modular monolith?"
  • "How should we handle distributed transactions?"

Poor AI Use:

  • "Write the event sourcing implementation"
  • "Generate microservice boilerplate"
  • "Implement saga pattern"

Use AI for architecture decisions. Write implementation yourself.

The Bottom Line

Prompt engineering for architects isn't about generating code. It's about:

  1. Architecture Context Injection: Give AI full context, get specific recommendations
  2. Trade-Off Analysis: Evaluate options across multiple dimensions with quantitative impact
  3. Pattern Matching: Find similar problems solved in different domains
  4. Constraint-Based Design: Design within real-world constraints, not ideal conditions
  5. Refactoring Roadmaps: Generate phased migration plans with risk mitigation

The 10 prompts I shared save me 15-20 hours per week. They improve architecture decisions by surfacing blind spots, identifying risks, and providing cross-domain patterns I wouldn't have found manually.

Start with one pattern next week. Master it. Add another. Build a prompt library for your team. Share what works.

AI won't replace architects. But architects who use AI will replace architects who don't.

The difference is knowing how to prompt for systems thinking, not code generation.

Topics

prompt-engineering, software-architecture, ai-prompting, system-design, technical-leadership, architecture-patterns, trade-off-analysis

About Ruchit Suthar

15+ years scaling teams from startup to enterprise. 1,000+ technical interviews, 25+ engineers led. Real patterns, zero theory.