The AI Refactoring Playbook: When to Trust the Bot, When to Override

TL;DR
After reviewing 200+ AI refactorings, 60% improved code, 30% were neutral, and 10% would cause production issues. Use a 4-quadrant framework: Green Light (trust 90%—extract methods, renames), Yellow Light (verify closely—database, async), Red Light (override often—security, distributed systems). Includes validation checklist and quality tracking metrics.
I've reviewed 200+ AI-suggested refactorings over the past year. Here's what I learned: 60% improved the code, 30% were neutral shuffling, and 10% would have caused production issues if merged blindly.
The problem isn't AI capability—it's knowing when to trust it. Most engineers fall into two camps: blind trust (risky) or blanket skepticism (a wasted opportunity). Both are wrong.
What you need is a framework. This is mine.
The Trust Decision Framework
Not all refactorings carry equal risk. Extract a method? Low risk. Refactor distributed transactions? High risk. You need a mental model to categorize quickly.
The 4-Quadrant Matrix
Think about every AI refactoring along two dimensions:
Complexity: How much does this change?
- Low: Single function, obvious transformation
- High: Multiple files, changes system behavior, affects multiple callers
Risk: What happens if it breaks?
- Low: Caught immediately by tests, easy to revert, limited blast radius
- High: Could cause data corruption, silent failures, affects many users
This creates four quadrants:
Green Quadrant (Low Complexity + Low Risk)
- Trust AI 90%+ of the time
- Quick review, merge confidently
- Examples: Extract method, rename variables, add type annotations
Yellow Quadrant (Mixed)
- Verify closely before merging
- Check edge cases AI might miss
- Examples: Database migrations, async conversions, error handling changes
Red Quadrant (High Complexity + High Risk)
- Default to manual review
- Use AI for ideas, not implementation
- Examples: Cross-service transactions, security-critical code, performance hotspots
The "Blast Radius" Question
Quick way to assess: "If this breaks, how many users/services/systems are affected?"
- 1 service, caught by tests: Green
- Multiple services, might not be caught immediately: Yellow
- Production data or money involved: Red
The "Rollback Cost" Question
"Can I easily revert if something goes wrong?"
- Revert one commit, redeploy one service: Green
- Coordinate rollback across services: Yellow
- Data migration that's hard to reverse: Red
Use these questions in every AI refactoring review. 5 seconds of categorization prevents hours of debugging.
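The two questions fold directly into the matrix. As a quick illustration (the names `Level`, `Quadrant`, and `triage` are mine, not a standard API), the categorization could be sketched as:

```typescript
// Illustrative sketch of the 4-quadrant triage described above.
type Level = "low" | "high";
type Quadrant = "green" | "yellow" | "red";

function triage(complexity: Level, risk: Level): Quadrant {
  if (complexity === "low" && risk === "low") return "green"; // trust 90%+
  if (complexity === "high" && risk === "high") return "red"; // manual review
  return "yellow";                                            // mixed: verify closely
}

console.log(triage("low", "low"));   // "green"
console.log(triage("high", "low")); // "yellow"
console.log(triage("high", "high")); // "red"
```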
Green Light Scenarios: Trust AI 90%+
These refactorings are low-risk, high-value. AI handles them well. Your job: quick sanity check, then merge.
Scenario 1: Extract Method Refactoring
Why AI Excels
AI is exceptional at spotting code duplication. It scans your entire file in seconds, identifies repeated patterns, and suggests extractions. Humans do this too, but slower and less consistently.
Real Example
Before:
class UserService {
  async registerUser(data: RegisterDto) {
    // Validation
    if (!data.email || !data.email.includes('@')) {
      throw new Error('Invalid email');
    }
    if (!data.password || data.password.length < 8) {
      throw new Error('Password must be at least 8 characters');
    }
    if (!data.username || data.username.length < 3) {
      throw new Error('Username must be at least 3 characters');
    }
    return await this.userRepo.create(data);
  }

  async updateUser(id: string, data: UpdateDto) {
    // Same validation duplicated
    if (!data.email || !data.email.includes('@')) {
      throw new Error('Invalid email');
    }
    if (data.password && data.password.length < 8) {
      throw new Error('Password must be at least 8 characters');
    }
    if (!data.username || data.username.length < 3) {
      throw new Error('Username must be at least 3 characters');
    }
    return await this.userRepo.update(id, data);
  }
}
AI Suggestion:
class UserService {
  async registerUser(data: RegisterDto) {
    this.validateUserData(data);
    return await this.userRepo.create(data);
  }

  async updateUser(id: string, data: UpdateDto) {
    this.validateUserData(data);
    return await this.userRepo.update(id, data);
  }

  private validateUserData(data: Partial<RegisterDto>) {
    if (data.email && !data.email.includes('@')) {
      throw new Error('Invalid email');
    }
    if (data.password && data.password.length < 8) {
      throw new Error('Password must be at least 8 characters');
    }
    if (data.username && data.username.length < 3) {
      throw new Error('Username must be at least 3 characters');
    }
  }
}
What to Check:
✅ Method name makes sense (validateUserData is clear)
✅ Parameters are logical (accepts Partial<RegisterDto> for flexibility)
✅ All callers work correctly (both registerUser and updateUser)
✅ Edge cases handled (made validation checks conditional for updates)
Red Flags:
🚩 Method name is vague (processData, handleStuff)
🚩 Extracted method does too many unrelated things
🚩 Logic changed subtly during extraction
Time Investment: 2 minutes review → Merge
Scenario 2: Rename Variables for Clarity
Why AI Excels
AI understands context from surrounding code. It sees a variable called data but knows from usage it represents customerOrders. Humans know this too, but naming is tedious. AI makes it instant.
Real Example
Before:
function processOrders(data: any[]) {
  const result = [];
  for (const item of data) {
    const x = item.price * item.quantity;
    const y = x * 0.1;
    const z = x + y;
    result.push({
      id: item.id,
      total: z
    });
  }
  return result;
}
AI Suggestion:
function processOrders(orders: Order[]) {
  const processedOrders = [];
  for (const order of orders) {
    const subtotal = order.price * order.quantity;
    const tax = subtotal * 0.1;
    const total = subtotal + tax;
    processedOrders.push({
      id: order.id,
      total: total
    });
  }
  return processedOrders;
}
What to Check:
✅ Names accurately describe the data
✅ Names follow team conventions (camelCase, descriptive)
✅ No breaking changes (public API parameters unchanged or deprecated properly)
Red Flags:
🚩 AI renames public API parameters (breaking change for consumers)
🚩 Names don't match team conventions (team uses order_id, AI suggests orderId)
🚩 Names are too verbose (customerOrderTotalIncludingTaxAndShipping vs. total)
Time Investment: 1 minute review → Merge
Scenario 3: Type Annotation Additions
Why AI Excels
AI infers types from usage. It sees user.email and knows user has an email property. It sees your function returns a number and adds number return type. Perfect for gradually typing legacy JavaScript or improving TypeScript strictness.
Real Example
Before (JavaScript/Loose TypeScript):
function calculateDiscount(user, cart) {
  const items = cart.items;
  let total = 0;
  for (const item of items) {
    total += item.price * item.quantity;
  }
  if (user.isPremium) {
    return total * 0.2;
  }
  return total * 0.1;
}
AI Suggestion:
interface User {
  isPremium: boolean;
}
interface CartItem {
  price: number;
  quantity: number;
}
interface Cart {
  items: CartItem[];
}
function calculateDiscount(user: User, cart: Cart): number {
  const items: CartItem[] = cart.items;
  let total: number = 0;
  for (const item of items) {
    total += item.price * item.quantity;
  }
  if (user.isPremium) {
    return total * 0.2;
  }
  return total * 0.1;
}
What to Check:
✅ Types are accurate (not just any everywhere)
✅ Types aren't overly complex (simple is better)
✅ Interfaces reuse existing types when possible
Red Flags:
🚩 AI adds any everywhere (defeats the purpose)
🚩 Complex union types that hurt readability: string | number | boolean | null | undefined | CustomType
🚩 Doesn't reuse existing type definitions (creates duplicate User interface when one exists)
Time Investment: 2 minutes review → Merge
Scenario 4: Simple Pattern Migrations
Why AI Excels
Mechanical transformations: callbacks → async/await, var → const/let, deprecated API → new API. AI follows the pattern exactly. Less error-prone than humans doing repetitive changes.
Real Example: Callbacks to Async/Await
Before:
function loadUserData(userId: string, callback: (err: Error | null, user?: User) => void) {
  db.query('SELECT * FROM users WHERE id = ?', [userId], (err, results) => {
    if (err) {
      callback(err);
      return;
    }
    const user = results[0];
    db.query('SELECT * FROM orders WHERE user_id = ?', [userId], (err, orders) => {
      if (err) {
        callback(err);
        return;
      }
      user.orders = orders;
      callback(null, user);
    });
  });
}
AI Suggestion:
async function loadUserData(userId: string): Promise<User> {
  const results = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  const user = results[0];
  const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [userId]);
  user.orders = orders;
  return user;
}
What to Check:
✅ Error handling still works (async functions reject on errors)
✅ All callbacks converted (no half-async code)
✅ Awaits are in the right places
✅ Edge cases handled (empty results, null checks)
Red Flags:
🚩 Missing awaits (function marked async but doesn't await promises)
🚩 Error handling removed (callback had error handling, async version doesn't)
🚩 Changed behavior subtly (callback was non-blocking, now it blocks)
Time Investment: 3 minutes review → Merge
Key Principle for Green Light Refactorings
Trust but verify. AI is 90%+ accurate on these patterns. Your job: catch the 10% where AI missed something. 2-3 minute review is all you need.
If you find yourself spending 15+ minutes reviewing a "simple" refactoring, it's not actually simple. Reclassify it to Yellow or Red.
Yellow Light Scenarios: Verify Closely
These refactorings look good at first glance but have hidden complexity. AI generates plausible code, but you must check deeply.
Scenario 1: Database Schema Changes
Why Caution Needed
Databases have state. Changing schema affects existing data. AI generates the "after" state but often misses migration complexity, data loss risks, and backward compatibility.
Real Example: Adding a Column
Before:
CREATE TABLE orders (
  id INT PRIMARY KEY,
  user_id INT NOT NULL,
  total DECIMAL(10,2) NOT NULL,
  created_at TIMESTAMP NOT NULL
);
AI Suggestion:
-- Add status column
ALTER TABLE orders ADD COLUMN status VARCHAR(50) NOT NULL DEFAULT 'pending';
-- Update application code
class Order {
  id: number;
  userId: number;
  total: number;
  status: string; // NEW
  createdAt: Date;
}
Looks Good, But Check These:
1. Default Value Appropriateness
-- AI suggests: DEFAULT 'pending'
-- But: What about existing orders? Are they all actually pending?
-- Reality: Existing orders are probably 'completed' or 'shipped'
-- Better migration:
ALTER TABLE orders ADD COLUMN status VARCHAR(50);
UPDATE orders SET status = 'completed' WHERE created_at < NOW() - INTERVAL '7 days';
UPDATE orders SET status = 'shipped' WHERE created_at >= NOW() - INTERVAL '7 days';
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';
2. Backward Compatibility
// AI updates the model immediately
// Problem: Old application instances still running
// Better: Make nullable first, deploy, then make NOT NULL later
ALTER TABLE orders ADD COLUMN status VARCHAR(50) NULL;
// Application handles both cases
class Order {
  status: string | null; // Handles old orders without status
}
// Later (after all instances deployed):
UPDATE orders SET status = 'completed' WHERE status IS NULL;
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
3. Index Needs
-- AI adds column but forgets index
-- If you query by status frequently:
SELECT * FROM orders WHERE status = 'pending';
-- You need an index:
CREATE INDEX idx_orders_status ON orders(status);
4. Migration Rollback
-- AI provides forward migration
-- What if you need to rollback?
-- Forward:
ALTER TABLE orders ADD COLUMN status VARCHAR(50) NOT NULL DEFAULT 'pending';
-- Rollback:
ALTER TABLE orders DROP COLUMN status;
-- But: If old code still references status column, rollback breaks!
-- Better: Feature flag + gradual rollout
Validation Checklist:
- Default value makes sense for existing data
- Migration script handles all existing data correctly
- No data loss (test on production-sized dataset)
- Backward compatible (old code can run during deployment)
- Indexes added for new query patterns
- Rollback plan exists and tested
- Migration tested on production snapshot
Real War Story
Team added NOT NULL column with default value. Migration took 3 hours on 50M row table, locking the table. Production down for 3 hours. Better approach: Add nullable column, backfill asynchronously, make NOT NULL later.
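The asynchronous backfill from that war story can be sketched as follows. This is a hypothetical example: the `db.execute` client is illustrative, and the `LIMIT`-based batched UPDATE is MySQL-style (Postgres would batch with an id-range or ctid subquery instead).

```typescript
// Hypothetical batched backfill: many small UPDATEs instead of one
// long-running statement that locks the table for hours.
interface Db {
  execute(sql: string): Promise<{ rowCount: number }>;
}

async function backfillStatus(db: Db, batchSize = 10_000): Promise<number> {
  let updated = 0;
  for (;;) {
    const res = await db.execute(
      `UPDATE orders SET status = 'completed' WHERE status IS NULL LIMIT ${batchSize}`
    );
    if (res.rowCount === 0) break; // nothing left to backfill
    updated += res.rowCount;
  }
  return updated;
}
```

Each batch commits independently, so locks stay short and the backfill can be paused or resumed; only once it reports zero remaining rows do you flip the column to NOT NULL.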
Scenario 2: Async/Sync Boundary Modifications
Why Caution Needed
Converting sync to async (or vice versa) changes control flow. Race conditions appear. Callers break. Transaction boundaries shift. AI sees the syntax change but misses the concurrency implications.
Real Example: Sync to Async Conversion
Before:
class OrderService {
  createOrder(userId: string, items: OrderItem[]): Order {
    const user = this.userRepo.findById(userId);
    const inventory = this.inventoryRepo.checkAvailability(items);
    if (!inventory.allAvailable) {
      throw new Error('Items not available');
    }
    const order = this.orderRepo.create({ userId, items });
    this.inventoryRepo.reserve(items);
    return order;
  }
}
AI Suggestion:
class OrderService {
  async createOrder(userId: string, items: OrderItem[]): Promise<Order> {
    const user = await this.userRepo.findById(userId);
    const inventory = await this.inventoryRepo.checkAvailability(items);
    if (!inventory.allAvailable) {
      throw new Error('Items not available');
    }
    const order = await this.orderRepo.create({ userId, items });
    await this.inventoryRepo.reserve(items);
    return order;
  }
}
Looks Good, But Check These:
1. All Callers Updated
// Old caller (now BROKEN):
const order = orderService.createOrder(userId, items);
console.log(order.id); // Works in sync version
// AI might update this:
const order = await orderService.createOrder(userId, items);
console.log(order.id); // Fixed
// But AI might MISS this:
const order = orderService.createOrder(userId, items); // Returns Promise!
sendEmail(order.id); // UNDEFINED! Promise hasn't resolved
Check ALL callers in your entire codebase. One missed caller = production bug.
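A runtime demonstration of the miss (all names hypothetical): without `await`, the variable holds a pending Promise, and any property read on it comes back undefined.

```typescript
// Hypothetical async service call; the point is what a caller sees
// when `await` is forgotten.
async function createOrder(userId: string): Promise<{ id: string }> {
  return { id: `order-${userId}` };
}

async function demo() {
  const missed = createOrder("u1") as unknown as { id: string }; // forgot await
  const correct = await createOrder("u1");
  console.log(missed.id);  // undefined (it's still a pending Promise)
  console.log(correct.id); // "order-u1"
}
demo();
```

With accurate types, the TypeScript compiler flags the `missed.id` access, and lint rules such as `@typescript-eslint/no-floating-promises` catch fire-and-forget calls that the type checker lets through.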
2. Transaction Boundaries
// Original (sync): Everything in one transaction
createOrder(userId, items) {
  // These execute in single database transaction
  const order = this.orderRepo.create({ userId, items });
  this.inventoryRepo.reserve(items);
  return order;
}
// AI version (async): Transaction might close between awaits!
async createOrder(userId, items) {
  const order = await this.orderRepo.create({ userId, items });
  // Transaction might commit here!
  await this.inventoryRepo.reserve(items);
  // Inventory reserved outside transaction - race condition!
}
// Fixed: Explicit transaction
async createOrder(userId, items) {
  return await this.db.transaction(async (trx) => {
    const order = await this.orderRepo.create({ userId, items }, trx);
    await this.inventoryRepo.reserve(items, trx);
    return order;
  });
}
3. Error Handling Changes
// Sync version:
try {
  const order = createOrder(userId, items);
  console.log('Order created:', order.id);
} catch (error) {
  console.error('Failed:', error);
}
// AI conversion might work, but check:
try {
  const order = await createOrder(userId, items);
  console.log('Order created:', order.id);
} catch (error) {
  console.error('Failed:', error);
  // Are unhandled promise rejections caught elsewhere?
}
// Watch for: async functions that don't await properly
async function handler() {
  createOrder(userId, items); // Forgot await! Errors won't be caught
}
4. Performance Implications
// Sequential (slow):
const user = await this.userRepo.findById(userId); // Wait 10ms
const inventory = await this.inventoryRepo.check(items); // Wait 10ms
// Total: 20ms
// Parallel (faster):
const [user, inventory] = await Promise.all([
  this.userRepo.findById(userId),
  this.inventoryRepo.check(items)
]);
// Total: 10ms
// AI might convert correctly but miss optimization opportunity
Validation Checklist:
- All callers updated (search entire codebase)
- No missing awaits
- Transaction boundaries preserved
- Error handling still works
- No unhandled promise rejections
- Consider parallelization opportunities
- Test concurrent scenarios (race conditions)
Scenario 3: Error Handling Rewrites
Why Caution Needed
Error handling is business logic. How you handle errors affects user experience, debugging, and system reliability. AI consolidates try-catch blocks for "cleaner" code but might change behavior subtly.
Real Example
Before:
async function processPayment(orderId: string, amount: number) {
  let payment;
  try {
    payment = await stripeService.charge(amount);
  } catch (error) {
    logger.error('Stripe charge failed', { orderId, amount, error });
    await notifyAdmin('Payment processing failed', { orderId });
    throw new PaymentError('Payment failed. Please try again.');
  }
  try {
    await orderRepo.updatePaymentStatus(orderId, payment.id);
  } catch (error) {
    logger.error('Failed to update order', { orderId, paymentId: payment.id, error });
    // Payment succeeded but order update failed - need manual intervention
    await notifyAdmin('CRITICAL: Payment succeeded but order update failed', {
      orderId,
      paymentId: payment.id
    });
    throw new SystemError('Payment processed but order update failed');
  }
  return payment;
}
AI Suggestion (Consolidated):
async function processPayment(orderId: string, amount: number) {
  try {
    const payment = await stripeService.charge(amount);
    await orderRepo.updatePaymentStatus(orderId, payment.id);
    return payment;
  } catch (error) {
    logger.error('Payment processing failed', { orderId, amount, error });
    await notifyAdmin('Payment processing failed', { orderId });
    throw new PaymentError('Payment failed. Please try again.');
  }
}
What AI Lost:
1. Different Error Handling Per Stage
Original: Stripe failure vs. order update failure are handled differently.
- Stripe fails: "Payment failed, try again" (user can retry)
- Order update fails: "CRITICAL alert" (payment succeeded, need manual fix)
AI version: Both errors treated the same. If order update fails after payment succeeds, user gets "Payment failed, try again"—but payment actually succeeded! They'll be double-charged if they retry.
2. Context in Logging
Original: Logs include paymentId when order update fails (needed for manual reconciliation)
AI version: Lost this context
3. Error Message Accuracy
Original: Different error messages based on where failure occurred
AI version: Generic message doesn't reflect reality
Fixed AI Version:
async function processPayment(orderId: string, amount: number) {
  let payment;
  // Stage 1: Charge payment
  try {
    payment = await stripeService.charge(amount);
  } catch (error) {
    logger.error('Stripe charge failed', { orderId, amount, error });
    await notifyAdmin('Payment processing failed', { orderId });
    throw new PaymentError('Payment failed. Please try again.');
  }
  // Stage 2: Update order (critical - payment already succeeded)
  try {
    await orderRepo.updatePaymentStatus(orderId, payment.id);
  } catch (error) {
    logger.error('Order update failed after successful payment', {
      orderId,
      paymentId: payment.id,
      error
    });
    await notifyAdmin('CRITICAL: Payment succeeded but order update failed', {
      orderId,
      paymentId: payment.id,
      action: 'Manual reconciliation required'
    });
    throw new SystemError(
      'Payment processed successfully but order update failed. ' +
      'Please contact support. Payment ID: ' + payment.id
    );
  }
  return payment;
}
Validation Checklist:
- Error messages still accurate for each failure scenario
- Logging includes context needed for debugging
- Alerts appropriate for severity (critical vs. warning)
- User-facing errors reflect reality
- No silent failures introduced
- Retry logic preserved where needed
- Test each error scenario explicitly
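The last checklist item can be automated. A minimal sketch (the injected stubs and their names are mine, not the article's actual services): the two-stage shape from above with its dependencies passed in, so a test can force each stage to fail and assert that a different error type surfaces.

```typescript
class PaymentError extends Error {}
class SystemError extends Error {}

// Two-stage payment flow with injectable dependencies for testing.
async function processPayment(
  charge: (amount: number) => Promise<{ id: string }>,
  updateOrder: (paymentId: string) => Promise<void>,
  amount: number
): Promise<{ id: string }> {
  let payment;
  try {
    payment = await charge(amount);
  } catch {
    throw new PaymentError('Payment failed. Please try again.');
  }
  try {
    await updateOrder(payment.id);
  } catch {
    throw new SystemError('Payment processed but order update failed');
  }
  return payment;
}

// Stage 1 failure must be retryable; stage 2 failure must page a human.
async function run() {
  const failingCharge = async () => { throw new Error('stripe down'); };
  const okCharge = async () => ({ id: 'pay_1' });
  const failingUpdate = async () => { throw new Error('db down'); };

  await processPayment(failingCharge, failingUpdate, 100)
    .catch(e => console.log(e instanceof PaymentError)); // true
  await processPayment(okCharge, failingUpdate, 100)
    .catch(e => console.log(e instanceof SystemError));  // true
}
run();
```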
Real War Story
AI refactored error handling in authentication service. Consolidated catch blocks. Removed distinction between "invalid password" (user can retry) and "account locked" (user must contact support). Result: Locked users kept trying to log in, hitting rate limits, making situation worse.
Red Light Scenarios: Override Often
These are where AI suggestions are dangerous. Even if the code looks right, the complexity and risk are too high. Use AI for ideas, but humans must design the solution.
Scenario 1: Cross-Service Transaction Logic
Why Humans Must Lead
Distributed transactions require understanding business invariants, eventual consistency, compensation logic, and failure modes. AI suggests technically correct code that violates business rules or introduces race conditions.
Real Example
Original (Monolith):
async function createOrder(userId: string, items: OrderItem[]) {
  return await db.transaction(async (trx) => {
    // Everything in one ACID transaction
    const order = await trx('orders').insert({ userId, total: 0 });
    let total = 0;
    for (const item of items) {
      await trx('order_items').insert({ orderId: order.id, ...item });
      await trx('inventory').decrement('quantity', item.quantity).where({ id: item.productId });
      total += item.price * item.quantity;
    }
    await trx('orders').update({ total }).where({ id: order.id });
    await trx('payments').insert({ orderId: order.id, amount: total, status: 'pending' });
    return order;
  });
}
AI Suggestion (Microservices):
async function createOrder(userId: string, items: OrderItem[]) {
  // AI tries to make it "distributed transaction"
  const order = await orderService.createOrder({ userId, items });
  try {
    await inventoryService.reserveItems(items);
    await paymentService.initiatePayment(order.id, order.total);
    return order;
  } catch (error) {
    // Rollback?
    await orderService.deleteOrder(order.id);
    throw error;
  }
}
What AI Got Wrong:
1. No Distributed Transaction Guarantees
If payment fails after inventory reserved:
- Order deleted ✓
- But inventory still reserved ✗
If deleteOrder fails:
- Order exists
- Inventory reserved
- Payment might partially succeed
This isn't a transaction—it's a collection of API calls that can fail independently.
2. Doesn't Handle Partial Failures
What if inventoryService is down but order already created? User gets error, but order exists in database. They retry, creating duplicate order.
3. No Compensation Logic
Real-world approach: Saga pattern with compensation
Correct Human-Designed Solution:
// Saga coordinator
async function createOrderSaga(userId: string, items: OrderItem[]) {
  const sagaId = uuid();
  let order, reservationId, paymentId;
  try {
    // Step 1: Create order (pending state)
    order = await orderService.createOrder({
      userId,
      items,
      status: 'pending',
      sagaId
    });
    // Step 2: Reserve inventory
    reservationId = await inventoryService.reserve({
      items,
      orderId: order.id,
      sagaId
    });
    // Step 3: Initiate payment
    paymentId = await paymentService.charge({
      amount: order.total,
      orderId: order.id,
      sagaId
    });
    // Step 4: Confirm order
    await orderService.confirmOrder(order.id);
    return order;
  } catch (error) {
    // Compensation: Undo in reverse order
    logger.error('Order saga failed, compensating', {
      sagaId,
      orderId: order?.id,
      step: error.step
    });
    if (paymentId) {
      await paymentService.refund(paymentId, { reason: 'saga-compensation' });
    }
    if (reservationId) {
      await inventoryService.releaseReservation(reservationId);
    }
    if (order) {
      await orderService.cancelOrder(order.id, { reason: 'saga-failed' });
    }
    throw new OrderCreationError('Order could not be completed', {
      sagaId,
      originalError: error
    });
  }
}
// Services support saga operations
class InventoryService {
  async reserve(params: ReserveParams): Promise<string> {
    const reservation = await this.db.reservations.create({
      ...params,
      expiresAt: Date.now() + 15 * 60 * 1000, // 15 min expiry
      status: 'reserved'
    });
    // Reserve inventory
    await this.db.inventory.decrement('available', params.quantity);
    return reservation.id;
  }
  async releaseReservation(reservationId: string): Promise<void> {
    const reservation = await this.db.reservations.findById(reservationId);
    // Return inventory to available pool
    await this.db.inventory.increment('available', reservation.quantity);
    await this.db.reservations.update(reservationId, { status: 'released' });
  }
}
Why Human Design Is Critical:
- Business rules: What's the order of operations? What can be undone?
- Timeouts: How long should inventory be reserved?
- Idempotency: What if user retries during saga?
- Monitoring: How do we track saga state?
- Failure modes: Partial compensation failures?
AI can generate code structure, but humans must design the saga flow based on business requirements.
Validation Checklist:
- Business invariants documented and preserved
- Saga steps in correct order (can later steps be compensated?)
- Compensation logic for each step
- Idempotency (safe to retry)
- Timeout handling (reservation expiry)
- Saga state tracking (for debugging)
- Test: What happens if any step fails?
- Test: What happens if compensation fails?
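One way to make the "safe to retry" item concrete: key each saga by a client-supplied idempotency key and return the recorded result on retry. A minimal in-memory sketch (the names are mine; a real system would persist this in the saga-state store, not a Map):

```typescript
// Hypothetical idempotency guard; `completed` stands in for a durable store.
const completed = new Map<string, { orderId: string }>();

async function runOrderSagaOnce(
  idempotencyKey: string,
  saga: () => Promise<{ orderId: string }>
): Promise<{ orderId: string }> {
  const prior = completed.get(idempotencyKey);
  if (prior) return prior; // retry: hand back the first result, don't re-run the saga

  const result = await saga();
  completed.set(idempotencyKey, result);
  return result;
}

async function demo() {
  let runs = 0;
  const saga = async () => ({ orderId: `order-${++runs}` });
  const first = await runOrderSagaOnce("key-1", saga);
  const retry = await runOrderSagaOnce("key-1", saga);
  console.log(first.orderId, retry.orderId, runs); // order-1 order-1 1
}
demo();
```

Note this only covers retries after completion; a retry that arrives while the saga is still in flight additionally needs a 'pending' marker or lock so two copies of the saga can't run concurrently.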
Scenario 2: Security-Critical Code Paths
Why Humans Must Lead
Security requires threat modeling. AI doesn't think like an attacker. It makes code "cleaner" but removes defense-in-depth checks, introduces timing vulnerabilities, or changes authentication subtly.
Real Example
Before:
async function verifyPasswordAndLogin(username: string, password: string) {
  const user = await userRepo.findByUsername(username);
  if (!user) {
    // Intentional delay to prevent timing attacks
    await bcrypt.hash(password, 12);
    throw new AuthError('Invalid credentials');
  }
  const isValid = await bcrypt.compare(password, user.passwordHash);
  if (!isValid) {
    // Log failed attempt
    await auditLog.create({
      event: 'login_failed',
      username,
      ip: request.ip,
      timestamp: Date.now()
    });
    // Increment failed attempt counter
    await rateLimiter.increment(username);
    throw new AuthError('Invalid credentials');
  }
  // Check if account is locked
  if (user.lockedUntil && user.lockedUntil > Date.now()) {
    throw new AuthError('Account locked. Try again later.');
  }
  // Reset failed attempts on successful login
  await rateLimiter.reset(username);
  // Generate session
  const token = jwt.sign({ userId: user.id }, process.env.JWT_SECRET, {
    expiresIn: '1h'
  });
  return { token, user };
}
AI Suggestion ("Cleaner"):
async function verifyPasswordAndLogin(username: string, password: string) {
  const user = await userRepo.findByUsername(username);
  if (!user || !(await bcrypt.compare(password, user.passwordHash))) {
    throw new AuthError('Invalid credentials');
  }
  const token = jwt.sign({ userId: user.id }, process.env.JWT_SECRET, {
    expiresIn: '1h'
  });
  return { token, user };
}
What AI Removed (Each is a security issue):
1. Timing Attack Prevention
Original: If user doesn't exist, still runs bcrypt (expensive operation). Attacker can't determine if username exists based on response time.
AI version: If user doesn't exist, returns immediately. Attacker can enumerate valid usernames by timing the response.
2. Audit Logging
Original: Logs all failed login attempts for security monitoring
AI version: No logging. Can't detect brute force attacks or compromised accounts.
3. Rate Limiting
Original: Increments failed attempt counter, locks account after threshold
AI version: No rate limiting. Attacker can try unlimited passwords.
4. Account Lockout Check
Original: Checks if account is locked before issuing token
AI version: No check. Locked accounts can still log in.
5. Failed Attempt Reset
Original: Resets counter on successful login (prevents permanent lockout from old failed attempts)
AI version: No reset logic.
Correct Refactoring (If Any):
class AuthService {
  async verifyPasswordAndLogin(username: string, password: string) {
    // Always log authentication attempts
    const attemptId = await auditLog.create({
      event: 'login_attempt',
      username,
      ip: request.ip,
      timestamp: Date.now()
    });
    try {
      const user = await userRepo.findByUsername(username);
      // Timing attack prevention: Always perform expensive operation
      if (!user) {
        await bcrypt.hash(password, 12); // Same cost as compare
        throw new AuthError('Invalid credentials');
      }
      // Check lockout before password verification (prevent brute force)
      if (user.lockedUntil && user.lockedUntil > Date.now()) {
        await auditLog.update(attemptId, { result: 'account_locked' });
        throw new AuthError('Account locked. Contact support.');
      }
      // Verify password
      const isValid = await bcrypt.compare(password, user.passwordHash);
      if (!isValid) {
        await this.handleFailedLogin(user, attemptId);
        throw new AuthError('Invalid credentials');
      }
      // Successful login
      await this.handleSuccessfulLogin(user, attemptId);
      return {
        token: this.generateToken(user),
        user: this.sanitizeUser(user)
      };
    } catch (error) {
      // Don't leak information in error messages
      if (error instanceof AuthError) {
        throw error;
      }
      // Log unexpected errors but return generic message
      logger.error('Login error', { username, error });
      throw new AuthError('Authentication failed');
    }
  }

  private async handleFailedLogin(user: User, attemptId: string) {
    await auditLog.update(attemptId, { result: 'failed', userId: user.id });
    const failedAttempts = await rateLimiter.increment(user.username);
    // Lock account after threshold
    if (failedAttempts >= 5) {
      const lockDuration = Math.min(60 * 60 * 1000 * (2 ** (failedAttempts - 5)), 24 * 60 * 60 * 1000);
      await userRepo.update(user.id, {
        lockedUntil: Date.now() + lockDuration
      });
      await notifyUser(user.email, 'account_locked');
    }
  }

  private async handleSuccessfulLogin(user: User, attemptId: string) {
    await auditLog.update(attemptId, { result: 'success', userId: user.id });
    await rateLimiter.reset(user.username);
    // Update last login
    await userRepo.update(user.id, {
      lastLoginAt: Date.now(),
      lastLoginIp: request.ip
    });
  }
}
Why Human Review Is Critical:
- Threat modeling: What attacks are possible?
- Defense in depth: Multiple layers of security
- Behavioral security: Rate limiting, anomaly detection
- Audit requirements: Compliance needs
- Error handling: Don't leak information
AI optimizes for code simplicity. Security requires deliberate complexity.
Validation Checklist:
- Timing attacks prevented
- Rate limiting in place
- Failed attempts logged
- Account lockout implemented
- No information leakage in errors
- Token expiry appropriate
- Defense in depth maintained
- Penetration test after changes
Scenario 3: Performance-Sensitive Hotspots
Why Humans Must Lead
Performance requires profiling data. AI makes code "cleaner" or "more functional" without considering that this path processes 10M records per hour. Algorithmic complexity matters at scale.
Real Example
Before (Optimized but less readable):
function processOrders(orders: Order[]): ProcessedOrder[] {
  // Pre-allocate array for performance
  const processed = new Array(orders.length);
  // Process in place, avoid allocations
  for (let i = 0; i < orders.length; i++) {
    const order = orders[i];
    processed[i] = {
      id: order.id,
      total: order.items.reduce((sum, item) => sum + (item.price * item.quantity), 0),
      itemCount: order.items.length,
      userId: order.userId
    };
  }
  return processed;
}
AI Suggestion ("More Functional"):
function processOrders(orders: Order[]): ProcessedOrder[] {
  return orders.map(order => ({
    id: order.id,
    total: order.items
      .map(item => ({ total: item.price * item.quantity }))
      .reduce((sum, item) => sum + item.total, 0),
    itemCount: order.items.length,
    userId: order.userId
  }));
}
What Changed:
Functionally equivalent, but:
Original:
- Pre-allocates result array
- Single pass through data
- Minimal allocations
- ~100ms for 100K orders
AI Version:
- Map creates new array
- Items mapped to intermediate objects
- Then reduced
- Multiple allocations per order
- ~280ms for 100K orders (2.8x slower)
At Scale:
- 1M orders/hour
- Original: ~1 second of CPU time per hour
- AI version: ~2.8 seconds of CPU time per hour
- That's 1.8 extra CPU-seconds per hour. Negligible on one instance, but at 1,000x the volume (or across 1,000 instances) it becomes 30 minutes of extra CPU time per hour
When AI Optimization Is Acceptable:
If this runs once per user request and processes 10 orders, both versions finish in microseconds. The difference is negligible. Functional style is fine.
If this runs in a batch job processing millions of records: The difference matters. Keep performance-optimized version.
The Problem: AI doesn't know which scenario you're in.
Validation Checklist:
- Profile before/after with production-like data volumes
- Check algorithmic complexity (O(n) → O(n²) is red flag)
- Measure memory allocations
- Consider this at scale (10x, 100x current volume)
- Benchmark hot paths
- Review profiling data (where is time spent?)
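To verify claims like the 2.8x slowdown above rather than trust them, a throwaway micro-benchmark is enough. Here's a sketch that pits the two versions against each other at a production-like volume. Both functions are renamed copies of the snippets earlier in this section, and the synthetic order shape is an assumption:

```typescript
type Item = { price: number; quantity: number };
type Order = { id: string; userId: string; items: Item[] };

// Renamed copy of the performance-optimized version
function processOrdersOptimized(orders: Order[]) {
  const processed = new Array(orders.length);
  for (let i = 0; i < orders.length; i++) {
    const order = orders[i];
    let total = 0;
    for (const item of order.items) total += item.price * item.quantity;
    processed[i] = { id: order.id, total, itemCount: order.items.length, userId: order.userId };
  }
  return processed;
}

// Renamed copy of the AI's "more functional" version
function processOrdersFunctional(orders: Order[]) {
  return orders.map(order => ({
    id: order.id,
    total: order.items
      .map(item => ({ total: item.price * item.quantity }))
      .reduce((sum, item) => sum + item.total, 0),
    itemCount: order.items.length,
    userId: order.userId,
  }));
}

// 100K synthetic orders, roughly the volume quoted in the text
const orders: Order[] = Array.from({ length: 100_000 }, (_, i) => ({
  id: `o${i}`,
  userId: `u${i % 1000}`,
  items: [{ price: 10, quantity: 2 }, { price: 5, quantity: 1 }],
}));

for (const [name, fn] of [
  ["optimized", processOrdersOptimized],
  ["functional", processOrdersFunctional],
] as const) {
  const start = performance.now();
  const result = fn(orders);
  console.log(`${name}: ${(performance.now() - start).toFixed(1)}ms for ${result.length} orders`);
}
```

Exact numbers will differ by machine and runtime. The point is that a ten-minute script like this settles the argument with data instead of style preferences.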
When to Use AI:
Generate algorithmic alternatives, then benchmark:
Prompt: "This function processes 1M orders per hour and takes 2 seconds.
Suggest 3 optimization approaches with expected complexity improvements."
AI might suggest:
1. Parallel processing (split into batches)
2. Streaming instead of array mapping
3. Caching repeated calculations
Then YOU benchmark each and choose.
AI is the idea generator; humans measure and decide.
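As an illustration, option 2 (streaming) might be sketched as a generator. This is a hypothetical variant, and whether it actually wins is exactly what your benchmark decides:

```typescript
type Item = { price: number; quantity: number };
type Order = { id: string; userId: string; items: Item[] };

// Yield processed orders one at a time instead of materializing the
// whole result array, so memory stays flat regardless of batch size.
function* processOrdersStream(orders: Iterable<Order>) {
  for (const order of orders) {
    let total = 0;
    for (const item of order.items) total += item.price * item.quantity;
    yield { id: order.id, total, itemCount: order.items.length, userId: order.userId };
  }
}

// Consumers pull results lazily, e.g. writing each one to a sink
const sample: Order[] = [
  { id: "o1", userId: "u1", items: [{ price: 10, quantity: 2 }] },
];
for (const processed of processOrdersStream(sample)) {
  console.log(processed.total); // 20
}
```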
The Validation Checklist
Before merging any AI refactoring, ask these questions:
1. Functional Correctness
- Does it do exactly what the original code did?
- Are edge cases handled the same way?
- Does it pass all existing tests?
2. Behavior Preservation
- Are error messages the same (or better)?
- Is logging/monitoring preserved?
- Are side effects identical?
- Is timing/ordering preserved?
3. API Compatibility
- Are function signatures unchanged (or deprecated properly)?
- Do all callers still work?
- Is backward compatibility maintained?
4. Tests
- Do existing tests pass?
- Are new tests needed?
- Are tests updated if behavior changed intentionally?
5. Performance
- Is performance the same or better?
- For hot paths: benchmark before/after
- Check algorithmic complexity didn't regress
6. Security
- No security checks removed?
- Auth/authz logic unchanged?
- No new vulnerabilities introduced?
7. Rollback Plan
- Can I revert this easily?
- What's the blast radius if it breaks?
- Is there data I can't easily roll back (schema changes)?
8. Team Review
- For yellow/red scenarios: Get second opinion
- Explain AI changes in PR description
- Highlight non-obvious changes
9. Monitoring
- Add monitoring if refactoring is risky
- Plan to watch error rates after deploy
- Can I detect issues quickly if they occur?
10. Documentation
- Are comments still accurate?
- Does documentation need updating?
- Are architectural decisions explained?
The "Sleep On It" Rule
For any refactoring in yellow/red zones: Wait 24 hours before merging.
Fresh eyes catch issues. Come back tomorrow, review again. If you still feel confident, merge.
Measuring Refactoring Quality Over Time
Track these metrics to improve your AI refactoring process:
Metric 1: Defect Rate
Track: Bugs caused by AI refactorings vs. manual refactorings
Target: AI should be equal or better than manual (not worse)
How to Measure:
- Tag commits: "AI-assisted refactoring" or "Manual refactoring"
- Track production bugs back to commits
- Calculate bugs per 100 refactoring commits
Example:
- Manual refactoring: 3 bugs per 100 commits
- AI refactoring: 2 bugs per 100 commits
- Conclusion: AI refactoring is actually safer (with your review process)
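The calculation itself is trivial once commits are tagged. Here's a sketch with an assumed record shape; adapt the `tag` and `causedBug` fields to however your team annotates commits and links production bugs back to them:

```typescript
// Hypothetical commit record: how bugs are traced back to commits
// (e.g. via a "Caused-by:" footer in fix commits) is up to your process.
type Commit = { sha: string; tag: "ai" | "manual"; causedBug: boolean };

// Bugs per 100 refactoring commits for the given tag
function defectRate(commits: Commit[], tag: "ai" | "manual"): number {
  const tagged = commits.filter(c => c.tag === tag);
  if (tagged.length === 0) return 0;
  const bugs = tagged.filter(c => c.causedBug).length;
  return (bugs / tagged.length) * 100;
}
```

Run it monthly over the trailing few hundred commits; a single month rarely has enough refactorings for the rate to be meaningful.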
Metric 2: Code Quality Metrics
Track: Complexity, maintainability, duplication before/after
Tools: SonarQube, CodeClimate, ESLint complexity rules
Example:
- Before: Cyclomatic complexity 18
- After AI refactoring: Cyclomatic complexity 6
- Conclusion: Refactoring improved quality
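If you use ESLint, complexity can be enforced rather than just tracked, via the built-in `complexity` rule. A minimal flat-config sketch; the threshold of 10 is an arbitrary example, not a recommendation:

```javascript
// eslint.config.js — fail the lint run when any function's cyclomatic
// complexity exceeds the configured maximum.
export default [
  {
    rules: {
      complexity: ["error", { max: 10 }],
    },
  },
];
```

With a gate like this, an AI refactoring that genuinely reduces complexity shows up as lint debt disappearing, and one that merely shuffles code does not.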
Metric 3: Team Velocity
Track: Story points per sprint, time to complete features
Goal: Refactoring should save time, not create churn
Red Flag: If velocity decreases, you might be:
- Over-refactoring
- Introducing bugs that cause rework
- Spending too long reviewing AI suggestions
Metric 4: Time Saved
Track: Time spent on refactoring tasks before/after AI
Example:
- Manual refactoring: 4 hours
- AI-assisted refactoring: 1 hour (including review)
- Time saved: 3 hours (75%)
Metric 5: False Start Rate
Track: How often do you start accepting an AI refactoring, then realize it's wrong and revert?
Target: <5%
If higher: Your categorization (green/yellow/red) needs adjustment, or prompts need refinement.
Conclusion
AI refactoring is powerful, but not autopilot.
The framework:
- Categorize: Green (trust), Yellow (verify), Red (override)
- Validate: Use the checklist
- Measure: Track quality over time
The teams doing this well:
- Trust AI for mechanical refactorings (extract method, rename)
- Verify closely for complex changes (async, databases, errors)
- Design manually for high-risk areas (security, distributed systems, performance)
The teams struggling:
- Trust everything (leads to production incidents)
- Trust nothing (wastes the AI opportunity)
- Don't measure (can't improve)
Your action plan:
- This week: Review 5 AI refactorings using the framework
- This month: Track defect rate (AI vs. manual refactorings)
- This quarter: Refine your green/yellow/red categories based on data
The goal isn't to avoid AI refactorings—it's to use them safely and effectively. When you know when to trust and when to override, AI becomes a force multiplier.
Practical Takeaways
Print This Framework:
- Green: Trust 90%, 2-min review
- Yellow: Verify closely, check edge cases
- Red: Human-designed, AI for ideas only
Your Review Checklist:
- Functional correctness preserved
- All callers still work
- Tests pass
- No performance regression
- Security unchanged
- Can rollback easily
Start Here:
- Take next AI refactoring suggestion
- Categorize it (green/yellow/red)
- Use appropriate review level
- Track outcome (did it work?)
- Adjust categories based on learnings
The 60/30/10 rule (60% good, 30% neutral, 10% harmful) only holds if you review intelligently. With this framework, you can push that to 90/10/0.
That's the goal: 90% of AI refactorings make your code better, 10% are neutral, 0% cause production issues.
You get there by knowing when to trust, when to verify, and when to override.
