Software Craftsmanship

Feature Flags Done Right: Progressive Delivery Without Blowing Up Production

37 flags, no one knows what 18 do. Flag hell happens when power tools become toys. Learn the 4 flag types (release/experiment/ops/permission), lifetimes, gradual rollout patterns (5%→100%), cleanup discipline, and the avoid-flag-hell checklist.

Ruchit Suthar
November 18, 2025 · 8 min read

TL;DR

Feature flags enable gradual rollouts, A/B tests, and instant kill switches, but they create "flag hell" if mismanaged. Set expiration dates, use clear naming, track ownership, limit nesting, and remove flags religiously. Release and experiment flags should live for weeks, not become permanent configuration; only kill switches and permission checks are meant to stay. Good flags enable safe deployments; bad flags create unmaintainable codebases.


The Flag That Was Never Removed

Your codebase has 37 feature flags. No one knows what 18 of them do. Three are layered on top of each other—the new checkout depends on the new payment flow which depends on the new auth system.

Someone asks: "Is the old checkout code still running in production?"

Silence. No one knows. The flags are nested three levels deep. Half of them don't have owners. Two haven't been touched in 2 years.

You wanted feature flags to ship safely and fast. Instead, you've got flag hell: a codebase where every code path has if (flag.isEnabled()) checks, no one dares remove old code, and debugging requires understanding which combination of 12 flags is active.

Feature flags are power tools. Used well, they let you deploy code without releasing it, test in production safely, and roll back instantly. Used poorly, they turn your codebase into an unmaintainable mess.

Let's talk about how to use feature flags without creating chaos.

What Feature Flags Are Actually Good For

Before adding a flag, ask: What job is this flag doing?

1. Gradual Rollouts

Job: Deploy new code to production but only enable it for a small percentage of users first. Monitor. If it's stable, increase gradually. If it breaks, roll back instantly.

Example: New recommendation algorithm. Enable for 5% of users → 25% → 50% → 100%. If conversion drops, disable the flag immediately without redeploying.

2. A/B Tests and Experiments

Job: Test two versions of a feature to see which performs better. Flag controls which users see which version.

Example: "Buy Now" button vs "Add to Cart" button. 50% of users see each. After 2 weeks, measure conversion. The winner becomes the default.

3. Kill Switches for Risky Integrations

Job: Turn off a risky feature or integration instantly if it starts causing problems, without needing a deploy.

Example: New payment provider integration. If it starts timing out or charging cards twice, flip the kill switch to fall back to the old provider.

4. Customer or Region-Specific Enablement

Job: Enable features for specific customers, teams, or regions without branching code.

Example: Enterprise customers get SSO, free-tier users don't. EU customers see GDPR-compliant consent, others don't.

What Flags Are Not

Feature flags are not:

  • A substitute for design decisions ("let's flag it and decide later")
  • A way to avoid deleting code
  • Permanent configuration (that's a config setting, not a flag)

If you're adding a flag and can't articulate which of the 4 use cases it serves, don't add it.

Types of Flags and Their Lifetimes

Not all flags are the same. Different types have different lifetimes and management needs.

Release Flags (Short-Lived)

Purpose: Enable a new feature gradually during rollout.

Lifetime: Days to weeks. Remove after rollout is complete.

Example:

if (featureFlags.isEnabled('new-checkout')) {
  return <NewCheckout />;
} else {
  return <OldCheckout />;
}

After rollout: Remove the flag, remove the old code path.

// After cleanup
return <NewCheckout />;

Discipline required: Delete the flag and old code within 2 weeks of full rollout.

Experiment Flags (Short-Lived)

Purpose: A/B test to determine which version performs better.

Lifetime: Duration of experiment (1-4 weeks). Remove after experiment concludes.

Example:

if (experimentFlags.getVariant('checkout-button-text') === 'buy-now') {
  return <Button>Buy Now</Button>;
} else {
  return <Button>Add to Cart</Button>;
}

After experiment: Remove the flag, keep the winning variant.

Discipline required: Set end date when creating experiment. Clean up when experiment ends.
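Experiment flags need sticky assignment: a user who saw "Buy Now" on Monday should still see it on Friday, or the results get muddied. A common approach is to hash the user ID together with the experiment name into a bucket from 0 to 99. The sketch below is a hypothetical stand-in for something like `experimentFlags.getVariant`; the `Variant` type, the weights, and the hash choice (FNV-1a plus a MurmurHash3-style finalizer) are assumptions, not any particular vendor's implementation.

```typescript
// Hypothetical sketch: sticky, weighted variant assignment for an experiment.

interface Variant {
  name: string;
  weight: number; // share of users, in percent
}

// 32-bit FNV-1a, followed by the MurmurHash3 fmix32 finalizer so the
// low bits (used by "% 100" below) are well mixed.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit result
}

// The same experiment + user pair always yields the same bucket (0-99),
// so each user stays in one variant for the whole experiment.
function getVariant(experiment: string, userId: string, variants: Variant[]): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (total !== 100) throw new Error(`variant weights must sum to 100, got ${total}`);

  const bucket = hash32(`${experiment}:${userId}`) % 100;
  let upperBound = 0;
  for (const v of variants) {
    upperBound += v.weight;
    if (bucket < upperBound) return v.name;
  }
  throw new Error("unreachable: weights sum to 100");
}

const checkoutButtonText: Variant[] = [
  { name: "buy-now", weight: 50 },
  { name: "add-to-cart", weight: 50 },
];

console.log(getVariant("checkout-button-text", "user-42", checkoutButtonText));
```

Including the experiment name in the hash means one user can land in different buckets for different experiments, which keeps concurrent experiments from being correlated with each other.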

Ops Flags / Kill Switches (Long-Lived)

Purpose: Turn off risky features or integrations in emergencies.

Lifetime: Months to years. Stay in code as long as the feature exists.

Example:

if (opsFlags.isEnabled('payment-provider-stripe')) {
  await stripePayment.charge(amount);
} else {
  await legacyPayment.charge(amount);
}

These flags don't get removed—they're operational safety valves. But they should be:

  • Clearly named (enable-stripe-payments, not use-new-payment)
  • Monitored (alert if kill switch is flipped)
  • Tested (regularly test the fallback path)
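Keeping the kill-switch decision in one small function makes the fallback path easy to exercise in tests and drills: you can force the switch off without touching the real flag service. A hypothetical sketch; `PaymentProvider`, `FlagReader`, `StaticFlags`, and the flag name are illustrative assumptions.

```typescript
// Hypothetical sketch: a kill switch isolated in one testable function.

interface PaymentProvider {
  name: string;
  charge(amountCents: number): Promise<string>; // resolves to a receipt id
}

// The only part of the flag client this code needs.
interface FlagReader {
  isEnabled(flag: string): boolean;
}

// Kill switch: route to the fallback provider whenever the flag reads as off.
function selectPaymentProvider(
  flags: FlagReader,
  primary: PaymentProvider,
  fallback: PaymentProvider,
): PaymentProvider {
  return flags.isEnabled("enable-stripe-payments") ? primary : fallback;
}

// In-memory flags for tests and fallback drills; missing flags read as off.
class StaticFlags implements FlagReader {
  private readonly values: Record<string, boolean>;

  constructor(values: Record<string, boolean>) {
    this.values = values;
  }

  isEnabled(flag: string): boolean {
    return this.values[flag] ?? false;
  }
}
```

Assuming `opsFlags` exposes an `isEnabled(name)` method, production code might read `await selectPaymentProvider(opsFlags, stripePayment, legacyPayment).charge(amount)`. Treating a missing flag as off is a design choice: if the flag can't be read, traffic goes to the known-good fallback.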

Permission / Entitlement Flags (Long-Lived)

Purpose: Control access to features based on customer tier, role, or region.

Lifetime: Permanent (as long as the feature exists).

Example:

if (user.hasPlan('enterprise')) {
  return <SSOSettings />;
} else {
  return <UpgradePrompt />;
}

These are not really "feature flags"—they're product entitlements. They're part of your business logic. Don't think of them as temporary.

Implementation: Use permission checks, not generic feature flags:

// Better: check a capability, not a plan name or a generic flag
if (user.hasPermission('sso.configure')) {
  // ...
}
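Behind `hasPermission` there is usually a mapping from plans or roles to capabilities. A minimal sketch, assuming a hypothetical static plan-to-permission table and a simplified `User` class (real systems often keep this mapping in a database or an authorization service):

```typescript
// Hypothetical sketch: entitlements as plan -> permission data, not flags.

type Plan = "free" | "pro" | "enterprise";

// Illustrative table; permission names follow the 'sso.configure' style.
const PLAN_PERMISSIONS: Record<Plan, ReadonlySet<string>> = {
  free: new Set(["projects.read"]),
  pro: new Set(["projects.read", "projects.export"]),
  enterprise: new Set(["projects.read", "projects.export", "sso.configure"]),
};

class User {
  readonly id: string;
  readonly plan: Plan;

  constructor(id: string, plan: Plan) {
    this.id = id;
    this.plan = plan;
  }

  // Code asks "may this user do X?", never "which plan is this user on?"
  hasPermission(permission: string): boolean {
    return PLAN_PERMISSIONS[this.plan].has(permission);
  }
}
```

With this shape, offering SSO on another plan is a change to the table rather than a hunt for every `hasPlan('enterprise')` check.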

Flag Lifetime Summary

Type                 Lifetime           Cleanup
Release flag         Days to weeks      Delete after full rollout
Experiment flag      1-4 weeks          Delete after experiment ends
Ops / kill switch    Months to years    Keep as safety valve
Permission flag      Permanent          Part of business logic

Key rule: 80% of your flags should be short-lived. If most flags are permanent, you're using flags wrong.

Safe Rollout Patterns with Flags

Feature flags enable progressive delivery: deploy code to production, enable it gradually, monitor, decide whether to continue or roll back.

Step 1: Internal Users First

Enable the feature for your team or internal beta testers.

// The flag is still off globally, so only internal users get the new search
if (user.isInternalUser() || featureFlags.isEnabled('new-search')) {
  return <NewSearch />;
}
return <OldSearch />;

Goal: Catch obvious bugs in the real production environment without affecting customers.

Duration: 1-3 days.

Step 2: Small Percentage of Traffic

Enable for 5% of users. Monitor key metrics: error rate, latency, conversion.

// The 5% lives in the flag service's config, so changing it needs no deploy
if (featureFlags.isEnabledForUser('new-search', user.id)) {
  return <NewSearch />;
}

Goal: Catch edge cases and performance issues at small scale.

Duration: 1-3 days. If metrics look good, continue. If errors spike, roll back.
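Percentage rollouts typically hash a stable user ID into a bucket from 0 to 99 and enable the feature when the bucket is below the current percentage. Because a user's bucket never changes, raising the rollout from 5% to 25% only adds users; nobody who already has the new search loses it. A hypothetical sketch of what a flag service might do behind a call like `isEnabledForUser`; the function names and the hash choice are assumptions:

```typescript
// Hypothetical sketch: deterministic percentage bucketing.

// 32-bit FNV-1a with a MurmurHash3 fmix32 finalizer for well-mixed low bits.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Stable bucket in [0, 100) per flag and user. Including the flag key gives
// each flag an independent sample, so the same 5% of users don't get every new feature.
function bucketFor(flagKey: string, userId: string): number {
  return hash32(`${flagKey}:${userId}`) % 100;
}

// Enabled when the user's bucket falls below the rollout percentage.
function isEnabledForUser(flagKey: string, userId: string, rolloutPercent: number): boolean {
  if (rolloutPercent <= 0) return false;
  if (rolloutPercent >= 100) return true;
  return bucketFor(flagKey, userId) < rolloutPercent;
}
```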

Step 3: Specific Region or Segment

Enable for a specific region or user segment to further validate.

if (user.region === 'US' && featureFlags.isEnabled('new-search')) {
  return <NewSearch />;
}

Goal: Isolate impact to one segment. If something breaks, only US users are affected, not your entire user base.

Duration: 1-7 days.

Step 4: Gradual Increase

Increase rollout percentage: 25% → 50% → 100%, monitoring at each step.

Goal: Catch issues that only appear at higher scale (database contention, cache exhaustion).

Duration: 1-2 weeks for full rollout.
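The increase-monitor-decide loop can be written as a small decision function. A hypothetical sketch: the stages come from the steps above, while `RolloutMetrics`, `Guardrails`, and the threshold values are assumptions to replace with your own SLOs. Any breached guardrail returns 0, which is the instant rollback.

```typescript
// Hypothetical sketch: advance a rollout one stage at a time, or roll back.

const STAGES = [5, 25, 50, 100];

interface RolloutMetrics {
  errorRate: number;            // errors / requests on the new code path
  baselineErrorRate: number;    // errors / requests on the old code path
  p95LatencyMs: number;         // new code path
  baselineP95LatencyMs: number; // old code path
}

interface Guardrails {
  maxErrorRateIncrease: number; // absolute, e.g. 0.005 = +0.5 percentage points
  maxLatencyRatio: number;      // e.g. 1.2 = at most 20% slower than baseline
}

// Returns the next rollout percentage: one stage up when every guardrail
// holds, 0 (instant rollback) when any guardrail is breached.
function nextRolloutPercent(current: number, m: RolloutMetrics, g: Guardrails): number {
  const errorsOk = m.errorRate - m.baselineErrorRate <= g.maxErrorRateIncrease;
  const latencyOk = m.p95LatencyMs <= m.baselineP95LatencyMs * g.maxLatencyRatio;
  if (!errorsOk || !latencyOk) return 0;

  const next = STAGES.find((stage) => stage > current);
  return next ?? 100; // already at 100%: stay there
}
```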

Step 5: Remove the Flag

Once 100% of users see the new feature and it's stable, remove the flag and old code.

// Before
if (featureFlags.isEnabled('new-search')) {
  return <NewSearch />;
} else {
  return <OldSearch />;
}

// After cleanup
return <NewSearch />;

Delete OldSearch component. Remove flag from config.

How to Roll Back Quickly

If errors spike or metrics drop:

  1. Flip the flag to 0% (or off). Takes seconds.
  2. Investigate: Look at logs, traces, error messages.
  3. Fix the issue in a new deploy.
  4. Re-enable gradually following the same rollout steps.

This is why flags are powerful: instant rollback without redeploying.

Avoiding 'Flag Hell' in the Codebase

Without discipline, flags accumulate and create mess. Here's how to keep control.

Anti-Pattern 1: Nested Flags

if (flags.isEnabled('new-auth')) {
  if (flags.isEnabled('new-payment')) {
    if (flags.isEnabled('new-checkout')) {
      return <NewNewNewCheckout />;
    }
  }
}

Problem: Combinatorial explosion. Testing every combination is impossible. No one knows what's actually enabled.

Fix: Avoid nested flags. If feature B depends on feature A, don't flag B until A is fully rolled out.
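If flags live in a registry, this rule can be enforced by tooling rather than memory: a flag declares its prerequisite, and the registry refuses to start its rollout until the prerequisite reaches 100%. A hypothetical sketch; `FlagState` and `rolloutBlocker` are illustrative, not a specific tool's API.

```typescript
// Hypothetical sketch: refuse to start a dependent rollout early.

interface FlagState {
  key: string;
  rolloutPercent: number; // 0-100
  dependsOn?: string;     // key of a prerequisite flag, if any
}

// Returns why the flag may not start rolling out yet, or null if it may.
function rolloutBlocker(flag: FlagState, registry: Map<string, FlagState>): string | null {
  if (flag.dependsOn === undefined) return null;

  const prerequisite = registry.get(flag.dependsOn);
  // Assumption: a prerequisite that is no longer registered was fully
  // rolled out and cleaned up, so its code path is now unconditional.
  if (prerequisite === undefined) return null;

  if (prerequisite.rolloutPercent < 100) {
    return `'${flag.key}' depends on '${prerequisite.key}', which is only at ${prerequisite.rolloutPercent}%`;
  }
  return null;
}
```

Treating an unregistered prerequisite as done lets cleanup proceed, but it also lets a typo in `dependsOn` pass; a stricter check could compare the key against a list of removed flags.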

Anti-Pattern 2: Dead Flags

Flags that were enabled 100% months ago but never removed. Code still has if (flag.isEnabled()) checks.

Problem: Codebase bloat. Every engineer has to read and understand flags that do nothing.

Fix: Flag cleanup ritual. Every sprint, review flags:

  • Which flags have been 100% enabled for >2 weeks? Remove them.
  • Which experiment flags have ended? Remove them.
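The first question in that ritual is easy to automate if your flag system records when each flag reached 100%. A hypothetical sketch, assuming a `fullyEnabledSince` timestamp per flag; the field names are ours, not a particular tool's:

```typescript
// Hypothetical sketch: find release flags that are overdue for removal.

interface FlagRecord {
  key: string;
  type: "release" | "experiment" | "ops" | "permission";
  rolloutPercent: number;
  fullyEnabledSince?: Date; // when rolloutPercent last reached 100
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Release flags at 100% for longer than the grace period are dead weight.
// Ops and permission flags are skipped because they are meant to stay;
// experiments end on a date rather than at 100%, so they need an end-date check instead.
function removalCandidates(flags: FlagRecord[], now: Date, graceDays = 14): string[] {
  return flags
    .filter((f) => {
      if (f.type !== "release") return false;
      if (f.rolloutPercent !== 100 || f.fullyEnabledSince === undefined) return false;
      return now.getTime() - f.fullyEnabledSince.getTime() > graceDays * DAY_MS;
    })
    .map((f) => f.key);
}
```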

Anti-Pattern 3: Unclear Naming

if (flags.isEnabled('new-feature')) { ... }

What feature? New compared to what? When was it added?

Fix: Descriptive names with dates or purpose:

// Good
if (flags.isEnabled('checkout-v2-rollout-2025-11')) { ... }
if (flags.isEnabled('stripe-payment-integration')) { ... }
if (flags.isEnabled('experiment-button-color-nov2025')) { ... }

Anti-Pattern 4: No Owner or Expiry Date

Flag exists. No one knows who created it or when it should be removed.

Fix: Metadata for every flag:

flags:
  checkout-v2-rollout-2025-11:
    description: "Gradual rollout of new checkout flow"
    owner: "alice@company.com"
    created: "2025-11-01"
    type: "release"
    expiresAfter: "2025-12-01"  # Force review if not removed by then

Some flag management systems support this. If yours doesn't, keep a spreadsheet or doc.
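If your flag tool can't store this metadata, a check in CI can read it from the repo and report entries that are past `expiresAfter` or have no owner. A hypothetical sketch, assuming the YAML has already been parsed into plain objects with the fields shown above:

```typescript
// Hypothetical sketch: audit flag metadata for missing owners and expired flags.

interface FlagMetadata {
  description: string;
  owner: string;
  created: string;       // ISO date, e.g. "2025-11-01"
  type: "release" | "experiment" | "ops" | "permission";
  expiresAfter?: string; // ISO date; optional for long-lived ops/permission flags
}

// Returns one human-readable problem per issue found.
// ISO dates (YYYY-MM-DD) compare correctly as plain strings.
function auditFlags(flags: Record<string, FlagMetadata>, today: string): string[] {
  const problems: string[] = [];
  for (const [key, meta] of Object.entries(flags)) {
    if (!meta.owner) problems.push(`${key}: no owner`);

    const shortLived = meta.type === "release" || meta.type === "experiment";
    if (shortLived && !meta.expiresAfter) {
      problems.push(`${key}: short-lived flag without expiresAfter`);
    } else if (meta.expiresAfter && meta.expiresAfter < today) {
      problems.push(`${key}: past expiresAfter (${meta.expiresAfter}), owner ${meta.owner || "unknown"}`);
    }
  }
  return problems;
}
```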

Practices to Avoid Flag Hell

1. Consistent naming conventions:

  • rollout-{feature}-{date} for release flags
  • experiment-{name}-{date} for experiments
  • killswitch-{integration} for ops flags

2. One owner per flag: The person who created it is responsible for cleanup.

3. Expiry dates or "review after" dates: Force periodic review.

4. Regular cleanup rituals: Every 2 weeks, review all flags:

  • Which can be deleted?
  • Which are at 100% and ready to have the flag and the old code path deleted?

5. Automated reminders: Alert in Slack when a flag has been enabled 100% for >1 week: "Time to clean up rollout-checkout-v2?"
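Conventions only hold if something checks them. A minimal lint-style sketch of the three patterns in item 1, assuming `{date}` means a `YYYY-MM` suffix (the earlier examples mix formats such as `2025-11` and `nov2025`, so pick one and enforce it):

```typescript
// Hypothetical sketch: validate flag names against the naming conventions.

type NamedFlagType = "release" | "experiment" | "ops";

// {feature}/{name}/{integration}: lowercase words joined by hyphens.
// {date}: assumed to be YYYY-MM with a valid month.
const NAME_PATTERNS: Record<NamedFlagType, RegExp> = {
  release: /^rollout-[a-z0-9]+(?:-[a-z0-9]+)*-\d{4}-(?:0[1-9]|1[0-2])$/,
  experiment: /^experiment-[a-z0-9]+(?:-[a-z0-9]+)*-\d{4}-(?:0[1-9]|1[0-2])$/,
  ops: /^killswitch-[a-z0-9]+(?:-[a-z0-9]+)*$/,
};

function isValidFlagName(type: NamedFlagType, name: string): boolean {
  return NAME_PATTERNS[type].test(name);
}
```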

Observability and Flags

Flags change behavior. You need to know:

  • Which flags are enabled for this request?
  • Did this error happen because of a flag?

Log Flag States

Include active flags in structured logs:

logger.info('Order created', {
  correlationId,
  userId,
  orderId,
  flags: {
    'checkout-v2': true,
    'stripe-payments': true,
    'discount-engine': false
  }
});

Now when debugging, you can see: "This order used the new checkout."
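Rather than hand-writing the `flags` object at every log call, you can evaluate the relevant flags once per request and reuse that snapshot for both branching and logging, so what you log is what actually ran. A hypothetical sketch; `FlagClient`, `snapshotFlags`, and the flag list are illustrative names:

```typescript
// Hypothetical sketch: one flag snapshot per request, shared by code and logs.

interface FlagClient {
  isEnabled(flag: string, userId: string): boolean;
}

const LOGGED_FLAGS = ["checkout-v2", "stripe-payments", "discount-engine"] as const;

// Evaluate each flag exactly once; later reads of the snapshot stay consistent
// even if someone flips a flag while the request is in flight.
function snapshotFlags(
  client: FlagClient,
  userId: string,
  keys: readonly string[],
): Record<string, boolean> {
  const snapshot: Record<string, boolean> = {};
  for (const key of keys) {
    snapshot[key] = client.isEnabled(key, userId);
  }
  return snapshot;
}
```

The result drops straight into the structured log call as `flags: snapshot`.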

Tag Metrics with Flag States

When tracking metrics, tag them by flag variant:

metrics.increment('orders.created', {
  checkoutVersion: flags.isEnabled('checkout-v2') ? 'v2' : 'v1'
});

Now you can compare: "v2 checkout has 5% higher conversion than v1."
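A per-variant order counter is only half of a conversion comparison; you also need the number of checkout sessions per variant as the denominator. A minimal sketch of the arithmetic, with a hypothetical `VariantCounts` shape and made-up counts:

```typescript
// Hypothetical sketch: comparing conversion between two flag variants.

interface VariantCounts {
  sessions: number; // users who reached checkout on this variant
  orders: number;   // orders.created tagged with this variant
}

function conversionRate(c: VariantCounts): number {
  return c.sessions === 0 ? 0 : c.orders / c.sessions;
}

// Relative lift of the candidate over the baseline:
// 0.05 means the candidate converts 5% better (relative, not percentage points).
function relativeLift(baseline: VariantCounts, candidate: VariantCounts): number {
  const base = conversionRate(baseline);
  if (base === 0) throw new Error("baseline has no conversions; lift is undefined");
  return conversionRate(candidate) / base - 1;
}

const v1: VariantCounts = { sessions: 10000, orders: 400 }; // 4.0% conversion
const v2: VariantCounts = { sessions: 10000, orders: 420 }; // 4.2% conversion
console.log(relativeLift(v1, v2).toFixed(3)); // "0.050"
```

Whether a difference like this is real or noise is a statistical-significance question, which experimentation platforms usually answer for you.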

Monitor Flag Changes

Alert when a flag is flipped (especially kill switches):

Alert: Feature flag 'stripe-payments' was disabled at 2:15am by ops-user.
Reason: Payment timeout spike.

This helps with incident response: "Did something change recently?" "Yes, we flipped the Stripe flag."
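Alerts like this are easiest to produce when every flag change goes through one code path that records who, when, and why. A hypothetical sketch of an audited flag store; in a real system `notify` would post to your alerting or chat tool and the history would be persisted:

```typescript
// Hypothetical sketch: every flag flip is recorded and announced.

interface FlagChange {
  flag: string;
  enabled: boolean;
  changedBy: string;
  reason: string;
  at: Date;
}

class AuditedFlagStore {
  private readonly values = new Map<string, boolean>();
  private readonly history: FlagChange[] = [];
  private readonly notify: (message: string) => void;

  constructor(notify: (message: string) => void) {
    this.notify = notify;
  }

  set(flag: string, enabled: boolean, changedBy: string, reason: string, at: Date = new Date()): void {
    if (this.values.get(flag) === enabled) return; // no-op flips are not alerted
    this.values.set(flag, enabled);
    this.history.push({ flag, enabled, changedBy, reason, at });
    this.notify(
      `Feature flag '${flag}' was ${enabled ? "enabled" : "disabled"} at ${at.toISOString()} by ${changedBy}. Reason: ${reason}`,
    );
  }

  // Answers "did something change recently?" during an incident.
  changesSince(since: Date): FlagChange[] {
    return this.history.filter((c) => c.at.getTime() >= since.getTime());
  }
}
```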

Closing: Flags as Power Tools, Not Toys

Feature flags are sharp tools. They let you:

  • Deploy code to production without releasing it to users.
  • Test new features on a small slice of traffic.
  • Roll back instantly without redeploying.

But they require discipline:

  • Short-lived flags must be cleaned up within weeks.
  • Flags need owners, expiry dates, and clear purpose.
  • Flag hell happens when you treat flags as permanent.

Good flag discipline:

  • 80% of flags are deleted within a month.
  • Every flag has an owner and expiry date.
  • Codebase has <10 active flags at any time.
  • Old code paths are removed after rollout.

Bad flag discipline:

  • 50+ flags in the codebase.
  • No one knows what half of them do.
  • Nested flags 3 levels deep.
  • Dead flags from 2 years ago still in code.

Checklist: Introducing or Cleaning Up Feature Flags

Before introducing a new flag:

  • Purpose: Which use case? (Release, experiment, ops, or permission?)
  • Naming: Descriptive name with purpose and date?
  • Owner: Who's responsible for cleanup?
  • Expiry date: When should this be reviewed or removed?
  • Rollout plan: Gradual rollout steps defined?
  • Observability: Logged and tracked in metrics?

Cleaning up existing flags:

  • List all active flags: Use your flag management tool, or search the codebase for flag checks (featureFlags.isEnabled, flags.isEnabled, opsFlags.isEnabled, experimentFlags.getVariant).
  • Categorize by type: Release, experiment, ops, or permission?
  • Check rollout status: Which flags are at 100% for >2 weeks?
  • Remove dead flags: Delete flag config and associated code paths.
  • Document remaining flags: Add owners, expiry dates, and descriptions.
  • Set cleanup ritual: Every 2 weeks, review flags and remove completed rollouts.

Feature flags are not a substitute for making decisions. They're a tool for deploying safely and iterating fast.

Use them to ship gradually, learn from production, and roll back when needed. But don't let them turn your codebase into a maze of conditionals.

Ship fast. Clean up faster.


About Ruchit Suthar

Technical Leader with 15+ years of experience scaling teams and systems