We didn't adopt feature flags at Salesken until 2022, two years after I joined as CTO. I wish we'd done it from the start. In those two years, we had at least three production incidents that were fundamentally the same problem: we shipped a change to 100% of users at once, something broke for a subset, and we had no way to contain the blast radius except rolling back the entire deployment. Each rollback took 20-30 minutes. For a real-time voice AI product where sales reps are on live calls, 20 minutes of degraded service is 20 minutes of lost deals for our customers.
Feature flags would have made each of those incidents a 30-second toggle instead of a 20-minute rollback. That's not hypothetical. After we finally adopted them, we had similar issues twice more. Both times: flag off, investigation, fix, flag back on. Total user-facing impact: under a minute each time.
Feature Flags in 60 Seconds
Feature flags are conditional code blocks: if flag enabled, execute new behavior, else execute old behavior. Use cases: safe deploys (new code deploys but doesn't execute until enabled), gradual rollouts (enable for 5% of users, measure, increase to 100%), A/B testing, kill switches (disable a feature instantly if it breaks), and trunk-based development (commit to main with features behind flags, no long-lived branches). The challenge: flags add code complexity, and old flags accumulate into technical debt if you don't clean them up.
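In code, a flag is just a guarded branch. Here's a minimal sketch using an in-memory dictionary as the flag store; real systems pull flags from a service, but the shape is the same (`new_checkout_flow` and `legacy_checkout_flow` are hypothetical stand-ins, not anything from the article):

```python
# Minimal feature flag sketch: an in-memory store and a guarded branch.
FLAGS = {"new-checkout": True}

def is_enabled(name: str) -> bool:
    # Unknown flags default to off, so new code never runs by accident.
    return FLAGS.get(name, False)

def new_checkout_flow(cart):
    # Hypothetical new behavior, deployed but gated behind the flag.
    return f"new:{sum(cart)}"

def legacy_checkout_flow(cart):
    # Old behavior: the safe default, and the instant-rollback path.
    return f"legacy:{sum(cart)}"

def checkout(cart):
    if is_enabled("new-checkout"):
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)
```

Flipping `FLAGS["new-checkout"]` to `False` is the 30-second rollback: no deploy, no pipeline, just the old path again.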
Why Feature Flags Matter Now
Deployment risk shrinks dramatically. You deploy code to production that doesn't execute until you're ready. No more "code sitting in staging for 3 days waiting for approvals." Code deploys fast. Features activate when ready.
Gradual rollouts replace big-bang launches. Instead of launching to 100% of users and discovering it breaks for 100% of users, you enable for 10% first. Measure metrics. Catch problems early. Increase to 100% only after you have confidence. At Salesken, our first flagged rollout was a new call analytics dashboard. We put it behind a flag and enabled it for our three most technically sophisticated customers first. They found two UX issues we'd missed. We fixed them before the other 200 customers ever saw the feature.
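A common way to implement percentage rollouts is to hash the user ID into a stable bucket, so the same user stays in the rollout as you raise the percentage from 10% toward 100%. A sketch (this hashing scheme is one reasonable choice, not any particular vendor's implementation):

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    # Hash flag + user so each user lands in a stable bucket per flag:
    # a user who is in at 5% stays in at 25%, 50%, and 100%.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Hashing the flag name in as well means different flags get independent 10% slices of users, so one customer isn't the guinea pig for every experiment.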
Kill switches save you during incidents. A feature is misbehaving. You don't need to deploy a new version. You disable the flag. Instant rollback. No CI pipeline. No waiting for tests. This alone justifies the investment in flag infrastructure for any team shipping to production regularly.
Trunk-based development becomes practical. Every feature doesn't need its own branch. Feature branches create merge conflicts and test divergence. Flags let you merge to main immediately and control visibility at runtime. We moved to trunk-based development at Salesken after adopting flags, and our merge conflict rate dropped by roughly 80%.
The Types of Flags
Release flags control whether a feature executes. Used for safe deploys and gradual rollouts. Lifespan: days to weeks. These are the most common and the most important to clean up.
Experiment flags compare two behaviors. 50% of users see version A, 50% see version B. Used for A/B testing. Lifespan: days to weeks. Should always have a decision attached: did B win? If nobody makes the decision, the experiment becomes permanent, which is worse than not running it.
Ops flags control infrastructure behavior. Circuit breaker for a broken service. Cache bypass during debugging. Lifespan: seconds to hours. These are emergency tools.
Permission flags control access. "Admin users see this feature, regular users don't." Can be long-lived, and that's okay. These are the exception to the "flags should expire" rule.
The key governance insight: each type has different lifecycle expectations. Release flags should expire within weeks. Experiment flags should have deadlines. Ops flags should be manual and temporary. Permission flags are structural and long-lived. Managing them all the same way is how you end up with a graveyard.
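One way to encode those lifecycle expectations is a per-type maximum age that a review script can check against. A sketch, with assumed limits (four weeks for release and experiment flags, a day for ops flags, no limit for permission flags):

```python
from datetime import date, timedelta
from enum import Enum

class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSION = "permission"

# Assumed per-type lifespans matching the expectations above;
# None means long-lived is acceptable.
MAX_AGE = {
    FlagType.RELEASE: timedelta(weeks=4),
    FlagType.EXPERIMENT: timedelta(weeks=4),
    FlagType.OPS: timedelta(days=1),
    FlagType.PERMISSION: None,
}

def is_overdue(flag_type: FlagType, created: date, today: date) -> bool:
    # A flag past its type's limit needs a decision, not more runway.
    limit = MAX_AGE[flag_type]
    return limit is not None and today - created > limit
```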
The Problem: Flag Debt
Old flags accumulate. A feature launched two years ago is still behind a flag, the flag is always enabled, and the old code path never executes. It's dead code that's still in the codebase, still needs testing, and still adds cognitive load.
I've seen the worst version of this. At a company I worked with (not Salesken, but close to it), an engineer tried to turn off a flag that had been enabled for 18 months. The application crashed. The old code path the flag was supposed to protect had been deleted in a refactor, but nobody removed the flag evaluation. The conditional fell through to an else branch that called functions that no longer existed. Nobody had tested the flag-off path in over a year.
Solution: every flag needs three things before it ships.
Owner: who makes decisions about this flag?
Expiry: by what date must a decision be made about this flag (remove it, or make it permanent)?
Cleanup plan: if the flag is removed, what code also gets removed?
Enforce this in code review. If a flag is added without these three things, it's not ready to merge.
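You can back up the code-review rule in code by making the flag registry reject incomplete definitions. A sketch with a hypothetical `Flag` record and `register` function (not any real library's API):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Flag:
    name: str
    owner: str          # who makes decisions about this flag
    expiry: date        # when a keep/remove decision is due
    cleanup_plan: str   # what code goes away when the flag does

REGISTRY: dict[str, Flag] = {}

def register(flag: Flag) -> None:
    # Fail fast if any of the three required fields is missing,
    # mirroring the "not ready to merge" rule in review.
    if not (flag.owner and flag.cleanup_plan and flag.expiry):
        raise ValueError(f"flag {flag.name!r} is missing owner/expiry/cleanup")
    REGISTRY[flag.name] = flag
```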
The Governance Model
A healthy release flag lifecycle looks like this: the flag is created with acceptance criteria (what does a successful rollout look like?), rolled out over days (1%, 5%, 25%, 100%), monitored on specific metrics (error rate, latency, custom business metrics), decided on (rollout succeeded, roll back, or investigate), and removed once it's always-on or always-off. A flag in this state is temporary, has clear success criteria, and has an owner. A flag that's been enabled for 8 months and whose owner has left the company is debt.
Good flag management: flags have owners, creation dates, expiry dates. Flags older than 6 months are reviewed quarterly. Old flags are deleted.
The graveyard: flags from 2021 controlling behavior nobody remembers. Some flags overridden by other flags. The old code path might not exist anymore. New engineers ask "why is this flag here?" and get silence.
The graveyard happens naturally if you don't have governance. Add it:
- Changelog listing flag creation and removal
- Code review requiring explanation of flag purpose and expiry plan
- Quarterly review of flags over 6 months old
- Dashboard showing which flags are actually evaluated and which code paths are executing
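The quarterly review can start from a small script that lists flags past the six-month mark. A sketch; the `flags` mapping stands in for whatever your flag service exports:

```python
from datetime import date, timedelta

def stale_flags(flags: dict[str, tuple[str, date]], today: date,
                max_age: timedelta = timedelta(days=180)) -> list[str]:
    # flags maps name -> (owner, created date). Anything older than
    # ~6 months is a review candidate for the quarterly sweep.
    return sorted(name for name, (_, created) in flags.items()
                  if today - created > max_age)
```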
Pitfalls
Flag evaluation performance. If you have 50 flags and each evaluation hits a remote service, startup becomes slow and fragile. Cache flag definitions locally. Evaluate flags in-memory. Send batch updates from the flag server. At Salesken, we learned this when our flag evaluations added 150ms to every API request because we were fetching flags from LaunchDarkly on every call instead of caching locally. Simple fix, but we only caught it because we had tracing.
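The fix follows the pattern above: fetch flag definitions in one batch, cache them in memory with a TTL, and evaluate locally on each request. A sketch (`fetch_all` is a stand-in for a batch pull from your flag service, not a specific SDK call):

```python
import time

class CachedFlags:
    """In-memory flag cache with a TTL, so evaluating a flag never
    blocks an API request on a network round trip."""

    def __init__(self, fetch_all, ttl_seconds: float = 30.0):
        self._fetch_all = fetch_all
        self._ttl = ttl_seconds
        self._flags: dict[str, bool] = {}
        self._loaded_at = float("-inf")  # force a fetch on first use

    def is_enabled(self, name: str) -> bool:
        now = time.monotonic()
        if now - self._loaded_at > self._ttl:
            # One batch call per TTL window, not one call per request.
            self._flags = self._fetch_all()
            self._loaded_at = now
        return self._flags.get(name, False)  # unknown flags stay off
```

A 30-second TTL means a toggle takes up to 30 seconds to propagate; that's the trade you make for keeping flag checks off the request path.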
Flag complexity. More flags enable more strategies but add testing burden. For each flag, you need to test: the main code path (flag off), the new code path (flag on), and the transition between them. This multiplies. Ten flags means theoretically 1,024 combinations, though in practice you only test the realistic ones.
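In practice that means maintaining an explicit list of realistic combinations and testing only those. A sketch with two hypothetical flags and a hypothetical function under test:

```python
# Two flags give 2**2 = 4 combinations; ten give 2**10 = 1,024.
# Enumerate only the combinations you actually plan to ship.

def render_checkout(new_ui: bool, fast_path: bool) -> str:
    # Hypothetical flagged function: both flags alter its behavior.
    style = "new" if new_ui else "old"
    route = "fast" if fast_path else "slow"
    return f"{style}-{route}"

# Curated list: old baseline, new UI alone, and the full new stack.
# (new_ui=False, fast_path=True) is never shipped, so it isn't tested.
REALISTIC = [(False, False), (True, False), (True, True)]

def test_realistic_combinations():
    for new_ui, fast_path in REALISTIC:
        out = render_checkout(new_ui, fast_path)
        assert out.count("-") == 1  # every tested path returns a well-formed result
```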
Flag overuse. Using flags for every change makes deployment "safe" but you never clean up old code. At some point, the old code paths become more liability than safety net. Set a team limit: no more than N active release flags at once. For us at Salesken, that limit was 8. It forced us to clean up before adding new ones.
Connecting Flags to Codebase Intelligence
The hardest question with feature flags isn't "should we add one?" It's "is it safe to remove this one?" Codebase intelligence helps answer that by showing which code depends on which flags, whether the old code path is still referenced anywhere, and whether any other flags interact with it.
At Glue, we surface old, unused flags and the code paths they protect. "This flag has been enabled since January 2024 but the else branch is never executed. Here's the code that would be affected if you removed it." This accelerates cleanup and prevents the graveyard.
The limitation we're honest about: detecting flag interactions (flag A's behavior changes when flag B is on) is genuinely hard to do automatically. We can identify individual flag dead code paths, but the combinatorial interaction between flags still requires human reasoning. That's an area we're actively working on.
Frequently Asked Questions
Q: Should every feature launch behind a flag?
Most features should. Exceptions: internal tools with no external users, truly low-risk changes (copy updates, minor UI tweaks), and features where rollback isn't meaningful. But when in doubt, flag it. The cost of adding a flag is low. The cost of not having one during an incident is high.
Q: How do we prevent flag debt?
Enforce policy: every flag has an owner and expiry date. Review old flags quarterly. Set a team limit on active release flags. Make flag cleanup part of the definition of done: the feature isn't complete until the flag is removed.
Q: What should the default be if the flag service is down?
Decide explicitly per flag. For release flags, default to old behavior (safe). For ops flags like circuit breakers, default to the protective state. For permission flags, default to restricted access. Never leave this undefined. At Salesken, we had one flag where the default was undefined. The flag service went down during a deploy, and half our users got both the old and new version of a feature simultaneously. Not great.
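One way to make those defaults explicit is a per-flag fallback table consulted whenever the remote lookup fails. A sketch with hypothetical flag names, following the rules above:

```python
# Explicit offline defaults per flag, so behavior is defined even
# when the flag service is unreachable. Flag names are examples.
FALLBACKS = {
    "new-dashboard": False,    # release flag: fall back to old behavior
    "payments-breaker": True,  # ops circuit breaker: protective state
    "admin-tools": False,      # permission flag: restricted access
}

def evaluate(name: str, fetch_remote) -> bool:
    try:
        return fetch_remote(name)
    except ConnectionError:
        # Never undefined: every flag must have an offline default.
        return FALLBACKS[name]
```

The `KeyError` you'd get from a flag missing from `FALLBACKS` is a feature: it surfaces the undefined-default bug at evaluation time instead of during an outage.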
Related Reading
- Deployment Frequency: The DORA Metric That Reveals Your True Engineering Velocity
- Trunk-Based Development: Integration at Scale
- CI/CD Pipeline: The Definitive Guide
- Shifting Left: Software Quality in Practice
- Change Failure Rate: The DORA Metric That Reveals Your Software Quality
- How to Choose Your Technology Stack