Release Management
Separating deployment from release — and using that separation to ship more safely and more often.
Overview
A deployment is an act performed by engineering: code is moved from one environment to another. A release is a business decision: a feature or change becomes visible to users. Conflating the two — assuming that deploying means releasing — limits how safely and quickly teams can ship. Separating them unlocks a set of tools for controlled, measured, and reversible rollouts.
This page is about the principles behind that separation: how features are decoupled from deploys, how rollouts are staged to limit blast radius, and how the ability to roll back or roll out faster gives teams confidence rather than caution at release time.
For the mechanics of getting code into an environment, see Deployment Automation. For the targets a release must meet to count as healthy, see SLOs & Error Budgets. For go-live criteria before a major release, see Operational Readiness.
Why It Matters
Deployment risk grows with batch size, not with deployment frequency. Teams that deploy infrequently ship large batches. When something breaks, a large batch makes it hard to attribute the failure to a specific change, and it creates long windows in which production differs from what was tested. Frequent, small deployments are inherently safer, but only if the team can deploy without releasing, so that the window between writing code and exposing it to users can be controlled.
Feature flags change the rollback unit. Without feature flags, rolling back a feature means reverting code and redeploying. With flags, rollback is an operation that takes seconds and requires no deploy. The ability to disable a feature in production in under a minute is one of the most valuable reliability properties a system can have.
Progressive delivery limits blast radius. Releasing to 1% of users and watching the metrics for 30 minutes before widening costs almost nothing in velocity and keeps most classes of user-impacting release bugs from reaching everyone. The cost of skipping it is incidents that affect 100% of users.
Rollback SLOs set an expectation, not a hope. "We can roll back quickly if something goes wrong" is not a reliability property unless it is defined, measured, and practiced. A team that has never practiced a rollback under pressure will discover the gaps in their rollback process at the worst possible time.
Standards & Best Practices
Deploy does not equal release
The default architecture should support deploying code without changing the user-visible behaviour. The mechanisms for this:
- Feature flags — code is deployed but gated behind a flag that is off by default
- Dark launches — code is deployed and runs against production traffic, but output is discarded rather than served to users (useful for verifying performance and correctness before release)
- Canary deployments — a small percentage of traffic is routed to the new version while the rest continues on the old
The goal is that deploying is a mechanical, low-risk operation. The release — the decision to show something to users — is a separate, deliberate, observable event.
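A dark launch can be sketched as a wrapper that always serves the current implementation and, for a sampled fraction of requests, also runs the new one, compares the two results, and discards the new one. This is a minimal, synchronous sketch that assumes the candidate path is side-effect free (no writes, no outbound notifications). `dark_launch` and its parameters are hypothetical names, and a production version would usually run the shadow call asynchronously so it adds no user-facing latency.

```python
import logging
import random
from typing import Callable, Optional, TypeVar

Req = TypeVar("Req")
Resp = TypeVar("Resp")

log = logging.getLogger("dark_launch")
_rng = random.Random()


def dark_launch(
    request: Req,
    current: Callable[[Req], Resp],
    candidate: Callable[[Req], Resp],
    sample_rate: float = 0.01,
    rng: Optional[random.Random] = None,
) -> Resp:
    """Serve `current`; shadow-run `candidate` on a sample and discard its result."""
    served = current(request)
    if (rng or _rng).random() < sample_rate:
        try:
            shadow = candidate(request)
            if shadow != served:
                # Record the divergence for later analysis; the user never sees it.
                log.warning("dark launch mismatch for %r", request)
        except Exception:
            # A failing candidate must never affect the served response.
            log.exception("dark launch candidate raised for %r", request)
    return served
```

Starting with a small `sample_rate` and raising it gradually keeps both the extra load and any unexpected behaviour of the new path bounded.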
Feature flags are a first-class engineering tool
Feature flags (also called feature toggles or feature gates) conditionally enable code paths at runtime without a deploy. Their lifecycle:
1. Creation — the flag is created in the feature flag system with a sensible default (off for new features)
2. Development — the feature is built behind the flag; `if flag_enabled("my-feature")` wraps the new code path
3. Internal rollout — the flag is enabled for internal users to validate the feature
4. Staged rollout — the flag is enabled for increasing percentages of users, with metric observation at each stage
5. Full rollout — the flag is enabled for 100% of users
6. Removal — the flag and the old code path are removed from the codebase
Step 6 is mandatory. Flag debt — old flags left in the code long after the feature has been fully rolled out — is a maintenance burden and an incident risk. Set a maximum flag lifetime (typically 2–4 sprints after full rollout) and enforce it.
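A minimal sketch of the Development step: `if flag_enabled("my-feature")` wraps the new code path, and the old path stays in place until the Removal step deletes both it and the flag check. `flag_enabled`, the in-memory `FLAGS` dict and the checkout functions are hypothetical stand-ins; a real flag client would read state from the flag service at runtime.

```python
from typing import Dict, List

# Hypothetical in-process flag state; a real system would read this from the
# feature flag service at runtime so it can change without a deploy.
FLAGS: Dict[str, bool] = {}


def flag_enabled(name: str) -> bool:
    # Unknown flags evaluate to False, so a new feature is off by default.
    return FLAGS.get(name, False)


def legacy_total_cents(cart: List[int]) -> int:
    # Old code path: deleted, together with the flag check, in the Removal step.
    return sum(cart)


def new_total_cents(cart: List[int]) -> int:
    # New code path (hypothetical behaviour): 500-cent discount over 10,000 cents.
    total = sum(cart)
    return total - 500 if total > 10_000 else total


def checkout_total_cents(cart: List[int]) -> int:
    if flag_enabled("my-feature"):
        return new_total_cents(cart)
    return legacy_total_cents(cart)
```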
Types of feature flags and their appropriate use:
| Type | Description | Lifetime | Example |
|---|---|---|---|
| Release flag | Hides an in-progress feature | Short (remove after full rollout) | New checkout flow |
| Ops flag | Allows disabling a feature at runtime | Medium (kept as a kill switch) | Expensive recommendation algorithm |
| Experiment flag | A/B test with metric comparison | Short (remove after winner declared) | Button copy variant |
| Permission flag | Feature enabled for specific cohorts | Permanent | Beta feature for enterprise tier |
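The lifetime column can be enforced mechanically. The sketch below assumes each flag records its type, an owner, and the date it was fully rolled out (or its experiment winner declared); `Flag`, `FlagType` and `overdue_flags` are hypothetical names. The default limit of four weeks assumes two-week sprints, i.e. the low end of the 2–4 sprint guideline.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import List, Optional


class FlagType(Enum):
    RELEASE = "release"
    OPS = "ops"
    EXPERIMENT = "experiment"
    PERMISSION = "permission"


# Release and experiment flags are short-lived; ops flags (kill switches) and
# permission flags (cohort entitlements) are intentionally long-lived.
SHORT_LIVED = {FlagType.RELEASE, FlagType.EXPERIMENT}

# Assumes two-week sprints, so four weeks is the low end of "2–4 sprints".
DEFAULT_MAX_LIFETIME = timedelta(weeks=4)


@dataclass
class Flag:
    name: str
    kind: FlagType
    owner: str
    # Date of full rollout (release flags) or winner declared (experiment flags).
    completed_on: Optional[date] = None


def overdue_flags(
    flags: List[Flag], today: date, max_lifetime: timedelta = DEFAULT_MAX_LIFETIME
) -> List[Flag]:
    """Return short-lived flags whose removal deadline has passed."""
    return [
        f
        for f in flags
        if f.kind in SHORT_LIVED
        and f.completed_on is not None
        and today - f.completed_on > max_lifetime
    ]
```

Running a check like this in CI or a scheduled job, and filing an issue against each overdue flag's `owner`, is one way to enforce the limit rather than merely state it.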
Progressive delivery is the default rollout strategy
Releasing to everyone simultaneously creates maximum blast radius. Progressive delivery limits the scope of any release problem:
- Percentage-based rollout — release to N% of users, observe key metrics for a defined window, then increase to the next tier (e.g. 1% → 5% → 20% → 100%)
- Cohort-based rollout — release first to internal users, then beta customers, then all customers
- Region-based rollout — release to one region first, verify stability, then expand geographically
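Percentage-based rollout is commonly implemented by hashing each user into a stable bucket, so the same user gets the same answer on every request and widening a tier only ever adds users. This is a sketch under that assumption; the function names and the SHA-256 bucketing scheme are illustrative, not taken from any particular flag product.

```python
import hashlib
from typing import Sequence

# Tiers from the text: 1% -> 5% -> 20% -> 100%.
TIERS: Sequence[int] = (1, 5, 20, 100)


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 100)."""
    # Salting with the flag name gives each flag an independent user sample.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    # A user in the rollout at N% stays in it at any higher percentage,
    # because widening only raises the threshold.
    return bucket(flag_name, user_id) < percent


def next_tier(current_percent: int) -> int:
    """Return the next tier above `current_percent`, or 100 if already fully rolled out."""
    for tier in TIERS:
        if tier > current_percent:
            return tier
    return 100
```

Salting the hash with the flag name matters: without it, the same 1% of users would be the first to receive every new feature.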
The observation window at each tier should be long enough to detect the failure modes that matter: elevated error rates, latency regressions, downstream API failures, and user-reported issues. A 15-minute window that does not overlap the overnight batch processing run will miss that entire class of bugs.
Automated rollout gates — metrics checks that block a rollout from progressing if key signals degrade — are more reliable than manual observation. They do not depend on someone watching the dashboard.
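A rollout gate can be sketched as a comparison between the tier receiving the change and the baseline traffic that is not. The thresholds below (50% more errors, 20% more p99 latency, a 0.1% error-rate floor) are hypothetical placeholders, and the sketch assumes the metrics have already been aggregated over the observation window by the monitoring system; `TierMetrics` and `gate_failures` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TierMetrics:
    error_rate: float  # fraction of failed requests, e.g. 0.002 for 0.2%
    p99_latency_ms: float


def gate_failures(
    rollout: TierMetrics,
    baseline: TierMetrics,
    max_error_ratio: float = 1.5,
    max_latency_ratio: float = 1.2,
    error_rate_floor: float = 0.001,
) -> List[str]:
    """Return the reasons the rollout may not progress; an empty list means it may."""
    failures = []
    # The floor keeps a near-zero baseline from turning one stray error into a block.
    error_limit = max(baseline.error_rate * max_error_ratio, error_rate_floor)
    if rollout.error_rate > error_limit:
        failures.append(f"error rate {rollout.error_rate:.4f} exceeds {error_limit:.4f}")
    latency_limit = baseline.p99_latency_ms * max_latency_ratio
    if rollout.p99_latency_ms > latency_limit:
        failures.append(
            f"p99 latency {rollout.p99_latency_ms:.0f} ms exceeds {latency_limit:.0f} ms"
        )
    return failures
```

Returning reasons rather than a bare boolean lets the rollout tooling post why progression stopped, which shortens the step from "the gate blocked" to "here is what to investigate".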
Rollback must be defined, measurable, and practiced
A rollback capability is not real until it has been used and its duration is known. Standards:
- Define the rollback procedure for every release type (feature flag flip, code rollback, database schema rollback)
- Set a rollback SLO — the maximum time from decision to rollback complete. A common target is under 5 minutes for a feature flag rollback and under 15 minutes for a code rollback.
- Practice rollbacks — deliberately roll back a deployment in staging periodically, timing the process and identifying gaps
- Track rollback duration as an SLO metric, the same way availability is tracked
A team that has never timed their rollback process does not know how long it takes, and will discover this during an incident.
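A rollback drill can be timed by wrapping whatever procedure the runbook automates and comparing the duration against the target for that rollback type. This sketch assumes the SLO targets quoted above and a `procedure` callable supplied by the team; `timed_rollback` is a hypothetical helper, and in a real drill the clock should start when the rollback decision is announced, so human steps are counted too.

```python
import time
from typing import Callable, Dict

# Targets from the text: flag rollback under 5 minutes, code rollback under 15.
ROLLBACK_SLO_SECONDS: Dict[str, float] = {
    "feature_flag": 5 * 60,
    "code": 15 * 60,
}


def timed_rollback(
    kind: str,
    procedure: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run a rollback procedure and report its duration against the SLO for `kind`."""
    # Start timing when the rollback is invoked, i.e. at the decision to roll back.
    start = clock()
    procedure()
    elapsed = clock() - start
    return {
        "kind": kind,
        "seconds": elapsed,
        "within_slo": elapsed <= ROLLBACK_SLO_SECONDS[kind],
    }
```

Exporting each result to the metrics system alongside availability turns "we can roll back quickly" into a tracked number with a history.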
Release notes and communication are part of the release process
When a change reaches users without any communication to support, customer success, or the users themselves, the people who field user questions are unprepared. Standards:
- Every user-visible change has a description written before it ships (even a single sentence)
- Support and customer success teams are notified of changes that affect their workflows before the change reaches production
- Breaking changes or significant behaviour changes have a communication plan — not just a changelog entry
Communication is not bureaucracy. It is the difference between a support team that can answer "why did X change?" and one that says "we don't know."
How to Implement
Feature flag system requirements
A feature flag system must support:
- Targeting by user ID, percentage, cohort, or environment
- Instant flag evaluation without a deployment
- Audit log of who changed what flag and when (flag changes are production changes)
- Flag lifecycle tracking — creation date, owner, intended removal date
Hosted or self-hosted flag systems (LaunchDarkly, Unleash, Flagsmith, or a bespoke service) are all valid. What matters is that the system is treated as production infrastructure, not as a configuration file checked in alongside the code.
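The requirements above can be illustrated with a small in-memory store: rules change at runtime without a deploy, evaluation supports targeting by user ID, cohort, environment, and percentage, and every change lands in an audit log. This is a sketch only; `FlagStore`, `FlagRule` and `AuditEntry` are hypothetical names, not the API of any of the products listed, and lifecycle metadata is left out for brevity.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import AbstractSet, Dict, FrozenSet, List


@dataclass(frozen=True)
class FlagRule:
    enabled_users: FrozenSet[str] = frozenset()
    enabled_cohorts: FrozenSet[str] = frozenset()
    environments: FrozenSet[str] = frozenset()  # empty means every environment
    percent: int = 0


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    actor: str
    flag: str
    new_rule: FlagRule


class FlagStore:
    """In-memory stand-in for a flag service: rules change at runtime, every change is logged."""

    def __init__(self) -> None:
        self._rules: Dict[str, FlagRule] = {}
        self.audit_log: List[AuditEntry] = []

    def set_rule(self, actor: str, flag: str, rule: FlagRule) -> None:
        # Flag changes are production changes, so each one gets an audit entry.
        self._rules[flag] = rule
        self.audit_log.append(AuditEntry(datetime.now(timezone.utc), actor, flag, rule))

    def is_enabled(
        self,
        flag: str,
        user_id: str,
        cohorts: AbstractSet[str] = frozenset(),
        environment: str = "production",
    ) -> bool:
        rule = self._rules.get(flag)
        if rule is None:
            return False  # unknown flags are off
        if rule.environments and environment not in rule.environments:
            return False
        if user_id in rule.enabled_users or rule.enabled_cohorts & cohorts:
            return True
        # Stable percentage bucketing, salted with the flag name.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100 < rule.percent
```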
Rollout checklist for a flagged feature
Before widening rollout from any tier to the next:
- Error rate at the current tier is within normal bounds
- Latency at the current tier is within normal bounds
- Any customer-impacting bugs found at the current tier are resolved
- Downstream services have not seen elevated error rates as a result of the rollout
- Support has not received an elevated volume of related tickets
- (For 100% rollout) a flag removal issue is filed for the next sprint
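The checklist can be encoded so that rollout tooling refuses to widen while any item is open. This sketch assumes each signal has already been summarized upstream (by monitoring, the bug tracker, and the support desk) into the fields of a hypothetical `TierReport`; `rollout_blockers` is likewise an illustrative name.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TierReport:
    error_rate_normal: bool
    latency_normal: bool
    open_customer_bugs: int
    downstream_errors_elevated: bool
    support_tickets_elevated: bool
    removal_issue_filed: bool = False


def rollout_blockers(report: TierReport, next_percent: int) -> List[str]:
    """Return the checklist items that block widening to `next_percent`."""
    blockers = []
    if not report.error_rate_normal:
        blockers.append("error rate outside normal bounds")
    if not report.latency_normal:
        blockers.append("latency outside normal bounds")
    if report.open_customer_bugs > 0:
        blockers.append(f"{report.open_customer_bugs} customer-impacting bug(s) unresolved")
    if report.downstream_errors_elevated:
        blockers.append("downstream services show elevated error rates")
    if report.support_tickets_elevated:
        blockers.append("elevated volume of related support tickets")
    # Only the final step to 100% requires the flag removal issue.
    if next_percent == 100 and not report.removal_issue_filed:
        blockers.append("flag removal issue not filed")
    return blockers
```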
Common Pitfalls
Treating deploy-equals-release as the default. When every deploy is a release, the team is afraid of deploying. Small, frequent deploys that do not change user-visible behaviour are safe; the release decision is when the careful evaluation happens.
Forgetting to clean up flags. Flag debt accumulates silently. A codebase with 50 stale feature flags has 50 places where old code paths are conditionally run, tested in neither state consistently, and invisible to the next engineer. Flag removal is a first-class engineering task.
Observation windows that are too short. A 15-minute observation window before widening a rollout to 100% will miss bugs that appear overnight, under weekend load, or in the monthly billing cycle. Size observation windows to the failure modes that matter.
Rollback as a theoretical option. "We can roll back if needed" without a documented procedure, without a tested duration, and without someone who has actually done it is a hope, not a capability. Practice it.
Flag targeting that cannot be turned off fast enough. A flag that requires a code deploy to change is not a real feature flag — it is a compile-time constant. Flags must be changeable without a deploy, by non-engineers, in under a minute.
Dark launching to all traffic simultaneously. A dark launch that runs the new code path against 100% of production traffic before any correctness validation just changes when the blast radius arrives. Dark launch to a small fraction first.