Published January 25, 2021
by Doug Klugh
Enabling CI/CD
Decouple feature releases from code deployments by encapsulating features and infrastructure changes inside switchable software modules. Facilitate continuous delivery by deploying code with isolated features that can be released on demand. Enable A/B testing, support operations with automated circuit breakers and kill switches, and provide customized, dynamic access to any level of functionality, making Dark Launches, Ring Deployments, and Canary Releases a cinch.
Feature toggles enable you to deploy incomplete and untested code to your production environment, kept switched off, while practicing Trunk-Based Development. They are the key enabler for Continuous Integration and Continuous Deployment (CI/CD); without feature toggles, CI/CD is nearly impossible to achieve.
Continuous delivery with feature toggles allows you to designate product managers, business partners, or any non-technical group to control (at the flip of a switch) when a particular feature is turned on or off. Once feature development is "Done", a feature switch can be exposed within a user interface, allowing designated stakeholders to enable and disable that feature at their discretion. A single switch can control a very large group of features or a single, very specific, low-level code path. Not only does this empower the business to decide when particular features are "shipped", but it can also provide tremendous flexibility as to whom these features are released to (more on that below under Feature Access).
Toggles go both ways. Not only can designated users decide when to enable features, they can also quickly and easily disable features if things don't go as expected. Turning off a new feature or reverting to the prior version is as easy as flipping a switch. This is one of many ways feature toggles help reduce risk.
Releasing feature updates can put customer engagement at risk. You enhance a feature and release it to your entire customer base, only to discover bugs that testing missed. Or you find out that your users hate the upgrades, find them less valuable, or find them more difficult to use.
These enhancements may be changes to the user interface, such as modern UI components, responsive layouts, different colors, new verbiage, etc. Or they may be back-end enhancements aimed at performance or other non-functional qualities, such as usability, resiliency, scalability, security, etc.
Feature toggles enable you to selectively roll out updates to predetermined groups of users to verify that those enhancements will improve customer engagement (or at least not make it worse). Those select users may include beta users, power users, or customers you know to be loyal to your products and services. And if things turn out badly, you can quickly and easily revert all of those users to the prior versions of those features.
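As a minimal sketch of group-based rollout, the toggle below is enabled only for users who belong to one of its enabled groups, and reverting everyone is a single call. The `User` and `ReleaseToggle` types and the group names are hypothetical; a real system would load group membership from its own user store.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class User:
    user_id: str
    groups: FrozenSet[str] = frozenset()  # hypothetical group labels, e.g. "beta"


@dataclass
class ReleaseToggle:
    """Hypothetical toggle that enables a feature only for selected user groups."""
    name: str
    enabled_groups: Set[str] = field(default_factory=set)

    def is_enabled_for(self, user: User) -> bool:
        # The feature is on if the user belongs to at least one enabled group.
        return bool(self.enabled_groups & user.groups)

    def revert_all(self) -> None:
        # Rolling back means emptying the group list: everyone gets the prior version.
        self.enabled_groups.clear()


new_checkout = ReleaseToggle("new-checkout", {"beta", "power-users"})
alice = User("alice", frozenset({"beta"}))
bob = User("bob")

print(new_checkout.is_enabled_for(alice))  # True
print(new_checkout.is_enabled_for(bob))    # False
new_checkout.revert_all()
print(new_checkout.is_enabled_for(alice))  # False
```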
This provides the capability to measure the impact of software changes. Gather telemetry prior to releasing updates to establish a baseline, deploy the updates to production, enable those new features only for your test group, gather telemetry against the new functionality, and compare the results.
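The final "compare the results" step can start as a comparison of one metric between the baseline and the test group. This sketch assumes session length in minutes as the metric and uses made-up numbers purely for illustration; a real analysis should also check that the difference is statistically significant before drawing conclusions.

```python
from statistics import mean
from typing import List


def relative_change(baseline: List[float], treatment: List[float]) -> float:
    """Relative change of the test-group mean versus the baseline mean."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean(treatment) - base) / base


# Illustrative values only, not real telemetry.
baseline_sessions = [4.0, 5.0, 6.0]    # gathered before the release
test_group_sessions = [5.0, 6.0, 7.0]  # gathered with the toggle on for the test group

change = relative_change(baseline_sessions, test_group_sessions)
print(f"{change:+.0%}")  # +20%
```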
Now there is no guessing. You will have hard data to measure any aspect of your software, whether that is the length or frequency of customer engagement, sales, referrals, usability, or system performance. All of this can be managed by product managers, business partners, marketing, or any other non-technical team.
Software changes and extensions can often lead to unexpected results and unforeseen issues. It helps to have safety valves built into your code that can be triggered manually by operations or automatically through exception handlers (read Planning for Failure). These types of feature toggles enable DevOps teams to manage risk associated with system failure and degraded performance, including reduced Capacity, Availability, Responsiveness, and Resiliency (CARR).
Kill Switches are feature toggles used to disable non-critical features to help optimize CARR for more essential ones. Circuit Breakers are feature toggles used to degrade functionality when issues are detected. Either type of safety valve can be triggered based on specific telemetry inside or outside of an application or through an external monitoring tool.
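A minimal sketch of both safety valves, assuming an in-process implementation: a kill switch that operations flip by hand, and a simplified circuit breaker that serves a fallback after repeated failures until a cool-down period passes. The class names, thresholds, and the `load_recommendations` service are hypothetical; production breakers usually add a "half-open" trial state and emit telemetry when they trip.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class KillSwitch:
    """Manually operated ops toggle: when killed, a non-critical feature is skipped."""

    def __init__(self) -> None:
        self.killed = False


class CircuitBreaker:
    """Simplified automated ops toggle: after repeated failures it serves a
    fallback until a cool-down period has passed."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_after_s:
            # Cool-down elapsed: close the breaker and let the feature run again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, feature: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # degraded functionality while the breaker is open
        try:
            result = feature()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0  # a success resets the failure count
        return result


def load_recommendations() -> List[str]:
    # Hypothetical stand-in for a downstream service that is currently timing out.
    raise TimeoutError("recommendation service timed out")


recommendations_kill_switch = KillSwitch()
recommendations_breaker = CircuitBreaker(failure_threshold=2)


def page_extras() -> List[str]:
    if recommendations_kill_switch.killed:
        return []  # operations switched the feature off by hand
    return recommendations_breaker.call(load_recommendations, fallback=list)


for _ in range(3):
    print(page_extras())  # [] each time: two failures, then the open breaker
print(recommendations_breaker.is_open())  # True
```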
As these feature toggles are usually long-lived and tend to have high visibility, it is essential that they are well documented and included in all applicable runbooks.
Feature toggles can be a very effective tool for managing user experience by controlling access to new features, enhancements, and extensions. When designed correctly, feature toggles can control access within a specific context. In addition to identifying specific users, the context can also identify specific companies, subscriptions, locations, client platforms, dates, times, etc. This will make your system behavior context sensitive and enable you to easily control multiple levels of access to large groups of features or single, very specific, low-level code paths.
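To illustrate a context-sensitive toggle router, this sketch passes a request context (user, company, subscription, location, platform, date) to per-feature rules. The `ToggleContext` fields, the feature names, and the rules are hypothetical examples; unknown features default to off.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ToggleContext:
    """Hypothetical context describing who is asking, from where, and when."""
    user_id: str
    company: Optional[str] = None
    subscription: Optional[str] = None
    country: Optional[str] = None
    platform: Optional[str] = None
    today: date = field(default_factory=date.today)


Rule = Callable[[ToggleContext], bool]


class ContextToggleRouter:
    def __init__(self, rules: Dict[str, Rule]) -> None:
        self._rules = rules

    def is_enabled(self, feature: str, ctx: ToggleContext) -> bool:
        rule = self._rules.get(feature)
        return rule(ctx) if rule else False  # unknown features default to off


router = ContextToggleRouter({
    # Premium subscribers on iOS get the new editor.
    "new-editor": lambda c: c.subscription == "premium" and c.platform == "ios",
    # A seasonal banner shown only in Canada, only within a date window.
    "holiday-banner": lambda c: c.country == "CA"
    and date(2021, 12, 1) <= c.today <= date(2021, 12, 31),
})

ctx = ToggleContext("u42", subscription="premium", country="CA",
                    platform="ios", today=date(2021, 12, 24))
print(router.is_enabled("new-editor", ctx))      # True
print(router.is_enabled("holiday-banner", ctx))  # True
print(router.is_enabled("unknown", ctx))         # False
```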
Getting Started
Getting started with feature toggles is actually quite easy. They do not require any particular development process, platform, language, tool, branching strategy, or anything else. You can start by adding a simple, short-lived kill switch to a new feature, turn it on after deployment, then turn it off if any problems are detected. Once you have high confidence that the feature is working as expected (after, perhaps, a few days), delete the toggle and leave the new code.
Start by deciding which type of feature toggle you want to develop: a release toggle to enable on-demand releases, an experiment toggle to facilitate hypothesis testing, an ops toggle to support operations, or an access toggle to manage feature access. Do you want this to be a static toggle that requires a code deployment or a dynamic toggle that will be controlled via a user interface or autonomously based on system telemetry? Also, consider the life of the toggle: Do you expect it to be short-lived or persist for a long period of time?
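One way to make these decisions explicit is to record them next to each toggle. The hypothetical `ToggleDefinition` below captures the toggle type, whether it is dynamic, and an expected removal date for short-lived toggles, so overdue toggles can be flagged for deletion. The field names and the registry are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ToggleType(Enum):
    RELEASE = "release"        # on-demand releases
    EXPERIMENT = "experiment"  # hypothesis testing
    OPS = "ops"                # kill switches, circuit breakers
    ACCESS = "access"          # feature access


@dataclass(frozen=True)
class ToggleDefinition:
    name: str
    toggle_type: ToggleType
    dynamic: bool                      # False: changes only with a code deployment
    expires_on: Optional[date] = None  # None: expected to be long-lived

    def is_overdue(self, today: date) -> bool:
        """A short-lived toggle past its expiry date should be removed from the code."""
        return self.expires_on is not None and today > self.expires_on


TOGGLES = [
    ToggleDefinition("new-checkout", ToggleType.RELEASE, dynamic=True,
                     expires_on=date(2021, 3, 1)),
    ToggleDefinition("disable-recommendations", ToggleType.OPS, dynamic=True),
]

overdue = [t.name for t in TOGGLES if t.is_overdue(date(2021, 4, 1))]
print(overdue)  # ['new-checkout']
```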
Like anything else, feature toggles will be easier to implement around well-factored code. It will be difficult to add a feature toggle to code that is tightly coupled to other parts of the system. Design a modular solution so you can easily isolate functionality that will be controlled by feature toggles.
For the feature toggles themselves, be sure to decouple the Toggle Point (the place in the code that branches on the toggle decision) from the Toggle Router (the logic that makes the decision). Encapsulating the decision logic in a single location simplifies later changes to it. While this should be considered for all feature toggles, it is especially important for long-lived toggles (to maintain Structural Quality).
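A minimal sketch of that separation, using a hypothetical tax calculation: the toggle point in `calculate_tax` only asks a named question, while `ToggleRouter` owns the decision logic. The method name `use_new_tax_engine`, the countries, and the tax rates are illustrative assumptions.

```python
from typing import Dict


class ToggleRouter:
    """Toggle Router: the single place that holds the decision logic."""

    def __init__(self, config: Dict[str, bool]) -> None:
        self._config = config

    def use_new_tax_engine(self, country: str) -> bool:
        # Changing this rule never touches the call sites.
        return self._config.get("new-tax-engine", False) and country in {"US", "CA"}


def old_tax(amount: float) -> float:
    return round(amount * 0.10, 2)


def new_tax(amount: float) -> float:
    return round(amount * 0.08, 2)


def calculate_tax(amount: float, country: str, router: ToggleRouter) -> float:
    # Toggle Point: branches on the decision but knows nothing about how it was made.
    if router.use_new_tax_engine(country):
        return new_tax(amount)
    return old_tax(amount)


router = ToggleRouter({"new-tax-engine": True})
print(calculate_tax(100.0, "US", router))  # 8.0
print(calculate_tax(100.0, "FR", router))  # 10.0
```

Keeping the old and new implementations as separate functions also reflects the well-factored, modular code described above: once the toggle is retired, deleting it touches only the toggle point and the unused branch.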
To further maintain structural quality, Dependency Inversion should be used to decouple your feature toggle infrastructure, providing isolation from the rest of the system. This will prove to be especially useful in Enhancing Testability.
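Here is one way Dependency Inversion might look, assuming Python's `typing.Protocol` as the abstraction: the service depends on a `FeatureToggles` interface, and both a production implementation and a test stub satisfy it. All class and feature names are hypothetical.

```python
from typing import Dict, List, Protocol


class FeatureToggles(Protocol):
    """Abstraction the application depends on; toggle infrastructure implements it."""

    def is_enabled(self, feature: str) -> bool: ...


class ConfigFileToggles:
    """Hypothetical concrete implementation; could be swapped for a database or vendor SDK."""

    def __init__(self, settings: Dict[str, bool]) -> None:
        self._settings = settings

    def is_enabled(self, feature: str) -> bool:
        return self._settings.get(feature, False)


class StubToggles:
    """Test double: every feature returns the value chosen by the test."""

    def __init__(self, enabled: bool) -> None:
        self._enabled = enabled

    def is_enabled(self, feature: str) -> bool:
        return self._enabled


class ReportService:
    # Toggles arrive through the constructor instead of a concrete import,
    # so tests can inject a stub without touching real configuration.
    def __init__(self, toggles: FeatureToggles) -> None:
        self._toggles = toggles

    def export_formats(self) -> List[str]:
        formats = ["csv"]
        if self._toggles.is_enabled("pdf-export"):
            formats.append("pdf")
        return formats


print(ReportService(StubToggles(True)).export_formats())   # ['csv', 'pdf']
print(ReportService(StubToggles(False)).export_formats())  # ['csv']
```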
Another design consideration is where, within your architecture, to place your feature toggles. For toggles that are context sensitive based on individual requests, it is best to place the toggle point at the outside edge of your system. Since the user’s request typically enters your system via an API or controller, this is where the toggle router has the most context to make toggling decisions. And for many new features, controlling access may be as simple as hiding UI elements. This helps to keep toggle points out of the core of your system, but it can introduce a security risk: unreleased functionality that is merely hidden in the UI may still be reachable by anyone who discovers its unpublished URL.
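A framework-agnostic sketch of an edge toggle point, assuming a hypothetical `is_enabled` router call and a tiny route table in place of a real web framework's controllers. It hides the menu entry and also guards the route itself, which addresses the unpublished-URL risk.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]


def is_enabled(feature: str, user_id: str) -> bool:
    # Hypothetical toggle router call; here only user "beta-1" has access.
    return feature == "new-dashboard" and user_id == "beta-1"


def render_menu(user_id: str) -> str:
    items = ["Home", "Reports"]
    # Toggle point at the edge: hide the UI entry for users without access.
    if is_enabled("new-dashboard", user_id):
        items.append("Dashboard")
    return " | ".join(items)


def new_dashboard(user_id: str) -> Response:
    return 200, f"dashboard for {user_id}"


ROUTES: Dict[str, Callable[[str], Response]] = {"/dashboard": new_dashboard}


def handle(path: str, user_id: str) -> Response:
    # Guard the route as well: hiding the menu item alone would leave the
    # unreleased feature reachable by anyone who guesses the URL.
    if path == "/dashboard" and not is_enabled("new-dashboard", user_id):
        return 404, "not found"
    handler = ROUTES.get(path)
    return handler(user_id) if handler else (404, "not found")


print(render_menu("beta-1"))                 # Home | Reports | Dashboard
print(handle("/dashboard", "someone-else"))  # (404, 'not found')
```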
Feature toggles that control low-level implementation will obviously need to reside deeper within the architecture. Locating the toggle point within the domain model containing the functionality being toggled is usually a good design decision. Release toggles often fall into this category.
There are many ways to implement feature toggles. Some are static configurations that require software builds and deployments, while others are more dynamic.
Probably the simplest form of toggle is to comment out lines of code, then uncomment them to activate the switch. However, this is slow to activate and deactivate, since every change requires a rebuild and redeployment, and it is slow to test.
Toggles can also be specified via environment variables or command line arguments. While this approach can eliminate the need for a rebuild, it usually requires a redeployment or process restart.
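A minimal sketch of both options using only the standard library. The variable name `FEATURE_NEW_SEARCH`, the flag `--enable-new-search`, and the accepted truthy spellings are hypothetical choices.

```python
import argparse
import os
from typing import List, Optional


def env_toggle(name: str, default: bool = False) -> bool:
    """Read a boolean toggle such as FEATURE_NEW_SEARCH=true from the environment."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def parse_flags(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read toggles from command line arguments; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag; store_true makes it False unless the flag is passed.
    parser.add_argument("--enable-new-search", action="store_true")
    return parser.parse_args(argv)


# Both are typically read once at startup, so a change takes effect only after a restart.
NEW_SEARCH_FROM_ENV = env_toggle("FEATURE_NEW_SEARCH")
print(parse_flags(["--enable-new-search"]).enable_new_search)  # True
```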
A slightly better approach is to use configuration files. You can activate or deactivate a feature by updating the config file instead of rebuilding the application, although this usually still requires redeploying that file.
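A sketch of a JSON-based toggle file, assuming a flat `{"toggle-name": true}` layout; the file name and format are hypothetical. Non-boolean values are ignored so that a typo cannot silently enable a feature.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_toggles(path: Path) -> Dict[str, bool]:
    """Load toggles from a JSON file such as {"new-search": true, "pdf-export": false}."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Keep only real booleans; a value like "yes" is treated as unset (off).
    return {name: value for name, value in data.items() if isinstance(value, bool)}


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "toggles.json"
    config.write_text('{"new-search": true, "pdf-export": "yes"}', encoding="utf-8")
    toggles = load_toggles(config)

print(toggles)  # {'new-search': True}
```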
Another method is to store the configuration within a shared database. This offers a centralized store that provides consistency across a server farm. This works well with some type of admin UI (usually custom built) to manage the configuration of all feature toggles.
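As an illustration of a shared store, this sketch uses an in-memory SQLite database as a stand-in for the shared database; the `feature_toggles` table and its columns are hypothetical. `set_toggle` is the kind of call an admin UI would make, and every server reading the same table sees the change. A production version would likely cache reads rather than query on every check.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_toggles ("
        " name TEXT PRIMARY KEY,"
        " enabled INTEGER NOT NULL DEFAULT 0)"
    )


def set_toggle(conn: sqlite3.Connection, name: str, enabled: bool) -> None:
    # Insert or overwrite the row for this toggle, then commit so others see it.
    conn.execute(
        "INSERT OR REPLACE INTO feature_toggles (name, enabled) VALUES (?, ?)",
        (name, int(enabled)),
    )
    conn.commit()


def is_enabled(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT enabled FROM feature_toggles WHERE name = ?", (name,)
    ).fetchone()
    return bool(row[0]) if row else False  # unknown toggles default to off


conn = sqlite3.connect(":memory:")
create_schema(conn)
set_toggle(conn, "new-search", True)
print(is_enabled(conn, "new-search"))  # True
print(is_enabled(conn, "pdf-export"))  # False
```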
Feature toggle configuration can also be managed using cookies, query parameters, or HTTP headers. This provides a great deal of flexibility (especially for testing) but introduces greater risk by increasing the complexity and attack surface of your system.
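One way to keep request-level overrides from widening the attack surface is to accept them only outside production and only for an allow-listed set of toggles. In this sketch the caller is assumed to have already extracted the raw override strings from a cookie, a query parameter, and a header (for example a hypothetical `X-Feature-Overrides` header); the `name=on,name=off` format and the allow-list are also assumptions.

```python
from typing import Dict, Mapping, Optional

# Hypothetical allow-list: only these toggles may be overridden per request.
OVERRIDABLE = {"new-search"}


def parse_overrides(raw: str) -> Dict[str, bool]:
    """Parse 'new-search=on,pdf-export=off' into {'new-search': True, 'pdf-export': False}."""
    result: Dict[str, bool] = {}
    for pair in raw.split(","):
        name, sep, value = pair.strip().partition("=")
        if sep and value in ("on", "off"):
            result[name] = value == "on"
    return result


def effective_toggles(defaults: Mapping[str, bool], header: Optional[str],
                      query: Optional[str], cookie: Optional[str],
                      environment: str) -> Dict[str, bool]:
    toggles = dict(defaults)
    if environment == "production":
        return toggles  # overrides are ignored in production
    # Apply cookie, then query parameter, then header; later sources win.
    for raw in (cookie, query, header):
        for name, value in parse_overrides(raw or "").items():
            if name in OVERRIDABLE:
                toggles[name] = value
    return toggles


defaults = {"new-search": False, "pdf-export": False}
print(effective_toggles(defaults, "new-search=on,pdf-export=on", None, None, "staging"))
# {'new-search': True, 'pdf-export': False}
print(effective_toggles(defaults, "new-search=on", None, None, "production"))
# {'new-search': False, 'pdf-export': False}
```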
As with any other development pattern, creating a solid testing strategy is an important part of using feature toggles. The first thing to understand is that you cannot test every combination of every feature toggle. It is best to test each variation of a toggle in isolation (when possible) and use default values for other toggles. Maintain awareness of which feature toggles have interactions with (affect) other toggles. Identify which combinations could create issues and take steps to guard against them.
Making your feature toggles dynamic will enable you to create new toggle routers using default configurations. You can also dynamically toggle features on and off, allowing you to test each variation of the toggle through automated testing.
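The two testing ideas combine naturally: build a fresh dynamic router from default values for each case, then flip exactly one toggle at a time. The router, the `DEFAULTS` map, and `search_results` are hypothetical; the point is that the number of cases grows linearly (two per toggle) rather than exponentially with the number of toggles.

```python
import unittest
from typing import Dict, Iterator, List, Tuple

DEFAULTS: Dict[str, bool] = {"new-search": False, "pdf-export": False, "dark-mode": False}


class InMemoryToggleRouter:
    """Dynamic router: starts from defaults and can be flipped at runtime."""

    def __init__(self, defaults: Dict[str, bool]) -> None:
        self._state = dict(defaults)  # copy so test cases never share state

    def set(self, feature: str, enabled: bool) -> None:
        self._state[feature] = enabled

    def is_enabled(self, feature: str) -> bool:
        return self._state.get(feature, False)


def isolated_variations(defaults: Dict[str, bool]) -> Iterator[Tuple[str, bool]]:
    """Yield each toggle in each state; all other toggles keep their defaults."""
    for feature in defaults:
        for state in (False, True):
            yield feature, state


def search_results(router: InMemoryToggleRouter) -> List[str]:
    # Hypothetical code under test.
    return ["fuzzy"] if router.is_enabled("new-search") else ["exact"]


class ToggleVariationTests(unittest.TestCase):
    def test_each_variation_in_isolation(self) -> None:
        for feature, state in isolated_variations(DEFAULTS):
            with self.subTest(feature=feature, enabled=state):
                router = InMemoryToggleRouter(DEFAULTS)  # fresh router per case
                router.set(feature, state)
                expected = ["fuzzy"] if (feature == "new-search" and state) else ["exact"]
                self.assertEqual(search_results(router), expected)
```

Run it with `python -m unittest`. Combinations you have identified as interacting can be added as explicit extra cases alongside the isolated ones.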