Kill switches — best practice
Kill switches are a safety mechanism used to shut off machinery in an emergency, when it cannot be shut down in the usual manner.
Many large sites have features that occasionally struggle under certain edge cases, often related to third-party integrations. A typical case is a payment provider that is temporarily down and cannot accept new orders.
Kill switches make it easy to turn these flaky features off, a simple way to degrade the service gracefully and keep the business running.
“Of course, it would be nice to not need the kill switches, but they have proven their value over time.”
Senior developer, FINN.no
But how?
The general advice is to wrap potentially flaky features in inverted feature flags: your application should assume the feature works as expected as long as the flag is disabled. In this example we use Unleash, an easy-to-use feature toggle service.
Keeping the toggle disabled by default also ensures that your application keeps the feature enabled if it cannot fetch the latest version of the toggle.
If you detect problems with the integration, you can turn on the kill switch, which turns the feature off.
Within seconds, Unleash informs the application that the feature should be turned off. Pretty simple, right?
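A minimal sketch of the inverted-flag idea in Python. `ToggleClient`, the toggle name `kill-switch.invoice-provider` and the "invoice" payment option are all hypothetical stand-ins, not the Unleash SDK API. The sketch treats a toggle that is unknown or was never fetched as disabled, which is what makes the inverted flag fail safe: the feature stays on unless someone explicitly flips the kill switch.

```python
from typing import Dict, List, Optional


class ToggleClient:
    """Hypothetical stand-in for a feature toggle client (not the Unleash SDK).

    It keeps the last toggle states it received; a real client would refresh
    them from the toggle server in the background.
    """

    def __init__(self) -> None:
        self._toggles: Optional[Dict[str, bool]] = None  # None = never fetched

    def update(self, toggles: Dict[str, bool]) -> None:
        # Simulates a successful fetch of toggle states from the server.
        self._toggles = dict(toggles)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # No data yet, or an unknown toggle: fall back to the default (off).
        if self._toggles is None:
            return default
        return self._toggles.get(name, default)


def available_payment_providers(client: ToggleClient) -> List[str]:
    providers = ["card"]
    # Inverted flag: the kill switch is OFF by default, so the flaky
    # (hypothetical) invoice provider is offered unless the switch is ON.
    if not client.is_enabled("kill-switch.invoice-provider"):
        providers.append("invoice")
    return providers


if __name__ == "__main__":
    client = ToggleClient()
    print(available_payment_providers(client))  # ['card', 'invoice'] (nothing fetched yet)
    client.update({"kill-switch.invoice-provider": True})
    print(available_payment_providers(client))  # ['card'] (kill switch turned on)
```

Because the check is `not is_enabled(...)`, a toggle server outage can only ever leave the feature running, never switch it off by accident.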
Can we automate?
Even though inverted feature flags make it simple to add kill switches to your service, it is important to keep the number of long-lived switches to a minimum. They are still a powerful tool for manually degrading your service during high load by turning off non-critical features.
You can also think of a kill switch as a manually managed circuit breaker. The circuit breaker is a design pattern widely used in software development today. The general concept is that you wrap a service call in a circuit breaker, which monitors failures and automatically stops the logic from executing once failures pile up, so that it does not keep failing.
Fortunately, there are libraries that implement this pattern:
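To illustrate the mechanism, here is a minimal, single-threaded circuit breaker sketch in Python. The class, its state names and parameters (`failure_threshold`, `reset_timeout`) are illustrative assumptions, not the API of any of the libraries below. It opens after a number of consecutive failures, rejects calls while open, and after a timeout lets one trial call through to decide whether to close again.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative breaker: CLOSED -> OPEN after `failure_threshold`
    consecutive failures; once `reset_timeout` seconds have passed the
    circuit is HALF_OPEN and lets a trial call through. A success closes
    the circuit, a failure re-opens it. Not thread-safe.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means CLOSED

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            # Fail fast instead of hammering a service that is known to be down.
            raise CircuitOpenError("circuit is open")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call, or too many failures in a row, opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example runs instantly
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0,
                             clock=lambda: now[0])

    def flaky_payment_call() -> str:  # hypothetical failing integration
        raise ConnectionError("payment provider is down")

    for _ in range(2):
        try:
            breaker.call(flaky_payment_call)
        except ConnectionError:
            pass
    print(breaker.state)  # OPEN: further calls now fail fast
    now[0] += 10.0
    print(breaker.state)  # HALF_OPEN: the next call is a trial
```

The injectable `clock` only exists to make the sketch easy to exercise; production libraries add thread safety, sliding failure windows and metrics on top of this basic state machine.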
- resilience4j (Java)
- Polly (.NET)
- circuit_breaker (Ruby)
- brakes (Node.js)
Summary
- Inverted feature flags are a simple way to manually handle kill switches.
- Keep manually managed and long-lived kill switches to a minimum.
- Prefer to build failure tolerance into your application. The circuit breaker pattern is one way to achieve that.