17 May 2016

How to use feature flags without technical debt

Or, won’t feature flags pollute my code with a bunch of crufty if/then branches that I have to clean up later?

People often ask for advice on maintaining feature flags after they roll out the new code to all users. Here is one way to approach the issue of cleaning up the old code paths. It works pretty well for our team. If you have any other ideas, please let us know!

Write the new feature

Step one is to write the new feature, gated by the feature flag, on a short-lived feature branch off of master (call it pk/awesome-xyz-support):

Next, before you submit your pull request for this change, create a second branch, off of the first, and call it cleanup/awesome-xyz. This branch removes the feature flag, leaving just the new code:
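The branch layout can be sketched in git commands like this (branch names are the ones from the post; the commits themselves are placeholders):

```shell
git checkout master
git checkout -b pk/awesome-xyz-support    # short-lived feature branch
# ...commit the new feature, gated by the feature flag...

git checkout -b cleanup/awesome-xyz       # second branch, off the first
# ...commit the removal of the flag, leaving only the new code path...
```

Once the feature has been rolled out to everyone and is stable, the cleanup branch is already waiting: merge it, and the flag and the old code path disappear without a from-scratch cleanup task.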


03 May 2016

Feature flagging to mitigate risk in database migration

There comes a time in every developer’s life when you realize you need to migrate from one database to another. Maybe you started off with MongoDB, and it was great while your app was small, but it just can’t handle the load now that you made it to the big leagues. Maybe it’s time you bit the bullet and put that huge events database in DynamoDB (or Cassandra, or…).

Moving databases is no small task, but it doesn’t need to be so risky.

One of the common misconceptions is that feature flags are only useful for new features, or for cosmetic changes. It turns out they are extremely useful for doing things like database migrations. I won’t say that feature flagging your database migration will make everything easy, but it will make it easier to test it with real, live data (for a subset of your user base), and much easier to roll it back if something goes wrong.


26 Apr 2016

Zombies eating your AWS bill?


The near-infinite elasticity of Amazon’s EC2 service is amazing. You can easily and cheaply spin up new instances whenever you need them. At LaunchDarkly, we provision new instances every time we deploy a new version of one of our services, which often happens multiple times in a day. We use Ansible to orchestrate the provisioning and deploy process, which means that we have easily repeatable deploys, and no server is a snowflake. Using the ‘cattle, not pets’ mindset, we routinely create and destroy instances. The script basically does the following steps:

  1. Provisions a fresh set of instances using a base AMI image that we have created
  2. Pulls down the latest code and config files from S3 (which is where our build server publishes to)
  3. Runs a few smoke tests to be sure the app has started up
  4. Swaps the load balancer configuration so the load balancers start sending traffic to the new instances
  5. Waits for the old instances to finish processing any requests that they had received before the load balance switch
  6. De-provisions the instances running the old version of the code
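A minimal sketch of that flow as a shell script; the function bodies are placeholders standing in for the real Ansible tasks (the names are illustrative, not our actual playbook):

```shell
# Each function stands in for an Ansible play; here they just narrate the step.
provision_instances() { echo "1: provisioning fresh instances from the base AMI"; }
pull_build_from_s3()  { echo "2: pulling the latest code and config from S3"; }
run_smoke_tests()     { echo "3: running smoke tests against the new instances"; }
swap_load_balancer()  { echo "4: pointing the load balancers at the new instances"; }
drain_old_instances() { echo "5: waiting for old instances to finish in-flight requests"; }
deprovision_old()     { echo "6: terminating the instances running the old code"; }

provision_instances
pull_build_from_s3
run_smoke_tests      # if these fail, stop here: the old instances are still serving traffic
swap_load_balancer
drain_old_instances
deprovision_old
```

The ordering is what makes this safe: the load balancer switch (step 4) happens only after the smoke tests pass, so a bad build never takes live traffic.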

Continue reading “Zombies eating your AWS bill?” »

17 Aug 2015

Best practices for testing Stripe webhook event processing

While writing the code to integrate our application with Stripe, I was very impressed with the level of polish Stripe has put on their API: the documentation, the language-specific SDK ergonomics, and how easy they make it to integrate something as obviously complex as payment processing.

However, there were two areas of the development/testing process where things were not so great, both involving webhooks:

  • Testing webhook processing
  • Webhooks from one test environment being sent to another environment

Handling webhooks from Stripe

Stripe can be configured to send events to your application via webhooks. In this way, you can maintain the internal state of your customers as they transition through the payment process. However, there is no way to know that the webhook request actually came from Stripe.

In order to verify the authenticity of the webhook payload, we need to fetch the event from Stripe’s API, ignoring the payload from the webhook (except the event ID, which we use to perform the lookup). Since we initiate the call to Stripe’s API, and it is secured over HTTPS, we can trust that we are getting accurate data from Stripe. Once we have a valid event, we can do whatever processing we need to on it. This is the process suggested by Stripe:

If security is a concern, or if it’s important to confirm that Stripe sent the webhook, you should only use the ID sent in your webhook and should request the remaining details from the API directly. We also advise you to guard against replay-attacks by recording which events you receive, and never processing events twice.

(I hope all of Stripe’s customers would consider security to be ‘a concern’, since we are dealing with payment processing.) The advice about guarding against replay attacks is also worth heeding, but it is relatively easy to handle: just record each webhook event in a database collection with a unique index on the event ID, and check whether the insert succeeded before proceeding.
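As a sketch of that guard, using a scratch directory in place of a real database, where mkdir’s atomicity plays the role of the unique index:

```shell
# Record each event ID exactly once; a second arrival is a replay.
SEEN_DIR="$(mktemp -d)"   # stand-in for a DB collection with a unique index

process_once() {
  # mkdir is atomic: it fails if this event ID was already recorded,
  # which plays the role of a unique-index insert failing.
  if mkdir "$SEEN_DIR/$1" 2>/dev/null; then
    echo "processing $1"
  else
    echo "skipping replayed event $1"
  fi
}

process_once evt_1AbCd   # first delivery: processed
process_once evt_1AbCd   # replay: skipped
```

With a real database, the shape is the same: attempt the insert first, and only process the event if the insert succeeds.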

Testing webhooks from Stripe

The problem with this approach for validating webhook data is that it makes integration testing difficult, because Stripe doesn’t send invoice webhook events right away:

If you have configured webhooks, the invoice will wait until one hour after the last webhook is successfully sent (or the last webhook times out after failing).

So, imagine a test script that does the following:

  • Sign up a user with a new plan
  • Update the user’s credit card to be one that will fail to charge (4000000000000341 is the test card number that Stripe provides for this purpose)
  • Change the trial end date to end in one second (other tests will ensure that the trial period works properly, but this is meant to test the renewal flow)
  • Wait 5 seconds, to be sure the trial has ended, and the webhook has been sent
  • Ensure that the account is put into the ‘charge failed’ mode

However, the ‘Wait 5 seconds’ step isn’t right, since we might need to wait up to an hour. This is way too long to wait to know if our tests pass. So, what else can we do? We can’t just fake the webhook event, since our code needs to fetch the event from Stripe to be sure it is authentic.

Disable authenticity check in test mode

The solution we settled on was to disable the authenticity check in test mode. Since testing just that we can fetch an event from Stripe isn’t a terribly interesting test (and presumably, it is covered by their SDK test suite), I’m comfortable with this deviation between the test and production flows. In the end, the test script listed above looks more like this (the test app has the ‘test mode’ flag enabled, which disables the event fetching):

  • Sign up a user with a new plan
  • Update the user’s credit card to be one that will fail to charge (4000000000000341 is the test card number that Stripe provides for this purpose)
  • Change the trial end date to end in one second (other tests will ensure that the trial period works properly, but this is meant to test the renewal flow)
  • Send a fake webhook event that looks like one Stripe would send for a failed invoice charge (be sure to use a unique event ID, so that it won’t trigger the replay attack detection code— test that code in a different test)
  • Ensure that the account is put into the ‘charge failed’ mode
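A sketch of constructing such a fake event; the JSON loosely follows the shape of Stripe’s event objects, and the endpoint in the trailing comment is hypothetical:

```shell
# Build a fake invoice.payment_failed event. The event ID is unique per run,
# so the replay-attack guard in the app won't reject it.
EVENT_ID="evt_test_$(date +%s)_$$"
PAYLOAD=$(cat <<EOF
{
  "id": "${EVENT_ID}",
  "type": "invoice.payment_failed",
  "livemode": false,
  "data": { "object": { "object": "invoice", "paid": false } }
}
EOF
)
echo "$PAYLOAD"
# To deliver it to a locally running test instance, something like:
#   curl -s -X POST "http://localhost:3000/hooks/stripe" \
#        -H 'Content-Type: application/json' -d "$PAYLOAD"
```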

How could this system be improved?

If I were to implement a webhook-sending service, I would include a header on each request containing an HMAC value that the receiver could use to verify that the request came from a trusted origin. The HMAC process is detailed in RFC 2104, but it can be summarized as:

1. The sender prepares the message to be sent (the webhook payload, in this case).
2. The sender computes a signature using the message payload and a shared secret (this could be the Stripe secret key, or it could be a separate secret used only for this purpose, as long as it is known to both Stripe and your application, and no one else).
3. The sender then sends the message along with the signature (usually in an HTTP header).
4. The receiver (ie, your application) takes the message and computes its own HMAC signature, using the shared secret.
5. The receiver compares the signature it computed with the one that was received, and if they match, the message is authentic.
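That flow can be sketched with openssl; the secret and payload here are invented, and this is the generic RFC 2104 flow rather than anything Stripe actually documents:

```shell
SECRET='whsec_example_shared_secret'    # hypothetical shared secret
PAYLOAD='{"id":"evt_123","type":"invoice.payment_failed"}'

# Sender side: sign the payload with the shared secret.
# (awk grabs the last field, since openssl prefixes the hex with "(stdin)= ".)
signature=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')

# Receiver side: recompute the HMAC over the received payload and compare.
expected=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')

if [ "$signature" = "$expected" ]; then
  echo "authentic"
else
  echo "forged"
fi
```

Because only the two parties know the secret, a matching signature means the payload was produced by the sender and not tampered with in transit, and no callback to the API is needed.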

Avoiding webhook confusion

The next problem we faced was dealing with renewal webhooks being sent to our staging server, referencing unknown accounts. The problem can be summarized like this:

  • Stripe only supports two modes: Live and Test
  • We have many non-production systems: staging, dogfood, each developer’s local instance, etc
  • Webhooks are retried by Stripe until they succeed. If you have multiple webhooks configured, each one will be retried until it succeeds (so if you have three configured, and one succeeds while the others fail, the others will be retried).
  • In production, a failing webhook would be a problem that would require investigation—we don’t want something like that to fail silently.

So, if I am testing/developing the signup flow on my laptop, and my local app is configured with the ‘Test’ Stripe credentials, webhooks resulting from these interactions will be sent to our Staging server (since we have webhooks in the ‘Test’ mode configured to go there).

Staging will get the webhook payload, validate it, and then look at the account ID to do its work. Then, it will find that it doesn’t know about the referenced account, log an error message, and return a non-successful status to Stripe, indicating that it should retry. This is the desired workflow in production if processing a webhook fails in this way.

So, how do we avoid this noise in our alerting system? We don’t want to disregard all errors in staging; how would we catch issues before they get to production?

The answer we settled on is simple, but it still feels a little hacky: sign up for a second Stripe account. Don’t use the ‘Live’ mode in this account, and don’t configure any real bank info. Don’t keep any webhooks configured in there (but for testing ad hoc issues, you can add one, possibly using an ngrok url pointing to your local instance). Use this new dummy account for local developer configs, and use the real account’s ‘Test’ mode for staging.

How could this be better?

Considering how Stripe covers all the bases in other areas, and provides an amazingly easy-to-use and powerful system, it is kind of surprising that they don’t have better support for customers with even moderately complex testing requirements. We recently implemented our own support for different environments, so we did some research into how other services solve this problem:

  • SendWithUs – they differentiate between production and non-production, but they allow you to create many non-production API keys that behave differently. So my local test key can have all emails sent to my email (regardless of the original recipient), Alexis’s key can have emails sent from his laptop go to his email, and John’s key cannot send any real emails at all. All of these test emails end up in the same bucket of ‘test’ emails in the logs on the SendWithUs console (so they aren’t siloed all the way through), but analytics are not tracked on test emails, so they don’t interfere with your production metrics. All test & production keys can share the same templates, drip campaigns, etc., and you can create as many keys as you like. I think this scheme works well for SendWithUs’s product, but it likely wouldn’t work for products that need environments’ data to remain fully siloed.
  • NewRelic – You can create different ‘applications’ for each environment, and they remain fully siloed. As far as NewRelic is concerned, there is no difference between the same codebase running in production vs. staging and two completely different applications.
  • Mailchimp – You can create different lists, and have staging subscribe people to the staging list, while production subscribes people to the production list. This is very similar to the NewRelic approach, but in a different domain.
  • LaunchDarkly – This blog post isn’t meant to go into depth about our environments feature; there are other places for that. But we did something more like SendWithUs: environments can share common data, like goals or the basic existence of features, while you can have different rules for each environment, and all data is kept siloed.

So, that is how a few other companies have solved this problem, how could Stripe improve their solution? My first suggestion is to allow me to create as many environments as I need, and keep all data siloed. Alternatively, they could allow me to create groups of webhooks, such that only one in each group must succeed before considering it delivered. This would solve my problem right now, but it feels less flexible than multiple environments, and would likely not solve other people’s problems.



13 Aug 2015

5 surprising ways your neighbors use feature flags

Recently I handled a support request asking for clarification about what problems can be solved with a system like LaunchDarkly, and how it’s different from every other analytics/event-tracking service. I realized that we actually have a wide range of things that customers are doing with our service–many of them are not obvious use cases for feature flags, but it turns out they work pretty well. I’m trying to be wary of seeing everything as a nail, but I think these are all good fits for feature flags.

LaunchDarkly allows you to control your software releases, by separating the deployment and rollout processes. Many people think of deployment and rollout as the same thing, but it is incredibly powerful if you can separate them. Deployment is actually pushing the code to your production servers, and rollout is exposing the new features to your users. With LaunchDarkly, you can:

  • Roll out a new feature to a subset of your users (like a group of users who opt-in to a beta tester group), gathering feedback and bug reports from real-world use cases.
  • Gradually roll out a feature to a percentage of users, and track the effect that the feature has on key metrics (for instance, how likely is a user to complete a purchase if they have feature A versus feature B?).
  • Turn off a feature that you realize is causing performance problems in production, without needing to re-deploy, or even restart the application with a changed configuration file.
  • Grant access to certain features based on user attributes, like payment plan (eg: users on the ‘gold’ plan get access to more features than users on the ‘silver’ plan).
  • Disable parts of your application to facilitate maintenance, without taking everything offline.

This is just a brief sample of the use cases some of our customers are putting into practice right now. So, while tracking events is an important part of powering these use cases (especially for A/B testing and canary launching), it is not the main focus.

Take a look at our blog and docs (I linked to a few relevant pages above, but there is more there), and of course, we’re happy to answer any other questions that you might have, or help you get started.



04 Mar 2015

How LaunchDarkly uses feature-flags for rolling maintenance modes

As developers, we try to avoid it, but from time to time downtime is needed to perform an upgrade or maintenance. You might need to upgrade the hardware running an application (or some part of it), change database engines, or change the data model underlying your application (in cases where the old code can’t understand the new model, and the new code can’t deal with the old one). Other times, you might need to regenerate a derived data model (such as a search index). During these windows, all or part of your application may need to be either taken offline, or put into a degraded mode (such as read-only mode).

At LaunchDarkly, we recently needed to reindex the user search feature for all of our customers. Since we have a pretty new product, we didn’t yet have a history of doing this in any particular way. What follows is a description of the problem we faced (and what similar maintenance problems might crop up), and how we solved it.

Types of maintenance modes

Depending on the task, you may need to use different types of maintenance:

  • Whole-site maintenance, where you need to take the entire site offline,
    perhaps to migrate data structures in your main datastore, or move the site
    from one datacenter to another. Of course, this is the most disruptive to
    your service, and is the least palatable.
  • Whole-feature maintenance, where you need to take a portion of your site down entirely, but other parts of the site can remain up. For example, you might need to reindex your search cluster for all content on your site—search will be down during this period, but the rest of the site can remain up. This is better than taking the whole site down, but it does affect all of your users at once.
  • Per-user feature maintenance is when you can perform the maintenance such that it affects only one user at a time. To use the search reindexing example again, perhaps you can reindex just one user’s content at a time. Depending on your project, this may be the least disruptive type of maintenance, since only one user is (partially) down at any given time, and presumably that user is down for a much shorter time than it would take to perform the maintenance for all users. The drawback is that your application and the specific maintenance task need to be structured so that you can perform the work on a per-user basis. This won’t be true in all cases.

What are my options?

Depending on your stack, there might be some existing library that could help:

  • There may be a platform-specific option available for you. For Django, there is django-db-tools, which includes a middleware enabling read-only mode. You can think of this as a kind of hybrid whole-site maintenance/whole-feature maintenance, where the ‘feature’ is being able to modify the data, across the whole site. To enable it, you just set an environment variable, and restart the application. Rails has turnout, and in node.js, there is the maintenance package. Both of these work in similar fashions, intercepting all requests, and displaying a maintenance mode page (supporting the whole-site maintenance mode described above). You can also exclude certain paths or source IP addresses so you don’t have to disable the entire site, but you are still disabling entire pages at once.
  • You can also switch to maintenance mode at the proxy level (Heroku offers this, or you can set something like this up in your own infrastructure). The advantage here is that no code changes are necessary, so it might make sense if you need to perform maintenance on the box running your application (since you can take the entire application down). It also makes sense if you are on Heroku, since it is so easy. However, this only supports whole-site maintenance, as it takes your entire site out of commission.
  • Build your own ‘maintenance mode’ flag into your application, in a configuration property, or database value, etc. I suspect this is the most common option. Lanyrd has a great post about how they put their application into read-only mode to migrate their database. They talk more about the meat of their migration (moving from MySQL to PostgreSQL), but they mention read-only mode as being a critical part of the overall plan. Building your own support for maintenance mode obviously gives you the most flexibility, but it also involves the most work.

How did we do it?

When we needed to reindex our user search feature, we went with a solution that was mostly a ‘build-your-own’ option, because it gave us the most flexibility to only degrade the site as much as was necessary, and no more. This meant that we used the per-user feature maintenance scheme, so that only one user was affected at a time, and that window was much shorter than if we needed to take the feature down for all users, while we reindexed them all.

We dogfood our product all the time, so it didn’t take long to realize that a maintenance mode flag is just a feature flag. (Hopefully this isn’t a case of Maslow’s Hammer, but in hindsight, I think it worked out very well.) First, let’s go over how to think of maintenance mode flags as feature flags (if you don’t use LaunchDarkly, you should still be able to build something similar in-house):

For the simplest type of maintenance, whole-site maintenance, you can check the flag in your top-level request handling code (or whatever is the main entry point in your application). If the flag is on, display your maintenance page with a 503 status. If the flag is off, process the request as usual. (This will give you the same behavior as turnout in Rails, or the maintenance module in node.js.) If you are using LaunchDarkly, I suggest creating the feature with a 100% rollout, but keeping the top-level kill switch turned off.

maintenance and rollout

When you need to turn on your maintenance mode, just use the kill switch to activate it, and perform whatever tasks you need to. If you aren’t using LaunchDarkly, you will need to enable maintenance mode in your environment variables or configuration file (for django-db-tools or turnout, respectively), and restart your application. If you are using node.js’s maintenance package, you can enable maintenance mode by hitting a special endpoint, so you don’t need to restart the application. If you are using a custom-built solution, then obviously the mechanism to turn on maintenance mode will vary.

For whole-feature maintenance, the process is essentially the same, except you put the feature flag check around the relevant feature in your code. I don’t think this is really possible using an out-of-the-box middleware solution like django-db-tools, turnout, or the maintenance module, unless you can target the specific feature with a URL pattern. You might also want to display a more specific message to users (eg, explaining that product search is down right now, but the rest of the site is still up), or to disable elements throughout your site (like a search box at the top of the page). I will leave it as an exercise to the reader to come up with the ideal user experience in this case.

This means that a custom-integrated solution is needed (either with or without LaunchDarkly). In this case, you would set up the feature in LaunchDarkly in the same way as for whole-site maintenance (with 100% rollout and the kill switch turned off).

For per-user feature maintenance, you will still want to build the feature flag checks into your application as with whole-feature maintenance, but you will need to set up the feature a little differently in LaunchDarkly (or build a more complex system for deciding if a user is in maintenance mode or not, if you are building your own system). For LaunchDarkly in this case, we want the default rollout to be 0%, since we will be explicitly adding users.

maintenance and rollout 2

Then, to put an individual user into maintenance mode, just add them to the include list on the ‘targeting’ tab:

maintenance and rollout 3

One of the big advantages is that you can toggle the maintenance mode flags without needing to restart the application (or really touch it at all).

Sounds tedious. How can I automate this? (or maintenance is like a monad… I mean like a burrito)

Turning on maintenance mode for one user at a time, performing the maintenance, rinsing, and repeating is rather tedious. To help with this, we’ve created a simple shell script that turns maintenance mode on, runs some other task, and turns maintenance mode back off.

This script is meant to be the middle layer of a three-layer series of scripts:

– One script to iterate through all of your users, calling the next layer with each one. You need to write this part if you are doing per-user feature maintenance; otherwise, you don’t need this layer.
– This maintenance-mode-guard script.
– The script to do the actual work. If you are doing a per-user feature maintenance operation, this script should take the user key as a command-line argument (the last argument, in fact). If not, then of course it doesn’t need to know which user to operate on. You need to write this script, since it will be specific to your application.

For example, suppose you wanted to reindex all products in your application (so you are using a whole-feature maintenance scheme):

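The invocation might look something like this (the flag key and paths are placeholders):

```shell
maintenance-mode-guard.sh -a $LD_API_KEY -f 'product.search.maintenance.flag' \
  /path/to/maintenance-function --param 1 --param2 etc
```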
In this example, you just specify your LaunchDarkly API key and the feature flag key that guards the feature you are performing maintenance on (this can be either a whole-feature maintenance flag or a whole-site maintenance flag). The script will turn on the maintenance mode flag, call your maintenance script (/path/to/maintenance-function --param 1 --param2 etc in this example), and then turn off maintenance mode. If something goes wrong with the maintenance script (as detected by a non-zero return code), then a message will be printed to standard output, and maintenance mode will be left on. You can then investigate and correct the problem, and turn off maintenance mode yourself. (If something goes wrong when turning on maintenance mode, like an invalid API key or feature key, then a message will be printed, and your maintenance script will not be called.)

Next, let’s look at an example of a per-user feature maintenance operation:

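Putting the pieces together, the loop looks something like this (paths are placeholders):

```shell
/path/to/user-enumeration-script |\
while read user; do
  maintenance-mode-guard.sh -a $LD_API_KEY -f 'user.feature.maintenance.mode.flag' -u $user \
    /path/to/user-maintenance-function --param 1 --param2 --user
  if [ $? -ne 0 ]; then
    echo "Maintenance failed for $user; leaving maintenance mode on for that user"
    exit 1
  fi
done
```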
There is a bit going on here, so let’s dissect it:

  • /path/to/user-enumeration-script |\ is meant to be a script (or command) that you write that will enumerate user keys, one per line. This output is then piped to the while loop.
  • while read user; do reads the user keys, one line at a time, executing the while block once per user. The user key is assigned to the $user variable.
  • The next line can be further dissected
    • maintenance-mode-guard.sh -a $LD_API_KEY -f 'user.feature.maintenance.mode.flag' -u $user passes your LaunchDarkly API key and the feature flag guarding the feature that you are performing maintenance on (as with the whole-feature maintenance example above), plus the user key that is being operated on in this iteration.
    • /path/to/user-maintenance-function --param 1 --param2 --user is the command that will be executed by the maintenance-mode-guard.sh script, except that the user key will be appended to the command arguments. In this example, if $user had a value of bob@example.com, then the command executed would be: /path/to/user-maintenance-function --param 1 --param2 --user 'bob@example.com'
  • if [ $? -ne 0 ]; then ... the rest of the block just checks the return code from the single-user maintenance operation, and halts if something went wrong. This way, the last user is still in maintenance mode, and you can investigate.

A word about TTL values

In general, you want to have non-zero TTL values on your features to avoid external network calls, especially on critical paths like the one guarding a whole-site maintenance flag. However, during maintenance, especially if you are rapidly changing which users are in maintenance mode, you will probably want to pay the (still minor) cost of a network call to the LaunchDarkly server to get the latest version of your feature. I’d suggest the following:

  • Under normal operation (ie, non-maintenance windows), keep the TTL high, like 5 minutes.
  • 5 minutes before you are ready to start maintenance, log into LaunchDarkly and set the TTL on the maintenance feature flag to 0. The important thing is to do this in advance of the maintenance window, so that all of your servers will have expired the long-TTL feature from their respective caches, and started making live network calls on every flag request, before you begin.
  • After maintenance is complete, set the TTL back to 5 minutes (or an even larger value, if you like).

LaunchDarkly helps you build better software faster with feature flags as a service. Start your free trial now.