04 Dec

Why smart companies choose LaunchDarkly for feature flags as a service

Feature flags are a key part of continuous delivery, and the most successful companies, like Heroku and CircleCI, use them to supercharge their development. Feature flags let software teams release features faster, to exactly the users they want. The biggest companies maintain their own feature flagging frameworks – but the smartest ones use feature flags as a service, from LaunchDarkly.

Feature flags start off simple, but can quickly become a quagmire to maintain. At the beginning, it’s easy enough to create a feature flag as a database table that an engineer can use to manually turn a feature on or off for a user. Next, you’ll want more sophisticated rules like “20% of traffic,” “everyone on our beta email list” or “no one with a TechCrunch email address.” At this point, a rough UI is usually built so that non-technical people can control flags too. Then comes the bright idea of using the same framework for A/B testing. This works, until it doesn’t. That’s when the smart companies switch to LaunchDarkly for their feature flagging, for its ROI, reuse, lower maintenance costs and sophistication.

ROI

Auction.com chose LaunchDarkly because they didn’t want to dedicate multiple engineers to building a feature flagging system from scratch. They wanted their engineers to work on differentiating features for Auction.com. Lawrence Yuan, Senior Director of Web and Mobile Engineering at Auction.com, LLC, says “LaunchDarkly has been easy to on-board and integrate into our existing development cycle, which has helped us move faster with less risk.”

Reuse

One large LaunchDarkly customer had built a custom feature flagging system for their original product. When they acquired another company, they found that they couldn’t reuse the functionality for the acquisition, as it had been intertwined with their own product’s code. Another customer recently rebuilt their application from Ruby to Node.js, which made their existing feature flagging infrastructure obsolete. In both cases they found it prudent to move to a more stable system.

Maintenance costs

CircleCI had built and maintained their own feature flagging system, as it was a critical part of the release process they used for canary releases and dark launches. However, they wanted to spend their time on continuous integration, not on feature flagging. Mark Pundsack, VP of Product at CircleCI, made the decision to move to LaunchDarkly, saying “I really like LaunchDarkly, and I’m glad we’re using it. I believe that feature flags is a critical piece of modern development, and LaunchDarkly brings this part of continuous delivery to the masses.”

Flexibility & sophistication

LaunchDarkly facilitates both simple and complex feature flagging. We offer both per-feature and per-user views, so you can easily see who has which features. We show how long a feature flag has been in use, so you know when to remove it. Other developer tools companies use LaunchDarkly, telling us “We like that you think about nothing but feature flags.”

The smartest companies are choosing to have LaunchDarkly handle their feature flagging. Are you ready to join them? Click here to start your free trial.

30 Sep

The Design Void of the Developer World

The challenge of effectively designing for continuous delivery.

Developers are expected to use increasingly powerful tools to deliver increasingly complex and innovative software. So, why do developer tools often feel like an IRS tax form?

I’d argue it stems from a culture that treats aesthetics and functionality as mutually exclusive, when in fact they are highly intertwined. When software is consumer facing, we see a sweeping trend of minimalist elegance aimed at cultivating an emotional connection with the user. But when the target user is a developer, design seems to take a back seat to functionality.

“It just needs to work! Not look good!” is what I’ve heard in the past. This is the notion that accurately performing a task correlates to function; that putting in the correct inputs and receiving a reasonable output is the benchmark of success.

But shouldn’t there be other indicators of success? What about time saved? What about ease of use? What about scalability and reliability? What about the ability to make informed decisions? I can show someone a spreadsheet of tabular data or I could show them some informative charts. Both present the same data. Both functionally deliver. But one works much better at delivering the information and fulfilling its purpose.

Let’s look at continuous delivery: the art of releasing better software in shorter cycles. It’s an amalgam of time management, planning, and iterating. It would make sense, then, for a product built for continuous delivery to be designed with this interplay in mind.

At LaunchDarkly, we could have taken two approaches with our continuous delivery tool.

  1. Functional Model: “Let’s just make the product work.”
  2. Empowerment Model: “Let’s make sure our product saves time, mitigates aggravation, and allows for better decision making.”

Let’s assume that both models have equivalent functionality and backend logic. Truly harnessing the entire functional suite requires intuitive access. This is where design comes in. Design is about access and empowerment. It is about empowering the user through access to features, knowledge, and informed decision-making.

Developer tools do not have to be barren and sparse. They can be fun, interactive, and powerful without sacrificing efficiency. While designing for developers is different from designing for consumers, those differences manifest in how information is delivered, and should not come at the cost of empowerment.

17 Aug

Best practices for testing Stripe webhook event processing

While writing the code to integrate our application with Stripe, I was very impressed with the level of polish Stripe has put on their API: the documentation, the ergonomics of the language-specific SDKs, and how easy they make it to integrate with something as obviously complex as payment processing.

However, there were two areas of the development/testing process where things were not so great, both involving webhooks:

  • Testing webhook processing
  • Webhooks from one test environment being sent to another environment

Handling webhooks from Stripe

Stripe can be configured to send events to your application via webhooks. In this way, you can maintain the internal state of your customers as they transition through the payment process. However, there is no way to know that the webhook request actually came from Stripe.

In order to verify the authenticity of the webhook payload, we fetch the event from Stripe’s API, ignoring the payload from the webhook (except the event ID, which we use to perform the lookup). Since we initiate the call to Stripe’s API, and it is secured over HTTPS, we can trust that we are getting accurate data from Stripe. Once we have a valid event, we can do whatever processing we need on it. This is the process suggested by Stripe:

If security is a concern, or if it’s important to confirm that Stripe sent the webhook, you should only use the ID sent in your webhook and should request the remaining details from the API directly. We also advise you to guard against replay-attacks by recording which events you receive, and never processing events twice.

(I would hope all of Stripe’s customers consider security to be ‘a concern’, since we are dealing with payment processing.) The second point, about guarding against replay attacks, is also important, but it is relatively easy to handle: just record each webhook payload in a database collection with a unique index on the event ID, and check whether the insert succeeded before proceeding.

Testing webhooks from Stripe

The problem with this approach for validating webhook data is that it makes integration testing difficult, because Stripe doesn’t send invoice webhook events right away:

If you have configured webhooks, the invoice will wait until one hour after the last webhook is successfully sent (or the last webhook times out after failing).

So, imagine a test script that does the following:

  • Sign up a user with a new plan
  • Update the user’s credit card to be one that will fail to charge (4000000000000341 is the test card number that Stripe provides for this purpose)
  • Change the trial end date to end in one second (other tests will ensure that the trial period works properly, but this is meant to test the renewal flow)
  • Wait 5 seconds, to be sure the trial has ended, and the webhook has been sent
  • Ensure that the account is put into the ‘charge failed’ mode

However, the ‘Wait 5 seconds’ step isn’t right, since we might need to wait up to an hour. This is way too long to wait to know if our tests pass. So, what else can we do? We can’t just fake the webhook event, since our code needs to fetch the event from Stripe to be sure it is authentic.

Disable authenticity check in test mode

The solution we settled on was to disable the authenticity check in test mode. Since testing just that we can fetch an event from Stripe isn’t a terribly interesting test (and presumably, it is covered by their SDK test suite), I’m comfortable with this deviation between the test and production flows. In the end, the test script listed above looks more like this (the test app has the ‘test mode’ flag enabled, which disables the event fetching):

  • Sign up a user with a new plan
  • Update the user’s credit card to be one that will fail to charge (4000000000000341 is the test card number that Stripe provides for this purpose)
  • Change the trial end date to end in one second (other tests will ensure that the trial period works properly, but this is meant to test the renewal flow)
  • Send a fake webhook event that looks like one Stripe would send for a failed invoice charge (be sure to use a unique event ID, so that it won’t trigger the replay attack detection code— test that code in a different test)
  • Ensure that the account is put into the ‘charge failed’ mode

How could this system be improved?

If I were to implement a webhook-sending service, I would include a header on the request including an HMAC value that could be used to verify that the request was coming from a trusted origin. The HMAC process is detailed in RFC 2104, but it can be summarized as:

1. The sender prepares the message to be sent (the webhook payload, in this case).
2. The sender computes a signature using the message payload and a shared secret (this could be the Stripe secret key, or a separate secret used only for this purpose, as long as it is known to both Stripe and your application, and no one else).
3. The sender then sends the message along with the signature (usually in an HTTP header).
4. The receiver (ie, your application) takes the message and computes its own HMAC signature, using the shared secret.
5. The receiver compares the signature it computed with the one that was received, and if they match, the message is authentic.
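These steps can be sketched in Go with the standard crypto/hmac package. The `sign`/`verify` helpers and the example secret are illustrative, not Stripe’s actual scheme:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes the hex-encoded HMAC-SHA256 of a payload with a shared secret.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time,
// as required to avoid timing attacks.
func verify(secret, payload []byte, signature string) bool {
	return hmac.Equal([]byte(sign(secret, payload)), []byte(signature))
}

func main() {
	secret := []byte("shared-secret")
	payload := []byte(`{"id":"evt_123"}`)
	sig := sign(secret, payload) // sender attaches this, e.g. in an HTTP header
	fmt.Println(verify(secret, payload, sig))
	fmt.Println(verify(secret, []byte("tampered"), sig))
}
```

Note the use of `hmac.Equal` rather than `==`: a naive string comparison leaks timing information about how many leading bytes match.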

Avoiding webhook confusion

The next problem we faced was renewal webhooks being sent to our staging server, referencing unknown accounts. The problem can be summarized like this:

  • Stripe only supports two modes: Live and Test
  • We have many non-production systems: staging, dogfood, each developer’s local instance, etc
  • Stripe retries each webhook until it succeeds. If you have multiple webhook endpoints configured, each one is retried independently (so if you have three configured, and one succeeds while the others fail, the failing ones will be retried).
  • In production, a failing webhook would be a problem that would require investigation—we don’t want something like that to fail silently.

So, if I am testing/developing the signup flow on my laptop, and my local app is configured with the ‘Test’ Stripe credentials, webhooks resulting from these interactions will be sent to our Staging server (since we have webhooks in the ‘Test’ mode configured to go there).

Staging will get the webhook payload, validate it, and then look at the account ID to do its work. Then, it will find that it doesn’t know about the referenced account, log an error message, and return a non-successful status to Stripe, indicating that it should retry. This is the desired workflow in production if processing a webhook fails in this way.

So, how do we avoid this noise in our alerting system? We don’t want to disregard all errors in staging; how would we catch issues before they get to production?

The answer we settled on is simple, but it still feels a little hacky: sign up for a second Stripe account. Don’t use the ‘Live’ mode in this account, and don’t configure any real bank info. Don’t keep any webhooks configured there (though for testing ad hoc issues, you can add one, possibly using an ngrok URL pointing to your local instance). Use this new dummy account for local developer configs, and use the real account’s ‘Test’ mode for staging.

How could this be better?

Considering how Stripe really covers all the bases in other areas, and provides an amazingly easy to use and powerful system, it is kind of surprising that they don’t have better support for customers who also have even moderately complex testing requirements. We recently implemented our own support for different environments, so we did some research into how other services solve this problem:

  • SendWithUs – they differentiate between production and non-production, but allow you to create many non-production API keys that behave differently. So my local test key can redirect all emails to my address (regardless of the original recipient), Alexis’s key can route emails sent from his laptop to his address, and John’s key can send no real emails at all. All of these test emails end up in the same bucket of ‘test’ emails in the logs on the SendWithUs console (so they aren’t siloed all the way through), but analytics are not tracked on test emails, so they don’t interfere with your production metrics. All test and production keys can share the same templates, drip campaigns, etc., and you can create as many keys as you like. I think this scheme works well for SendWithUs’s product, but it likely wouldn’t work for products that need environments’ data to remain fully siloed.
  • NewRelic – You can create different ‘applications’ for each environment, and they remain fully siloed. There is no difference between the same codebase running in production vs. staging and two completely different applications.
  • Mailchimp – You can create different lists, and have staging subscribe people to the staging list, while production subscribes people to the production list. This is very similar to the NewRelic approach, but in a different domain.
  • LaunchDarkly – This blog post isn’t meant to go into depth about our environments feature; there are other places for that. But we did something more like SendWithUs: environments can share common data, like goals or the basic existence of features, while each environment has its own targeting rules and fully siloed data.

So, that is how a few other companies have solved this problem, how could Stripe improve their solution? My first suggestion is to allow me to create as many environments as I need, and keep all data siloed. Alternatively, they could allow me to create groups of webhooks, such that only one in each group must succeed before considering it delivered. This would solve my problem right now, but it feels less flexible than multiple environments, and would likely not solve other people’s problems.



03 Aug

Launched: Environments

LaunchDarkly now supports multiple environments!

Environments let you manage your feature flags throughout your entire continuous delivery pipeline, from local development to QA, staging and production. When you create your LaunchDarkly account, we’ll provide you with two environments, called Test and Production. Out of the box, you can use Test to set up feature flags for your non-production environments, while keeping your rules and data separate from your Production environment.

It’s incredibly easy to switch environments— just select your environment from the dropdown on the sidebar. When you create a flag, you’ll get a copy of that flag in every environment. Each flag can have different targeting and rollout rules for each environment. So you can roll a flag out to 100% of your traffic in staging, while keeping it ‘off’ in production.


Each environment has its own API key— use your Test key in your SDK for local development, staging, and QA, and reserve your Production API key for your production environment.

Environments are also completely customizable— you can create new environments, rename them, or delete them. You can even change the sidebar swatch color for your environment— make your Production environment red, for example, to give yourself a visual reminder that you’re modifying the rules for your live customer base.


Check out our documentation to learn more!



21 Jul

Golang pearl: Thread-safe writes and double-checked locking in Go

Channels should be your first choice for structuring concurrent programs in Go. Remember the Go concurrency credo– “do not communicate by sharing memory; instead, share memory by communicating.” That said, sometimes you just need to roll up your sleeves and share some memory. Lock-based concurrency is pretty old-school stuff, and battle-hardened Java veterans switching to Go will undoubtedly feel nostalgic reading this. Still, many brand-new Go converts probably haven’t encountered low-level concurrency primitives before. So let’s sit down and program like it’s 1999.

To start, let’s set up a simple lazy initialization problem. Imagine that we have a resource that is expensive to construct – it’s read often but only written once. Our first attempt at lazy initialization will be completely broken:
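A minimal sketch of such a broken initializer (the `expensive` type and its field are illustrative; `Broken` and `GetInstance` follow the names used in the discussion):

```go
package main

import "fmt"

type expensive struct{ data string }

var instance *expensive

// GetInstance lazily constructs the resource. With no synchronization,
// two goroutines can both observe instance == nil and both construct it:
// a data race.
func GetInstance() *expensive {
	if instance == nil { // racy read
		instance = &expensive{data: "constructed"} // racy write
	}
	return instance
}

func Broken() {
	done := make(chan struct{})
	for i := 0; i < 2; i++ {
		go func() {
			_ = GetInstance() // the two goroutines race here
			done <- struct{}{}
		}()
	}
	<-done
	<-done
}

func main() {
	Broken()
	fmt.Println(GetInstance().data)
}
```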

This doesn’t work, as both goroutines in Broken may race in GetInstance. There are many incorrect (or semantically correct but inefficient) solutions to this problem, but let’s focus on two approaches that work. Here’s one using read/write locks:
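A sketch of the read/write-lock approach (same illustrative `expensive` type): take a cheap read lock for the fast path, then re-check under the write lock before constructing.

```go
package main

import (
	"fmt"
	"sync"
)

type expensive struct{ data string }

var (
	mu       sync.RWMutex
	instance *expensive
)

// GetInstance uses double-checked locking: a read-locked check first,
// then a write-locked re-check so only one goroutine constructs.
func GetInstance() *expensive {
	mu.RLock()
	inst := instance
	mu.RUnlock()
	if inst != nil {
		return inst
	}
	mu.Lock()
	defer mu.Unlock()
	if instance == nil { // re-check: another goroutine may have won the race
		instance = &expensive{data: "constructed"}
	}
	return instance
}

func main() {
	fmt.Println(GetInstance().data)
}
```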

If you’re a Java developer you might recognize this as a safe approach to double-checked locking. In Java, the volatile keyword is typically used on instance instead of a read/write lock, but since Go has no volatile keyword (there is sync/atomic, and we’ll get to that) we’ve gone with a read lock.

The reason for the additional synchronization around the first read is the same in Go as it is in Java. The Go memory model does not guarantee that the initialization of instance is visible to other goroutines unless there is a happens-before relation that makes the write visible. The read lock ensures this.

Now back to sync/atomic. Among other things, the sync/atomic package provides utilities for atomically visible writes. We can use this to achieve the same effect as the volatile keyword in Java, and eliminate the read/write lock. The cost is one of readability– we have to change instance to an unsafe.Pointer to make this work, which is aesthetically displeasing. But hey, it’s Go– we’re not here for aesthetics (I’m looking at you, interface{}):
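A sketch of the sync/atomic version (same illustrative type; a plain Mutex still serializes construction, while atomic loads and stores give the visibility guarantee the read lock provided):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"unsafe"
)

type expensive struct{ data string }

var (
	mu       sync.Mutex
	instance unsafe.Pointer // holds a *expensive, accessed atomically
)

func GetInstance() *expensive {
	// Fast path: an atomic load is enough to safely observe a published write.
	if p := atomic.LoadPointer(&instance); p != nil {
		return (*expensive)(p)
	}
	mu.Lock()
	defer mu.Unlock()
	if p := atomic.LoadPointer(&instance); p != nil { // re-check under the lock
		return (*expensive)(p)
	}
	inst := &expensive{data: "constructed"}
	atomic.StorePointer(&instance, unsafe.Pointer(inst)) // publish atomically
	return inst
}

func main() {
	fmt.Println(GetInstance().data)
}
```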

Astute Gophers might recognize that we’ve re-derived a utility in the sync package called Once. Once encapsulates all of the locking logic for us so we can simply write:
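With sync.Once, the sketch collapses to (same illustrative type):

```go
package main

import (
	"fmt"
	"sync"
)

type expensive struct{ data string }

var (
	once     sync.Once
	instance *expensive
)

// sync.Once encapsulates all of the double-checked locking for us.
func GetInstance() *expensive {
	once.Do(func() {
		instance = &expensive{data: "constructed"}
	})
	return instance
}

func main() {
	fmt.Println(GetInstance().data)
}
```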

Lazy initialization is a fairly basic pattern, but once we understand how it works, we can build safe variations like a resettable Once. Remember though– this is all last-resort stuff. Prefer channels to using any of these low-level synchronization primitives.

25 Mar

Golang pearl: It’s dangerous to go alone!


In Go, the panic function behaves somewhat like an unrecoverable exception: panic propagates up the call stack until it reaches the topmost function in the current goroutine, at which point the program crashes.

This is reasonable behavior in some environments, but programs that are structured as asynchronous handler functions (like daemons and servers) need to continue processing requests even if individual handlers panic. This is what recover is for, and if you inspect the source you’ll see that Go’s built-in HTTP server package recovers from panics for you, meaning that bugs in your handler code will never take down your entire HTTP server.

Unless, of course, your handler code spawns a goroutine that panics. Then your server is screwed. Let’s demonstrate with a trivial example:

This server will happily chug along, panicking every time the root resource is hit, but never crashing.

But if hello panics in a goroutine, the entire server goes down:

In our web services, we can never really trust a naked use of go. We wrap all our goroutine creation in a utility function we call GoSafely:
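A sketch of such a helper (the logging and stack-trace details here are our own illustrative choices; LaunchDarkly’s actual implementation isn’t shown):

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// GoSafely runs f in a new goroutine, recovering (and logging) any panic
// so it cannot take down the whole process.
func GoSafely(f func()) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("recovered panic in goroutine: %v\n%s", r, debug.Stack())
			}
		}()
		f()
	}()
}

func main() {
	done := make(chan struct{})
	GoSafely(func() {
		defer close(done)
		panic("oh no") // recovered by GoSafely, not fatal
	})
	<-done
	fmt.Println("still running")
}
```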

Here’s how we use it:

Not as syntactically sweet as a naked go statement, but it does the trick. The unfortunate thing (which we don’t really have a solution for) is that any third-party code that spawns a goroutine could potentially panic– and we have no way of protecting ourselves from that.

LaunchDarkly helps you build better software faster with feature flags as a service. Start your free trial now.