18 Oct 2017

How We Beta Test at LaunchDarkly


We recently looked at how some well-known companies beta test. Specifically, we looked at groups that test in production, and do it well. Testing in production is one of the best ways to find bugs and get solid feedback from your users. While some teams shy away from it because of the risks involved, there are ways to mitigate that risk and do it right. So this time we want to share how we beta test at LaunchDarkly.

It’s no surprise that we dogfood at LaunchDarkly. Using feature flags within our development cycle is a straightforward process. We often push features directly into our production environment and safely test them prior to allowing user access. When it’s time to beta test with users, we can update the setting on the appropriate flag and get user feedback quickly. And of course, if we ever need to, we can instantly turn features off.
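
Here’s a minimal sketch of what that looks like in application code, using the LaunchDarkly Python SDK. The flag key and user below are hypothetical, and the exact API can vary by SDK version:

```python
# A minimal sketch using the LaunchDarkly Python SDK; the flag key
# ("new-usage-dashboard") and user here are hypothetical examples.
import ldclient
from ldclient.config import Config

ldclient.set_config(Config("YOUR_SDK_KEY"))
client = ldclient.get()

user = {"key": "user@example.com"}

# The default (False) is what users see until we turn the flag on, and
# what everyone falls back to if we flip the kill switch.
if client.variation("new-usage-dashboard", user, False):
    print("serving the new dashboard")  # the feature under test
else:
    print("serving the old dashboard")  # the stable path everyone else sees
```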

Deciding Which Things to Build

When we’re thinking about new features to implement, we have our own ideas of which direction our product should go, but we also consider inbound requests. These can come from support tickets, questions from potential customers, or conversations with existing customers. The bottom line is that we want to build a product that serves our customers, so we do our best to listen to what they want.

Once we identify a feature we’d like to build—whether it was our own idea or a customer request—we’ll share it out to see if other customers are also interested. This is an important part of our beta testing process, because once the feature is in production and we’re ready to test it, these are the people we want to circle back with for beta testing.

Testing in Production

When it’s time to test, we test with actual end users in production. Our feature management platform allows us to turn features on for specific users. We can target individual users, or we can expose a feature to users by attribute, like region (everyone in Denver)—and we can instantly turn a feature off at any time.
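
The targeting rules themselves (“everyone in Denver”) live in the flag’s configuration rather than in code, so the application just passes along the attributes those rules match on. Continuing the sketch above, with hypothetical attribute names:

```python
# Hypothetical user with a custom "region" attribute. A targeting rule
# on the flag (configured in the LaunchDarkly dashboard, not in code)
# can then serve the new variation to everyone whose region is "Denver".
user = {
    "key": "user-123",
    "email": "jane@example.com",
    "custom": {"region": "Denver"},
}

show_feature = client.variation("new-usage-dashboard", user, False)
```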

Because we’re testing in production, we don’t need an isolated environment or a separate account. For the customers who showed interest and agreed to participate in beta testing, we turn the features on in their production accounts.

Typically we beta for two weeks, sometimes as long as a month. As mentioned before, since we know which customers are interested in the feature, we can go back to them and have them test it. These are the users who already know they want this functionality, so we want to be sure it meets (or exceeds!) their expectations. And of course we want to make the most of this time, so it’s important we actually get feedback. We find that those who have asked for the feature are eager to let us know how things are working. We make a point of also following up with those who don’t proactively offer feedback—we want to hear from everyone!

While we’re testing and getting feedback, we’re taking all this information in and improving the feature before rolling it out to everyone else. When we feel confident we have something that’s ready to be shared, we’ll begin a percentage rollout to the rest of our users.

Embracing Failure

Using feature flags around features within our development cycles allows us to mitigate risk by pushing out small, incremental changes. As you can see, this also enables us to beta test quickly and safely. If there are major bugs, we’re more likely to identify them early on, before they affect all of our customers.

“Embrace failure. Chaos and failure are your friends. The issue is not if you will fail, it is when you will fail, and whether you will notice.” -Charity Majors

Right now we’re in beta for scoped access tokens and a new, faster .NET SDK. Let us know if you’d like an early look; we’d love to hear what you think.

04 Jan 2017

How Feature Flagging Helps Usability Tests

Usability testing in a real-world environment (aka the production environment) gives us insight into how users actually use our product in their day-to-day lives. It is one thing to run a test in a lab setting, but it is another to have users try features while they are walking, running to the airport, stressed, and sleepy.

But, no matter how hard we try, it is very difficult to truly simulate how our apps behave in our production environment. We can run focus groups, beta tests on a beta environment, and test things internally, but how can we truly mimic the real world in an artificial context? In other words, how do we test a feature while also simulating the environment it’s meant to be used in?


A good real-world usability test analyzes features efficiently and accurately collects user feedback to improve the user experience. But, when we test anything in a non-production environment, we are inherently biasing our tests. In a lab-based usability test:

  1. Users are overly cognizant of the feature they are testing.
  2. Users are unnaturally focused on testing that new feature.
  3. Users work with fake or incomplete data, so real use cases aren’t sufficiently exercised.
  4. It is very hard to test discoverability (i.e., can a user find the feature on their own?).
  5. Users tend to ignore distractions, like external notifications (Facebook, Skype, texts) and just focus on the task at hand.
  6. Users are in an overly analytical mode, typically looking to give feedback.
  7. Users are overly forgiving, looking to please!

Does this mean that running usability tests in non-production environments is a waste of time? Not at all! This is absolutely necessary to identify bugs, check for general usability, and solicit feedback quickly.

However, it should just be one of the steps in a comprehensive usability testing process, one that involves internal, staging, and production usability testing.

Benefits of usability testing in production:

The primary purpose of usability testing in production is to gather real-world user behavior while minimizing bias and performance risk. Some more benefits include:

  1. Genuine user feedback in a real-world environment
    • Quantitative insight into your feature’s performance. How well is it scaling? Are users using it? How is it impacting your system? What are your engagement levels?
    • Contextual insight into your feature’s efficacy. Do users see the new feature? How does it mesh with your existing feature set? Are people using it as intended? Are people using it once and then never again?
    • Qualitative feedback. Are users complaining? Are they happy? Are they neutral?
  2. No opt-in bias – users test the feature without knowing they are part of the test. You can assess how well they use it with a product like FullStory to record the session, or by tracking metrics. You therefore get a more representative sample testing the feature, rather than just early adopters.
  3. Measuring actual system performance – there is nothing quite like your production environment, where you can have a complex array of nodes, clusters, CDNs, etc., allowing your app to scale. As people start to use the new feature, you can see how it impacts your actual system (load times, discoverability, caching issues).

Managing a usability test in production:

Of course, testing anything in your production environment is inherently risky and has real-world consequences. If you launch something to everyone just to get feedback and they hate it, then you risk permanently losing those users. Equally bad, you can cripple your entire application with unforeseen scaling and performance issues.

To mitigate this, companies like Facebook, Amazon, and Google collect production feedback by releasing features behind feature flags. While we won’t go into the specific anatomy of a feature flag, we can go through the methodology behind the release.


If a feature is wrapped in a feature flag, you have control over who sees the feature and when. This means that you can perform targeted, controlled releases using a percentage rollout, whereby you incrementally increase a feature’s visibility to 1%, then 5%, and eventually 100% of your users.
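
Under the hood, percentage rollouts are typically implemented by hashing each user into a stable bucket, so a given user keeps seeing the same variation as the rollout grows. Here’s a rough sketch of the idea in Python, with a hypothetical in_rollout helper; this illustrates the general technique, not LaunchDarkly’s exact algorithm:

```python
import hashlib

def in_rollout(user_key: str, flag_key: str, percentage: float) -> bool:
    """Deterministically bucket a user into [0, 1) for a given flag.

    Hashing flag_key together with user_key keeps buckets stable per
    flag, so growing a rollout from 1% to 5% only adds users; anyone
    already in the rollout never drops out.
    """
    digest = hashlib.sha1(f"{flag_key}.{user_key}".encode()).hexdigest()
    bucket = int(digest[:15], 16) / float(0xFFFFFFFFFFFFFFF)
    return bucket < percentage

# Roll the hypothetical flag out to 5% of users; those same users stay
# in as the rollout later grows toward 100%.
print(in_rollout("user-123", "new-usage-dashboard", 0.05))
```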

Hence, you can collect production-level feedback because you control the level of risk. If a new feature is performing well, you can keep increasing the percentage rollout. If it is tanking or hurting performance, you can reduce the rollout or kill it completely.

Therefore, feature flags (aka feature toggles) give you full control over the risk of your production releases. You can gather real-world user feedback by separating your feature rollout from your code deployment.