14 Sep 2017

Beta Testing with Feature Toggles: Testing in Production Like a Pro

We all know beta testing is important, not just for understanding your customers’ needs, but also for stability and security. Every time you launch, you are essentially asking, “Are there bugs? Is there feedback?” Both questions serve the goal of making your product better.

Testing in production will give you the most information about the success of your new functionality. And because feature flags help separate deployment from release, they make such testing safe and easy. When it comes to beta testing, a lot of the top companies tend to adhere to a similar paradigm—test early, test often, and do it in your production environment.

So how do companies make smooth and simple transitions from alpha to beta testing, and then to full rollout? Read on to learn how top companies approach beta testing using deployment tools with feature flags, with links out to more in-depth descriptions.

But before we get started, here’s a quick terminology review. Pete Hodgson refers to this use of feature flags for betas as “permissioning toggles.” When the audience is chosen at random, as in a percentage rollout, the release is also known as a “canary launch.” When the audience is a set group, such as internal users or another defined segment, it is called a “champagne brunch.”


6 Approaches to Product Launching

#1 Facebook is the prime example of dark launching. Their release management has to be impeccable to operate at such massive scale. Their betas can reach a million users or more.

“Although we push to production only once a week, it’s still important to test the code early in real-world settings so that engineers can get quick feedback. We make mobile release candidates available every day for canary users, including 1 million or so Android beta testers.”

Read their article on Rapid Release at Massive Scale to learn more about how they do continuous delivery at scale.

#2 Hootsuite gives a typical rollout pattern for its features—starting internally and then slowly exposing to a larger audience.

Typical
Push new code then:
– Dark launch to yourself or your team to test
– Launch to the whole Hootsuite organization
– 10% of all users
– Watch graphs
– 50%
– 100%
– Simple means of rollback if necessary
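
The pattern above can be sketched as a small state machine. This is only an illustration, not Hootsuite's actual code: the `User` fields, the stage names, and the idea that each user already carries a stable bucket number from 0 to 99 are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical stage names mirroring the list above.
STAGES = ["off", "team", "organization", "10%", "50%", "100%"]


@dataclass(frozen=True)
class User:
    key: str
    bucket: int            # assumed: a stable number in 0..99 assigned once per user
    on_team: bool = False
    in_org: bool = False


class Rollout:
    """Tracks which stage a feature is in and who can see it."""

    def __init__(self) -> None:
        self.index = 0  # every feature starts dark ("off")

    @property
    def stage(self) -> str:
        return STAGES[self.index]

    def advance(self) -> None:
        """Move to the next, wider stage (no-op at 100%)."""
        self.index = min(self.index + 1, len(STAGES) - 1)

    def rollback(self) -> None:
        """Step back to the previous, narrower stage (no-op at off)."""
        self.index = max(self.index - 1, 0)

    def is_enabled(self, user: User) -> bool:
        stage = self.stage
        if stage == "off":
            return False
        if stage == "team":
            return user.on_team
        if stage == "organization":
            return user.on_team or user.in_org
        # Percentage stages: insiders keep access, everyone else
        # is admitted by bucket number.
        percent = int(stage.rstrip("%"))
        return user.on_team or user.in_org or user.bucket < percent
```

Because each stage is a superset of the one before it, stepping back with `rollback()` only ever removes the most recently added audience, which is what makes the "simple means of rollback" cheap.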

Check out Bill Monkman’s full deck on dark launching here.

#3 Etsy calls feature flags “config flags,” and gives a lot of credit for their process to Flickr.

“Key system-level and business level metrics (like checkout/listing/registration/sign-in rates) are projected on screens in the office and we have a number of internal dashboards that the team uses (we mainly use Ganglia and Graphite). We also have lots of switches and knobs to help us roll features out to percentages of users and ramp them up slowly, or quickly. Features are used and tested by us here at Etsy for some period of time before they are rolled out publicly.”

They have custom-built a feature flagging API, “Feature API,” to enable this. The buckets they use include admin, internal, users, and groups. Read more about Etsy’s deployment practices and check out their Feature API on GitHub.

#4 Beta can also apply to back-end rollouts. Instagram does canary deployments to a subset of servers, using feature flags as a continuous delivery tool. It’s important for continuous delivery to perform these tests, which are key in helping them avoid failed deployments.

But Instagram hasn’t always had this system. Read here to learn how they evolved from a “mish-mash of manual steps and scripts” to a system they could depend on. And check this out if you want more recipes for database migration with feature flags.

#5 Niantic’s Pokemon Go betas are well known and rabidly tracked by its fans. They famously roll out by region—a field test in Japan here, a limited beta in Australia, and then something in New Zealand. Sometimes these betas for features are invite-only. Here’s a write up of how they approached the rollout of the game Ingress.

#6 GoPro released their GoPro Plus product early using feature flags. By breaking the larger release into smaller features with their own testing timelines, they were able to iterate and improve continuously. The video below walks through the technology they used and the timeline from dogfood to a “big bang” marketing announcement.

“At GoPro you can kind of tell we don’t [do] things lightly. We want to do big announcements and we want to come out with great products…we actually had smaller features that would go out, and then go for alpha testing and beta testing along the way. Shortly after March, we actually had most of the applications done from a core feature standpoint, but we kept iterating and improving those core features that we knew we were going to launch with.”

 

Controlling Your Rollout Like a Boss

Did you notice some trends there? These larger companies build their beta testing on one or more of the following:

  • Testing in production with feature flags
  • Releasing early and testing small pieces of functionality before a broader release
  • Internal tests that easily become external canaries
  • Regional rollouts

As more companies start to use feature management, these incremental rollouts are not the headaches they once were. Companies can be safer and smarter with how and when they expose features to their end users.

If you want to get started with feature flagging, check out featureflags.io, a resource we made for the community to learn best practices.

19 Apr 2016

Why Leading Companies Dark Launch

[Image: LaunchDarkly Dark Launch]

When it comes to releasing new features, there is nothing worse than deploying a feature that cripples your application, degrades performance, and turns away customers.

With the rise of continuous delivery, software teams are embracing faster, more iterative feature releases.  It’s now imperative for teams to ensure their features will be well-received by customers and maintain their application’s performance.

This is why companies like Google, Facebook, and Amazon have embraced dark launching to ensure the efficacy of their feature releases and the stability of their app infrastructure.


07 Nov 2015

Feature Flag-Driven Development

This article provides a broad and comprehensive overview of feature flag driven development, from gradual rollouts to A/B testing.

A Typical Afternoon

It’s an uneventful Friday afternoon. You’re ready to head home, hit the gym, and spend time with the family. The last thing you’d want to do is deploy some new features to your users, right? That would be crazy talk. “Let’s just wait for Monday... it’s too risky... what if things break?” Deploying on Friday would inevitably mean working the weekend, but you’re brave, you’re the Zeus of the programming world, and you deploy it anyway.

In fact, you’re smiling and cranking up the music as you confidently stroll out of the office to the bus stop.  “Naive!” some would say, but you keep up the pace with a hop in your step.

A few hours later, you’re lounging on the couch, proudly sunk into the cushy leather. “Bzz Bzz” goes your phone as it rattles in your pocket. You ignore it, but the buzzing won’t stop. With an irked look, you begrudgingly grab your phone and see a slew of error messages from your tracker. Oh no! This is it. The deploy failed. The gamble has turned into a nightmare… Right? Nope.

Without much thought and a timid roll of the eyes, you pull up LaunchDarkly and flip a switch.  The errors stop and your only annoyance is the 30 seconds missed of your show.

This is the power of feature flags.  On that Friday afternoon, you used LaunchDarkly to deploy a feature to 1% of your users, warmly wrapped in a feature flag.  You wanted to test to see how it performed.  If something bad happened, no problem.  You knew you could just flip the switch and the feature would be rolled back with only 1% of your users experiencing a few seconds of inconvenience.  This is just one of hundreds of use cases showcasing the power of feature flags.

Feature Flags Explained

Feature flags/toggles/controls are basically ways to control the full lifecycle of your features.  They allow you to manage components and compartmentalize risk.  You can do pretty cool things like roll out features to certain users, exclude groups from seeing a feature, A/B test, and much more.  Check out this video on canary launching to see the benefits of dark launching features.

How They Work

When a user loads a page, your application will use that user’s attributes to determine what features to show.  For example, if I am a BETA user and I log in to myexamplesite.com, I will see the brand new BETA feature.  However, all non-BETA users will just see the old feature.   The reason I see the BETA feature is that my user is grouped as BETA.
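
As a minimal sketch of that decision, assuming a hypothetical `User` record with a set of group names (the names `User`, `groups`, and `feature_for` are made up for this example):

```python
from dataclasses import dataclass, field


@dataclass
class User:
    key: str
    groups: set = field(default_factory=set)


def feature_for(user: User) -> str:
    """Pick which version of the feature to render for this user."""
    # Only users grouped as BETA get the new code path;
    # everyone else falls through to the existing feature.
    if "BETA" in user.groups:
        return "beta-feature"
    return "old-feature"
```

The application asks the same question on every page load, so moving a user into or out of the BETA group changes what they see without a new deploy.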

While there are many open source solutions, we can dig deeper into LaunchDarkly’s SaaS platform for feature flags.

In this example below, you will see an explanation of multivariate feature flags.  Multivariate means that you can serve multiple variations of a feature to different user segments.  For example, let’s say I have a purple, orange, and green landing page.  I can select which individual users or groups of users I would like to see each variation.

[Image: LaunchDarkly Multivariate Feature Flags]

On the LaunchDarkly side of things, you would do the following:

  1. Create a feature flag called “Landing Page”
  2. Name three variations: “Purple,” “Orange,” and “Green”
  3. Select which users you want to receive each variation (you can also serve to percentages: ‘60% of all users get the purple variation’)

On your side of things, you would do the following:

  1. Add the LaunchDarkly SDK to your platform
  2. Wrap your feature in a feature flag
  3. Call the SDK method to receive the value for that flag

It’s as simple as that.   You can check out the full documentation here.
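
To make the two halves concrete, here is a rough sketch that uses a tiny in-memory stand-in instead of the real service. `FlagStore` and its methods are hypothetical names invented for this example, not the LaunchDarkly SDK; see the documentation for the actual calls.

```python
class FlagStore:
    """Hypothetical in-memory stand-in for a feature flag service."""

    def __init__(self) -> None:
        self._flags = {}

    def define(self, key, variations, default):
        """Dashboard side: create a flag with named variations."""
        if default not in variations:
            raise ValueError("default must be one of the variations")
        self._flags[key] = {
            "variations": list(variations),
            "default": default,
            "targets": {},
        }

    def target(self, key, user_key, variation):
        """Dashboard side: serve a specific variation to one user."""
        flag = self._flags[key]
        if variation not in flag["variations"]:
            raise ValueError(f"unknown variation: {variation}")
        flag["targets"][user_key] = variation

    def variation(self, key, user_key, fallback):
        """Application side: ask which variation this user gets."""
        flag = self._flags.get(key)
        if flag is None:
            return fallback  # unknown flag: fail safe
        return flag["targets"].get(user_key, flag["default"])


def render_landing_page(store: FlagStore, user_key: str) -> str:
    # The feature is "wrapped" in the flag: the flag value picks the branch.
    color = store.variation("landing-page", user_key, fallback="Purple")
    return f"<body class='{color.lower()}'>Welcome!</body>"
```

Note the `fallback` argument: if the flag is missing, the application still renders something sensible instead of failing.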

Targeting Users

Above, we briefly covered how to target individual users and groups.  Let’s take a deeper look into why this is important.  Targeting gives you the power to personalize a user’s experience.

Imagine the ability to create a customized and rewarding experience for every user.  Here are a few notable use cases:

  • Plan Management (normal vs. premium) –  You can launch targeted features to users on different plans.  Want to add a new feature for a premium user?  Sure!  Just wrap the new feature in a flag and turn it on for premium users.  Want to extend that feature to normal users eventually?  No problem!  Just add normal users when you’re ready.
  • Early Access – Allow only opt-in or power users to experience the latest features.
  • Block Users – Exclude users or groups who you do not want to see a new feature.
  • And many more.
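
The use cases above boil down to a handful of rules checked in order. The sketch below is one possible shape for such a rule, with hypothetical names (`User`, `TargetingRule`, `allows`); real targeting engines are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    key: str
    plan: str = "normal"
    opted_in: bool = False


@dataclass
class TargetingRule:
    plans: set = field(default_factory=lambda: {"premium"})
    require_opt_in: bool = False
    blocked_keys: set = field(default_factory=set)

    def allows(self, user: User) -> bool:
        # Block list wins over everything else.
        if user.key in self.blocked_keys:
            return False
        # Plan management: only listed plans see the feature.
        if user.plan not in self.plans:
            return False
        # Early access: optionally restrict to users who opted in.
        if self.require_opt_in and not user.opted_in:
            return False
        return True
```

Extending the feature to normal users later is a data change rather than a code change: add `"normal"` to `plans` and the same check starts returning `True` for them.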

Managing Rollouts

If you’re deploying a brand new set of features, launching them to 100% of your users at once is a risky business. In fact, testing things by giving all of your users access isn’t really a test. A test should be the process of receiving incremental feedback from your users, making improvements, and gradually expanding your release to everyone. This is where LaunchDarkly’s rollouts come in. If you want to launch a new feature, you can start by rolling it out to 1% of your users, then 5%, then 20%, then 50%, then 100%. If something goes wrong at the 1% rollout, you can instantly roll it back, make the improvements, and test it again.

This is the process of canary launching, whereby you test the efficacy of a new set of features before releasing it to everyone.  It also allows you to test how your features behave at different levels of traffic and incrementally refine your infrastructure to support the deployment.
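
One common way to implement a percentage rollout is to hash each user into a stable bucket, so the same user gets the same answer on every request and stays included as the percentage grows. The sketch below assumes that approach; it is not a description of how LaunchDarkly does it internally, and `bucket_for` and `in_rollout` are names made up for the example.

```python
import hashlib

BUCKETS = 10_000  # fine enough for steps as small as 0.01%


def bucket_for(user_key: str, flag_key: str) -> int:
    """Map a user to a stable bucket in 0..BUCKETS-1 for one flag."""
    # Including the flag key means a user who lands in the first 1%
    # for one feature is not automatically first for every feature.
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(user_key: str, flag_key: str, percent: float) -> bool:
    """True if this user falls inside the first `percent` of buckets."""
    return bucket_for(user_key, flag_key) < round(percent * BUCKETS / 100)


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (1, 5, 20, 50, 100):
        exposed = sum(in_rollout(u, "new-checkout", percent) for u in users)
        print(f"{percent:>3}% target -> {exposed} of {len(users)} users")
```

Because the check is `bucket < threshold`, raising the percentage only ever adds users, and lowering it back to 0 removes everyone, which is the instant rollback described above.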

Flag Driven Development

Feature flags/toggles/controls bring a test-and-iterate loop to your releases, similar in spirit to test-driven development (TDD) but applied to features in production rather than to unit tests. This is the process of releasing and iterating features quickly, testing those features, and making improvements. Think of it as Lean UX methodology. You release light features to receive market feedback. You iterate on that feedback, make improvements, and redeploy.

Think of feature flag-driven development as a way to receive iterative market feedback on your product, rather than solely depend on isolated customer feedback.  It’s a way to test how your features perform in the real world and not just in an artificial test environment.

[Image: Feature Flag Driven Development - Waterfall Agile]

In the world of waterfall development, you will typically see one continuous build that culminates in a single deploy.  After this deploy, you’ll receive feedback and fix some bugs, but you will likely need to restart the process for any major feature releases.

Agile is a bit more forgiving.  You can iteratively test small releases to your users, but this is best performed in a staging environment.  You typically will not release your product to market throughout the agile development process, as most of your testing will be internal and controlled.

Finally, lean UX codifies the process of releasing features to market throughout the development process.  These releases will likely be smaller in scale, but you’ll receive immediate market feedback. When you introduce feature flags into the equation, the process becomes even more efficient.

Continuous Delivery via Feature Flag Driven Development

Feature flags allow you to substantially mitigate the risk of releasing immature functionality.  If a release is having a negative impact, roll it back.  If it’s doing well, keep rolling it out.   This is like having a persistent undo button and a means to recalibrate and improve functionality.

More importantly, you can institutionalize this process within your development cycle.  Your team will develop a cadence for lean releases, where all new components and functionality are wrapped in feature flags.  You can easily test features, cultivate creativity, and reward bold advances – all without compromising the integrity of your platform.

This new development methodology also allows your marketing, design, and engineering teams to collaborate more frequently and more effectively.  With an agile approach, you will typically have one large planning cycle that will launch you into development.  You will then test your iterations on local groups or in a local environment that tries to simulate production.  However, you cannot substitute real market feedback.

Feature flag driven development allows you to quickly release iterations of your features to market, receive feedback, improve, and redeploy.  It allows you to roll out features to small segments of your users in order to mitigate risk all while receiving valuable feedback.  More importantly, your team will converge and collaborate based on real market feedback and make the necessary improvements to drive the product forward.

Feature Flag Driven A/B Testing

A/B testing is the practice of comparing different versions of a page to see which one performs better. Traditionally, A/B testing has been used mainly for cosmetic changes. These include layouts, element position, colors, and copy. Typically, A/B testing is tied to a goal. For example, you want to increase sign-up conversions, so you use tools like Optimizely, Visual Website Optimizer, and Apptimize to test different layouts, buttons, and call-to-action language. These tools work great, but what if you wanted to test backend-level functionality, completely new features, and sign-up flows?

This is where feature flag driven A/B testing comes into play.

[Image: LaunchDarkly Feature Flag A/B Testing]

Because feature flags are implemented at the code level, you can control deep functional features and then target user segments.   For example, let’s say I want to test a new sign-up flow and welcome tutorial (see above). I can flag the new functional components so that only certain users will receive the new flow.

With a suite like LaunchDarkly, you can then analyze these feature tests using your Optimizely or New Relic goals.  This will allow you to see, for example, which sign up flow is generating better conversions or which check out flow is generating more revenue.

All in all, feature flag driven A/B testing enables companies to test robust functionality instead of just cosmetic changes.
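
Here is a bare-bones sketch of the bookkeeping behind such a test, assuming the flag has already decided which sign-up flow each user sees. The `Experiment` class and its method names are hypothetical, and a real analysis would also check statistical significance before declaring a winner.

```python
class Experiment:
    """Counts exposures and conversions per flag variation."""

    def __init__(self, variations):
        self.exposures = {v: 0 for v in variations}
        self.conversions = {v: 0 for v in variations}

    def record_exposure(self, variation: str) -> None:
        """Call when a user is shown this variation."""
        self.exposures[variation] += 1

    def record_conversion(self, variation: str) -> None:
        """Call when a user who saw this variation reaches the goal."""
        self.conversions[variation] += 1

    def conversion_rate(self, variation: str) -> float:
        shown = self.exposures[variation]
        return self.conversions[variation] / shown if shown else 0.0

    def leader(self) -> str:
        """Variation with the highest observed conversion rate."""
        return max(self.exposures, key=self.conversion_rate)
```

For example, if 200 users see each flow and 30 convert on the old flow while 44 convert on the new one, `conversion_rate` reports 0.15 and 0.22 respectively, and `leader()` returns the new flow.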

 


LAUNCHDARKLY HELPS YOU BUILD BETTER SOFTWARE FASTER WITH FEATURE FLAGS AS A SERVICE. START YOUR FREE TRIAL NOW.
11 Aug 2015

Secret to Facebook’s Hacker Engineering Culture

Facebook’s engineering is legendary for its speed and execution. You too can be as quick and smart as Facebook, if you know their hacker engineering secret. Originally they lived by “Move Fast and Break Things”, which has now evolved with wisdom to “Move Fast With Stable Infra.” Speed is important, as is stability and providing a good experience to users. Facebook engineer Kent Beck wrote a great Facebook Note on how Facebook embraces reversibility to scale up. I highly recommend you read his entire post.

Facebook has a secret sauce: an in-house system called Gatekeeper that allows them to get quick feature feedback and iterate on it quickly. Engineering changes are wrapped in a feature flag and pushed live to production. The features are deployed but switched off, and are then turned on via Gatekeeper for different users. Facebook’s seemingly simple system of separating deployment from rollout unlocks many powerful ways to move faster with more stability. All items in italics below are quotes from Kent Beck, followed by my analysis of how Facebook uses Gatekeeper.

Internal usage. Engineers can make a change, get feedback from thousands of employees using the change, and roll it back in an hour.

Initially, the engineer uses Gatekeeper to turn the feature on for internal users only. Interestingly, I’ve heard that Facebook is too large for changes to be effectively communicated EXCEPT by actually making the change. Instead of flurries of emails or blasts in chat rooms notifying other groups, Facebook engineers make the code change and wait for impacted parties to notify them that something is broken, or to fix their own dependencies. Separating changes from bigger releases with feature flags means that any change can be rolled back at any time.

Staged rollout. We can begin deploying a change to a billion people and, if the metrics tank, take it back before problems affect most people using Facebook.

Staged rollout depends on feature flags to encapsulate a change and a feature flagging system (like Gatekeeper) to take it back.

Dynamic configuration. If an engineer has planned for it in the code, we can turn off an offending feature in production in seconds. Alternatively, we can dial features up and down in tiny increments (i.e. only 0.1% of people see the feature) to discover and avoid non-linear effects.

The key to turning features off in seconds (rather than hours or, in the best case, minutes) is “if the engineer has planned for it in the code”. By using feature flags to separate code deployment from functionality, Facebook can quickly kill malignant features. Without feature flags and Gatekeeper, Facebook would have to do a full redeployment.
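
“Planned for it in the code” can be as small as one branch. The following is a generic sketch of a kill switch, not Gatekeeper itself; the flag name `new-feed` and the function names are invented for illustration, and the `flags` dictionary stands in for whatever system delivers flag values at runtime.

```python
def old_feed(user_key: str) -> str:
    return f"classic feed for {user_key}"


def new_feed(user_key: str) -> str:
    return f"experimental feed for {user_key}"


def render_feed(user_key: str, flags: dict) -> str:
    # Both code paths are deployed. The flag, read at request time,
    # decides which one runs. Defaulting to False means a missing
    # flag fails safe to the old behavior.
    if flags.get("new-feed", False):
        return new_feed(user_key)
    return old_feed(user_key)
```

Flipping `flags["new-feed"]` from `True` to `False` changes the very next request, with no build or deploy in between.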

Right hand side units. We can add a little bit of functionality to the website and turn it on and off in seconds, all without interfering with people’s primary interaction with NewsFeed.

Facebook smartly uses microservices and avoids monolithic code. Small changes in functionality, wrapped in feature flags, can quickly be toggled on and off using Gatekeeper.

Shadow production. We can experiment with new services under real load, from a tiny trickle to the whole flood, without affecting production.

Facebook pioneered dark launches, the ability to expose features to load without exposing them to users. I’ve heard that it’s impossible to simulate Facebook’s production load as it’s so large. Gatekeeper allows Facebook to use feature flags to separate load testing from user visibility.
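
A dark launch of this kind can be sketched as a request handler that always answers from the existing service and, when a flag is on, also exercises the new service without showing its output to anyone. This is an illustrative pattern only; `handle_request`, the flag name `shadow-new-search`, and the log format are assumptions, and a production version would typically run the shadow call asynchronously so it cannot slow down the user.

```python
def handle_request(query, flags, old_search, new_search, shadow_log):
    """Serve from the old service; optionally exercise the new one in the dark."""
    result = old_search(query)  # users only ever see this result

    if flags.get("shadow-new-search", False):
        try:
            shadow_result = new_search(query)  # real load, invisible output
            shadow_log.append({"query": query, "match": shadow_result == result})
        except Exception as exc:
            # A failure in the new service must never reach the user.
            shadow_log.append({"query": query, "error": repr(exc)})

    return result
```

The log of matches and errors is what tells you whether the new service is ready to be made visible.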

Data-informed decisions. Data-informed decisions are inherently reversible. “We expect this feature to affect this metric. If it doesn’t, it’s gone.”

By wrapping a feature with a flag, it’s possible to isolate its effect on the system. Data-informed decision making, which ties an individual feature to metrics, is made possible by Gatekeeper and feature flags. Without feature flags, it’s impossible to see the impact of a change: if you release five features and twenty bug fixes at once, and engagement drops by 5%, which feature is to blame? Could one of the bug fixes actually have caused a 10% drop and one of the features a 15% gain? Only by separating out each change can true causation (not just correlation) be seen. Yammer also follows data-informed decision making in its product development. Again, the feature must be encapsulated both to measure it and to enable the rollback.

Advance countries. We can roll a feature out to a whole country, generate accurate feedback, and roll it back without affecting most of the people using Facebook.

Gatekeeper and feature flags enable canary launches, using an entire country as a “canary in a coal mine” to see if there are issues with a release. Rather than having a worldwide failure, Facebook can iterate quickly and roll back.

Soft launches. When we roll out a feature or application with a minimum of fanfare it can be pulled back with a minimum of public attention.

Facebook, after many misfires like Facebook Beacon, now follows Eric Ries (Don’t launch – separate out a marketing launch from a product launch). With feature flags, Facebook can get feedback from their own users, and control the story. Facebook has avoided the flameouts of Google, which has had epic failures with Google Wave, Google Buzz, and most recently Google Plus – all expensively launched, then expensively decommissioned. With feature flags and Gatekeeper, Facebook is always in control of who sees what when.

Want to be as smart as Facebook at developing software? Want to integrate reversibility, dark launches, and data-informed decisions into your own development cycle? The smartest companies, like Facebook, Medium, Dropbox, and LinkedIn, have in-house feature-flagging systems custom-built for them. You can build your own system, or simply use LaunchDarkly, “Gatekeeper for everyone else”.



 

09 Jul 2015

Canary release is the new beta

Are canary releases the new beta? What does beta even mean? Sean Murphy recently tweeted me:

Wow. Was Sean right? When I was an Engineering Manager at Vignette, I’d run beta programs for our new releases. The beta programs had a dual purpose. First, we wanted to get feedback on the stability and validity of our features. But the beta also fed marketing with happy reference customers for our launch announcement. Customers liked being part of a beta because it gave them early access to features they had been waiting for, as well as an opportunity to influence product direction.

What had changed? The word beta has been overloaded to mean “we’re not entirely ready for prime time, so please be patient”. Gmail was in beta for five years! At TripIt, we had a beta tag for multiple years.

Canary release, meaning exposing features to some subset of users (whether by opt-in, random rollout, or specific segments), is now used to describe what was once a beta.

  • Microsoft: In development of Windows 10, Microsoft used “canary” releases to test with internal users within Microsoft. Gabe Aul, who leads the Data & Fundamentals Team in the Operating Systems Group (OSG), said “our Canary ring probably sees 2X-3X as many builds as OSG because we catch problems in Canary and don’t push to OSG.”
  • Instagram: “Using ‘canary’ releases, updates go out to a subset of users at first, limiting the ability of buggy software to do damage.” Mike Krieger, Instagram co-founder and CTO, said he uses canary releases because “If stuff blows up it affects a very small percentage of people”.
  • Google: For Chrome, Google offers Chrome Canary, which it describes this way: “Get on the bleeding edge of the web, Google Chrome Canary has the newest of the new Chrome features. Be forewarned: it’s designed for developers and early adopters, and can sometimes break down completely.”

So yes, canary is the new beta.
