07 Nov 2015

Feature Flag-Driven Development

This article provides a comprehensive overview of feature flag-driven development, from gradual rollouts to A/B testing.

A Typical Afternoon

It’s an uneventful Friday afternoon.  You’re ready to head home, hit the gym, and spend time with the family.  The last thing you’d want to do is deploy some new features to your users, right?  That would be crazy talk.  “Let’s just wait for Monday… it’s too risky… what if things break?”  Deploying on Friday would inevitably mean working the weekend, but you’re brave, you’re the Zeus of the programming world, and you deploy it anyway.

In fact, you’re smiling and cranking up the music as you confidently stroll out of the office to the bus stop.  “Naive!” some would say, but you keep up the pace with a hop in your step.

A few hours later, you’re lounging on the couch, proudly sunk into the cushy leather.  “Bzz Bzz” goes your phone as it rattles in your pocket.  You ignore it, but the buzzing won’t stop.  With an irked look, you begrudgingly grab your phone and see a slew of error messages from your tracker.  Oh no!  This is it.  The deploy failed.  The gamble has turned into a nightmare… Right?  Nope.

With little thought and a mild roll of the eyes, you pull up LaunchDarkly and flip a switch.  The errors stop and your only annoyance is the 30 seconds of your show that you missed.

This is the power of feature flags.  On that Friday afternoon, you used LaunchDarkly to deploy a feature to 1% of your users, warmly wrapped in a feature flag.  You wanted to test to see how it performed.  If something bad happened, no problem.  You knew you could just flip the switch and the feature would be rolled back with only 1% of your users experiencing a few seconds of inconvenience.  This is just one of hundreds of use cases showcasing the power of feature flags.

Feature Flags Explained

Feature flags/toggles/controls are basically ways to control the full lifecycle of your features.  They allow you to manage components and compartmentalize risk.  You can do pretty cool things like roll out features to certain users, exclude groups from seeing a feature, A/B test, and much more.  Check out this video on canary launching to see the benefits of dark launching features.

How They Work

When a user loads a page, your application will use that user’s attributes to determine what features to show.  For example, if I am a BETA user and I log in to myexamplesite.com, I will see the brand new BETA feature.  However, all non-BETA users will just see the old feature.   The reason I see the BETA feature is that my user is grouped as BETA.
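A minimal sketch of that check in Python, assuming a hypothetical in-memory user record with a `groups` attribute.  The `User` class and `render_page` function are made up for illustration and are not part of any real SDK:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user record; the attribute names are illustrative."""
    key: str
    groups: set = field(default_factory=set)


def render_page(user: User) -> str:
    """Return the BETA feature for BETA users and the old feature for everyone else."""
    if "BETA" in user.groups:
        return "new BETA feature"
    return "old feature"


print(render_page(User("alice", {"BETA"})))  # a user grouped as BETA
print(render_page(User("bob")))              # a non-BETA user
```

The important part is that the decision is made per user at request time, so changing who belongs to the BETA group changes what people see without shipping new code.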

While there are many open source solutions, we can dig deeper into LaunchDarkly’s SaaS platform for feature flags.

The example below explains multivariate feature flags.  Multivariate means that you can serve multiple variations of a feature to different user segments.  For example, let’s say I have a purple, orange, and green landing page.  I can select which individual users or groups of users I would like to see each variation.

LaunchDarkly Multivariate Feature Flags

On the LaunchDarkly side of things, you would do the following:

  1. Create a feature flag called “Landing Page”
  2. Name three variations “Purple” “Orange” “Green”
  3. Select which users you want to receive each variation (you can also serve to percentages: ‘60% of all users get the purple variation’)

On your side of things, you would do the following:

  1. Add the LaunchDarkly SDK to your platform
  2. Wrap your feature in a feature flag
  3. Call the SDK method to receive the value for that flag

It’s as simple as that.   You can check out the full documentation here.
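To make the idea concrete, here is a rough Python sketch of how a multivariate flag could be evaluated.  It is not the LaunchDarkly SDK: the `LANDING_PAGE_FLAG` structure, the `bucket` helper, and the 60/20/20 split are assumptions chosen to mirror the “Landing Page” example above.

```python
import hashlib

VARIATIONS = ["Purple", "Orange", "Green"]

# Hypothetical flag configuration: explicit per-user targets plus a
# percentage split for everyone else.
LANDING_PAGE_FLAG = {
    "targets": {"user-42": "Green"},
    "rollout": {"Purple": 60, "Orange": 20, "Green": 20},
}


def bucket(flag_key: str, user_key: str) -> int:
    """Map a user to a stable bucket in [0, 100) by hashing flag and user keys."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest, 16) % 100


def landing_page_variation(user_key: str) -> str:
    """Pick a variation: individual targets win, then the percentage split."""
    targets = LANDING_PAGE_FLAG["targets"]
    if user_key in targets:
        return targets[user_key]
    user_bucket = bucket("landing-page", user_key)
    upper = 0
    for name in VARIATIONS:
        upper += LANDING_PAGE_FLAG["rollout"][name]
        if user_bucket < upper:
            return name
    return VARIATIONS[0]  # fallback if the percentages sum to less than 100


print(landing_page_variation("user-42"))   # individually targeted user
print(landing_page_variation("user-123"))  # assigned by the percentage split
```

Hashing the user key, rather than picking at random on every request, means a given user keeps seeing the same color each time the page loads.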

Targeting Users

Above, we briefly covered how to target individual users and groups.  Let’s take a deeper look into why this is important.  Targeting gives you the power to personalize a user’s experience.

Imagine the ability to create a customized and rewarding experience for every user.  Here are a few notable use cases:

  • Plan Management (normal vs. premium) –  You can launch targeted features to users on different plans.  Want to add a new feature for a premium user?  Sure!  Just wrap the new feature in a flag and turn it on for premium users.  Want to extend that feature to normal users eventually?  No problem!  Just add normal users when you’re ready.
  • Early Access – Allow only opt-in or power users to experience the latest features.
  • Block Users – Exclude users or groups who you do not want to see a new feature.
  • And many more.
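The use cases above boil down to a handful of rules evaluated against a user’s attributes.  Here is one way that could look in Python, using a hypothetical rule format (`plans`, `early_access_keys`, and `blocked_keys` are invented names for this sketch):

```python
def can_see_feature(user: dict, rule: dict) -> bool:
    """Evaluate a hypothetical targeting rule against a user's attributes."""
    if user["key"] in rule.get("blocked_keys", set()):
        return False  # Block Users: exclusion wins over everything else
    if user["key"] in rule.get("early_access_keys", set()):
        return True   # Early Access: opt-in or power users
    return user.get("plan") in rule.get("plans", set())  # Plan Management


rule = {
    "plans": {"premium"},
    "early_access_keys": {"power-user-1"},
    "blocked_keys": {"blocked-user-9"},
}

print(can_see_feature({"key": "u1", "plan": "premium"}, rule))
print(can_see_feature({"key": "u2", "plan": "normal"}, rule))
print(can_see_feature({"key": "power-user-1", "plan": "normal"}, rule))
```

Extending the feature to normal users later is then a configuration change (adding `"normal"` to `plans`) rather than a code change.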

Managing Rollouts

If you’re deploying a brand new set of features, launching them to 100% of your users at once is a risky business.  In fact, testing things by giving all of your users access isn’t really a test.  A test should be the process of receiving incremental feedback from your users, making improvements, and gradually expanding your release to everyone.  This is where LaunchDarkly’s rollouts come in.  If you want to launch a new feature, you can start by rolling it out to 1% of your users, then 5%, then 20%, then 50%, then 100%.  If something goes wrong at the 1% rollout, you can instantly roll it back, make the improvements, and test it again.

This is the process of canary launching, whereby you test the efficacy of a new set of features before releasing it to everyone.  It also allows you to test how your features behave at different levels of traffic and incrementally refine your infrastructure to support the deployment.
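A percentage rollout can be sketched in a few lines of Python.  This is an illustration of the general technique (deterministic hash bucketing), not a description of how LaunchDarkly implements it; `in_rollout` and the `new-feature` key are hypothetical.

```python
import hashlib


def in_rollout(flag_key: str, user_key: str, percent: float) -> bool:
    """Return True if the user falls inside the first `percent` of buckets."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000  # 0..9999, i.e. 0.01% granularity
    return bucket < percent * 100


users = [f"user-{i}" for i in range(10_000)]
previous = set()
for percent in (1, 5, 20, 50, 100):
    current = {u for u in users if in_rollout("new-feature", u, percent)}
    assert previous <= current  # nobody loses the feature as the rollout grows
    print(f"{percent:>3}% target -> {len(current)} of {len(users)} users")
    previous = current
```

Because each user always lands in the same bucket, raising the percentage only adds users, and setting it back to 0 turns the feature off for everyone.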

Flag Driven Development

Feature flags/toggles/controls bring the test-early, test-often mindset of test-driven development (TDD) to your releases.  Flag-driven development is the process of releasing and iterating features quickly, testing those features, and making improvements.  Think of it as Lean UX methodology.  You release light features to receive market feedback.  You iterate on that feedback, make improvements, and redeploy.

Think of feature flag-driven development as a way to receive iterative market feedback on your product, rather than solely depend on isolated customer feedback.  It’s a way to test how your features perform in the real world and not just in an artificial test environment.

Feature Flag Driven Development - Waterfall Agile

In the world of waterfall development, you will typically see one continuous build that culminates in a single deploy.  After this deploy, you’ll receive feedback and fix some bugs, but you will likely need to restart the process for any major feature releases.

Agile is a bit more forgiving.  You can iteratively test small releases to your users, but this is best performed in a staging environment.  You typically will not release your product to market throughout the agile development process, as most of your testing will be internal and controlled.

Finally, lean UX codifies the process of releasing features to market throughout the development process.  These releases will likely be smaller in scale, but you’ll receive immediate market feedback. When you introduce feature flags into the equation, the process becomes even more efficient.

Continuous Delivery via Feature Flag Driven Development

Feature flags allow you to substantially mitigate the risk of releasing immature functionality.  If a release is having a negative impact, roll it back.  If it’s doing well, keep rolling it out.   This is like having a persistent undo button and a means to recalibrate and improve functionality.
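Here is a minimal Python sketch of that “undo button”.  The in-memory `FLAGS` dictionary stands in for a real flag service, and the checkout functions are invented for the example:

```python
# Hypothetical in-memory flag store; a real system would fetch this remotely.
FLAGS = {"new-checkout": True}


def is_enabled(flag_key: str, default: bool = False) -> bool:
    """Look up a flag, falling back to `default` if the flag is unknown."""
    return FLAGS.get(flag_key, default)


def old_checkout(prices: list) -> float:
    """The existing, trusted code path."""
    return sum(prices)


def new_checkout(prices: list) -> float:
    """Immature functionality: a 10% discount that is still being tested."""
    return round(sum(prices) * 0.9, 2)


def checkout(prices: list) -> float:
    """Route to the new code path only while the flag is on."""
    if is_enabled("new-checkout"):
        return new_checkout(prices)
    return old_checkout(prices)


print(checkout([10.0, 20.0]))   # new path while the flag is on
FLAGS["new-checkout"] = False   # "flip the switch" to roll back
print(checkout([10.0, 20.0]))   # old path again
```

Both code paths stay deployed side by side, which is what makes the rollback instant: you change a value, not the running code.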

More importantly, you can institutionalize this process within your development cycle.  Your team will develop a cadence for lean releases, where all new components and functionality are wrapped in feature flags.  You can easily test features, cultivate creativity, and reward bold advances – all without compromising the integrity of your platform.

This new development methodology also allows your marketing, design, and engineering teams to collaborate more frequently and more effectively.  With an agile approach, you will typically have one large planning cycle that will launch you into development.  You will then test your iterations on local groups or in a local environment that tries to simulate production.  However, there is no substitute for real market feedback.

Feature flag driven development allows you to quickly release iterations of your features to market, receive feedback, improve, and redeploy.  It allows you to roll out features to small segments of your users in order to mitigate risk all while receiving valuable feedback.  More importantly, your team will converge and collaborate based on real market feedback and make the necessary improvements to drive the product forward.

Feature Flag Driven A/B Testing

A/B testing is the practice of comparing different versions of a page to see which one performs better.  Traditionally, A/B testing has been used mainly for cosmetic changes.  These include layouts, element position, colors, and copy.  Typically, A/B testing is tied to a goal.  For example, you want to increase sign-up conversions, so you use tools like Optimizely, Visual Website Optimizer, and Apptimize to test different layouts, buttons, and call-to-action language.  These tools work great, but what if you wanted to test backend-level functionality, completely new features, and sign-up flows?

This is where feature flag driven A/B testing comes into play.

LaunchDarkly Feature Flag A/B Testing

Because feature flags are implemented at the code level, you can control deep functional features and then target user segments.   For example, let’s say I want to test a new sign-up flow and welcome tutorial (see above). I can flag the new functional components so that only certain users will receive the new flow.

With a suite like LaunchDarkly, you can then analyze these feature tests using your Optimizely or New Relic goals.  This will allow you to see, for example, which sign-up flow is generating better conversions or which checkout flow is generating more revenue.
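Whatever tool records the goals, the comparison itself is simple.  The Python sketch below assumes you can export one `(variation, converted)` pair per user; the event data is made up for illustration and the `conversion_rates` helper is hypothetical:

```python
from collections import Counter


def conversion_rates(events) -> dict:
    """Turn (variation, converted) pairs into a {variation: rate} mapping."""
    shown = Counter()
    converted = Counter()
    for variation, did_convert in events:
        shown[variation] += 1
        if did_convert:
            converted[variation] += 1
    return {name: converted[name] / shown[name] for name in shown}


# Made-up events: which sign-up flow each user saw and whether they signed up.
events = [
    ("old-flow", True), ("old-flow", False), ("old-flow", False), ("old-flow", False),
    ("new-flow", True), ("new-flow", True), ("new-flow", False), ("new-flow", False),
]

for name, rate in conversion_rates(events).items():
    print(f"{name}: {rate:.0%} of users converted")
```

A real analysis would also check whether the difference is statistically significant before declaring a winner.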

All in all, feature flag driven A/B testing enables companies to test robust functionality instead of just cosmetic changes.

 


LAUNCHDARKLY HELPS YOU BUILD BETTER SOFTWARE FASTER WITH FEATURE FLAGS AS A SERVICE. START YOUR FREE TRIAL NOW.

23 Jul 2015

Hypothesis Driven Development: Yammer case study

How Yammer does hypothesis driven development, guest post by Ron Blanford, Yammer Product Manager

Recently I kicked off a project to overhaul our iPhone publisher in order to make it easier for users to post photos to Yammer. We didn’t start this project with the intention of overhauling the entire publisher, but when we took a closer look at the overall experience, we knew we needed to make big changes.

We still maintain a lean startup mentality at Yammer, which means we develop a hypothesis, build the most minimal thing we can to test that hypothesis, and validate our decisions with data incrementally. As you might imagine, overhauls that change many variables at once are not too common around here, but sometimes we know they are necessary to drive the product forward. According to Mary Meeker’s Internet Trends 2014, 1.8 billion photos were being posted to social media sites on a daily basis. So we generally know that people are accustomed to taking pics from their phone and posting them to social media sites. People have pictures on their phone. We just weren’t making it easy for them to post those photos to Yammer.

Go Big or Go Home

Why was it necessary to overhaul the publisher? I didn’t need an analyst or a user researcher to tell me that the experience of posting photos to Yammer was terribly outdated. Just using the feature made it obvious that we hadn’t invested in this part of the app in years. You’d tap the camera icon, which would then prompt you to choose to take a new photo or upload an existing one, at which point you’d get dropped into your photo roll or the camera. If you wanted to post multiple photos, you’d have to go through the flow again, and again, and again. As a general rule, we want to minimize the opportunity for bad experiences in the product, but we’d also been hearing from our customers through our user researchers that they were having difficulty with the photo posting process. This is especially true for retailers, for example, who employ thousands of workers who don’t sit in front of a computer every day. These users rely primarily on the mobile experience to communicate with their coworkers, and sharing photos is an important use case.

Hypothesis

My hypothesis for this project was that if we made it easier to post images, people would indeed post more images, and as a result, the number of days our users engage with Yammer would go up. Why? We know that posts with images are more engaging than those without. Posts with a photo get on average 17% more responses and nearly four times as many likes. Why are replies and likes important? Likes provide validation, acknowledgement, and support from the network. They encourage people to post more, which in turn encourages more replies, likes and eyeballs. It’s a nice reciprocal engagement loop that ultimately leads to more content on the network, more people having conversations, more people getting work done, more people discovering things, etc.

Build It

This was the easy part for me since the vast majority of my heavy lifting was done prior to any developers writing a line of code. From here on out, our publisher was mostly in the hands of the designer and developers.

What made this project different from so many others was that it required really close attention to what would otherwise be thought of as small details. Whereas transitions and animations are often afterthoughts to the core parts of an app or feature, in this case, they were core to our success. If the transition was jumpy or unnatural, people would find the new experience jarring and painful. If we didn’t nail the experience, people would get frustrated and find other ways to share their photos. Many hours were spent dealing with how the keyboard slid out, how the gallery slid in, how the full-size gallery took over the whole screen, and more. We’ve always had amazing talent at Yammer; in this project, I believe the skill of our designers and developers allowed us to deliver a product of exceedingly high quality in a very short period of time.

Test It

In general, we show an experiment to the smallest number of users that will still give us statistically significant data. In the event the feature is a bad experience or just doesn’t test well, we will have disrupted fewer people than we would have if we tested everything at 50/50. On mobile, however, because our usage is so much lower than on web, the smallest viable group is invariably 50%. So we ran this as a standard 50/50 A/B test.
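To see why lower traffic pushes the test group toward 50%, here is a rough Python sketch using a common textbook approximation for the sample size of a two-proportion test. This is not Yammer’s actual method, and the rates and user counts are invented for illustration. It also assumes the control group is at least as large as the test group.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(p_control: float, p_treatment: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect p_control vs p_treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


def smallest_test_share(total_users: int, needed_per_arm: int) -> float:
    """Smallest share of users to put in the test group, capped at 50/50."""
    return min(0.5, needed_per_arm / total_users)


# Hypothetical goal: detect a lift from 10% to 12% of users posting a photo.
needed = users_per_arm(0.10, 0.12)
for platform, total in (("web", 1_000_000), ("mobile", 8_000)):
    share = smallest_test_share(total, needed)
    print(f"{platform}: {needed} users needed, test group share {share:.1%}")
```

With a large user base the required group is a tiny fraction of users; with a small one it approaches half, at which point a plain 50/50 split is the practical choice.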

Analyze the Results

Initial results showed no significant effects. So we decided to give it a few more weeks to see if things would change, but they didn’t. Even after seven weeks, the results were disappointing: this was as flat as flat gets. Our core metrics (those we value most highly) didn’t move at all. These include days engaged, the number of people posting, the number of messages, new user retention, etc. In a lot of cases, the job of a Yammer PM is made more difficult when local metrics (metrics that tell us how people use something) go up, but core metrics are either flat or negative. In this case, our overhaul didn’t have any real impact on local metrics either. And that’s very disconcerting because it’s far easier to move local metrics than core metrics.

  • The number of people posting images didn’t go up.
  • The number of people posting multiple images didn’t go up.
  • The number of posts with multiple images didn’t move.

In short, we did not validate our hypothesis. Often when you read blogs about feature overhauls, they are either massive successes or massive failures. But these kinds of results are the hardest when it comes to product tradeoffs, analysis, and vision.

Ship, iterate, or kill it

At the end of the day, we shipped this feature. But it was a difficult and long-debated decision. In the end, it came down to three things:

  • Without a doubt, we made it easier to attach multiple images.
  • We believe we created a better experience.
  • We refactored some very old code.

The refactoring actually made it a much easier decision. Even if we had not refactored that code, I would imagine that the first two bullet points would have been compelling enough to base our decision on.

The obvious question is why didn’t this test well? For users, it’s about desire to post a photo rather than ease of posting. I believe our results were flat because people who really wanted or needed to post photos overcame the friction of doing so in the old experience. Making it easier to post photos apparently does not influence someone’s desire to post a photo. For that, we’d have to think about something that is much more top of the funnel. Expecting every feature to address both problems would be unrealistic. Overall, I think this project is a good example of being data-informed and not complete slaves to data.

