02 Dec 2015

In 2016, can DevOps keep pace with consumer expectations?

Consumer Expectations

Every morning, I roll out of bed at 8:01am and lazily reach for my phone to check my Apple News feed.   Sometimes, the app has the audacity to make me wait 2 full seconds before it refreshes my feed with 100 articles from dozens of sources around the web.

Unacceptable! I want my news feed instantly. I want it now. I can feel myself get frustrated and I can feel my patience evaporate in a flash. Then I take a step back: since when do I take personal offense at an application that doesn’t load instantly? Why do I get flustered as if I were rear-ended by a car?

This is just a microcosm of current consumer expectations.   An iOS bug becomes a TechCrunch headline, a bad Facebook feature makes CNN’s front page, and an app that drains 0.1% more battery life becomes a human rights violation.

As a developer, imagine trying to adapt to these expectations.  Consumers want things to work instantly, smoothly, and intelligently.  They want everything to be perfect and work seamlessly or else they get frustrated, offended, and vocally upset.  “Ugh, why do I have to click TWO buttons to see this?”


Moving into 2016, what can developers do to meet or even exceed these expectations? Simply releasing software faster will not necessarily mitigate risk, nor will it necessarily lead to a better product.  The key question to address is “how can I adapt my development process to exceed consumer expectations?”

It doesn’t matter how wonderful your new feature is if it degrades your application’s performance. Product-market fit no longer means merely that your product serves a market; it must also meet that market’s expectations for performance.

How you release a feature becomes as important as the feature itself. For DevOps, 2016 should be the year of the incremental rollout, in which assessing your application’s response to a new feature becomes a prerequisite for a launch. I am not referring only to local or staged testing, but to actual testing in production.

I recently published a piece on feature flag-driven development, and I feel that this practice of compartmentalized release is essential for managing risk and meeting consumer expectations. By flagging a feature (i.e., wrapping it in a condition), you can deploy it turned off and then incrementally turn it on for particular users, assessing performance feedback along the way.
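As a rough illustration of what wrapping a feature in a condition can look like, here is a minimal Python sketch. The in-memory flag store, the flag name `new-news-feed`, and the function names are all invented for this example; a real setup would read flags from a service that can change without a redeploy.

```python
# Hypothetical in-memory flag store (an assumption for this sketch).
# In practice the values would come from a flag service or config system
# so they can be changed without redeploying.
FLAGS = {"new-news-feed": False}  # deployed "off"


def is_enabled(flag_name: str) -> bool:
    """Return the flag's current value; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def render_feed() -> str:
    # The new code path ships to production but stays dark until the flag flips.
    if is_enabled("new-news-feed"):
        return "new feed"
    return "old feed"
```

Flipping `FLAGS["new-news-feed"]` to `True` switches `render_feed()` to the new path on the next call, and flipping it back restores the old one.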

LaunchDarkly Feature Flag Rollouts

Managing Scalability with Rollouts

Let’s look at an example. You are a developer launching a new feature that will require you to process hundreds of additional requests per second. A few hundred more? That’s no problem – you’ve built the infrastructure to scale and handle that load. But what if all your users fall in love with the new feature and bombard you with thousands of requests? How do you manage this?

Imagine that you were able to roll out your feature live to 10% of your users, then 20%, then 30%, and so on. Each step becomes a testing benchmark where you assess performance feedback and scale accordingly.

More importantly, you mitigate unanticipated performance degradation and meet the consumer expectation of seamless application performance.
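One common way to implement this kind of percentage rollout is to hash each user into a stable bucket and compare the bucket to the current percentage. The sketch below assumes that approach; the function names and the choice of SHA-256 are illustrative, not a description of any particular product.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return bucket(flag_name, user_id) < percent
```

Because a user's bucket never changes, anyone who saw the feature at 10% still sees it at 20% and 30%, so each step only adds users rather than reshuffling them.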


Of course, feature flagged rollouts are not the saving grace for every feature launch or app update, but they’re fast becoming essential for DevOps as consumer expectations continue to raise the bar for performance.

Managing risk does not need to be a transformative operational process.  It can be easily achieved by flagging your features and gradually releasing them to your users.  After all, there’s no better way to get genuine feedback than testing in a real environment.

11 Aug 2015

Secret to Facebook’s Hacker Engineering Culture

Facebook’s engineering is legendary for its speed and execution. You too can be as quick and smart as Facebook if you know their hacker engineering secret. Originally they lived by “Move Fast and Break Things”, which has since evolved with wisdom into “Move Fast With Stable Infra.” Speed is important, as are stability and a good user experience. Facebook engineer Kent Beck wrote a great Facebook Note on how Facebook embraces reversibility to scale up. I highly recommend you read his entire post.

Facebook has a secret sauce: an in-house system called Gatekeeper that allows them to get quick feedback on a feature and iterate on it. Engineering changes are wrapped in a feature flag and pushed live to production. The features ship live but turned off, and are then turned on via Gatekeeper for different users. Facebook’s seemingly simple system of separating deployment from rollout unlocks many powerful ways to move faster with more stability. Each item below begins with a quote from Kent Beck, followed by my analysis of how Facebook uses Gatekeeper.

Internal usage. Engineers can make a change, get feedback from thousands of employees using the change, and roll it back in an hour.

Initially, the engineer uses Gatekeeper to turn the feature on for internal users only. Interestingly, I’ve heard that Facebook is too large for changes to be effectively communicated EXCEPT by actually making the change. Instead of sending flurries of emails or blasts in chat rooms to notify other groups, Facebook engineers make the code change and wait for impacted parties to tell them that something is broken, or to fix their own dependencies. Separating changes from bigger releases with feature flags means that any change can be rolled back at any time.

Staged rollout. We can begin deploying a change to a billion people and, if the metrics tank, take it back before problems affect most people using Facebook.

Staged rollout depends on feature flags to encapsulate a change and a feature flagging system (like Gatekeeper) to take it back.
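To make the "take it back if the metrics tank" idea concrete, here is a small Python sketch of a staged rollout with an automatic rollback. It is not how Gatekeeper works internally, which is not public in this detail; `set_percent`, `error_rate`, and the 1% threshold are hypothetical stand-ins for a flag system and a monitoring system.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[int],
    set_percent: Callable[[int], None],
    error_rate: Callable[[], float],
    max_error_rate: float = 0.01,
) -> int:
    """Walk through rollout stages; roll back to 0% if the metric tanks.

    Returns the percentage the feature ends at.
    """
    for percent in stages:
        set_percent(percent)
        # A real rollout would wait and observe here before checking.
        if error_rate() > max_error_rate:
            set_percent(0)  # take the change back
            return 0
    return stages[-1] if stages else 0
```

For example, `staged_rollout([1, 10, 50, 100], set_percent, error_rate)` stops and resets the flag to 0% at the first stage where the observed error rate exceeds the threshold.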

Dynamic configuration. If an engineer has planned for it in the code, we can turn off an offending feature in production in seconds. Alternatively, we can dial features up and down in tiny increments (i.e. only 0.1% of people see the feature) to discover and avoid non-linear effects.

The key to turning features off in seconds (rather than hours or, in the best case, minutes) is “if the engineer has planned for it in the code”. By using feature flags to separate code deployment from functionality, Facebook can quickly kill malignant features. Without feature flags and Gatekeeper, Facebook would have to do a full redeployment.

Right hand side units. We can add a little bit of functionality to the website and turn it on and off in seconds, all without interfering with people’s primary interaction with NewsFeed.

Facebook smartly uses microservices and avoids monolithic code. Small changes in functionality, wrapped in feature flags, can quickly be toggled on and off using Gatekeeper.

Shadow production. We can experiment with new services under real load, from a tiny trickle to the whole flood, without affecting production.

Facebook pioneered dark launches, the ability to expose features to load without exposing them to users. I’ve heard that it’s impossible to simulate Facebook’s production load because it’s so large. Gatekeeper allows Facebook to use feature flags to separate load testing from user visibility.
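A dark launch of this kind can be sketched as a request handler that always answers from the current service and, when a flag is on, also sends the same request to the new service and discards the result. The names below (`handle_request`, `current_service`, `new_service`, `dark_launch_enabled`) are assumptions made for the illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger("dark_launch")


def handle_request(
    request: str,
    current_service: Callable[[str], str],
    new_service: Callable[[str], str],
    dark_launch_enabled: bool,
) -> str:
    """Serve from the current service; optionally exercise the new one too."""
    response = current_service(request)
    if dark_launch_enabled:
        try:
            shadow = new_service(request)  # real load, result never shown
            if shadow != response:
                logger.info("shadow mismatch for %r", request)
        except Exception:
            # A failure in the dark path must not affect the user.
            logger.exception("shadow call failed for %r", request)
    return response
```

In production the shadow call would usually run asynchronously so it adds no latency for the user; it is kept synchronous here to keep the sketch short.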

Data-informed decisions. Data-informed decisions are inherently reversible. “We expect this feature to affect this metric. If it doesn’t, it’s gone.”

By wrapping a feature with a flag, it’s possible to isolate its effect on the system. Data-informed decision making, which ties an individual feature to metrics, is made possible by Gatekeeper and feature flags. Without feature flags, it’s impossible to see the impact of a change – if you release five features and twenty bug fixes at once, and engagement drops by 5%, which feature is to blame? Could one of the bug fixes actually have caused a 10% drop and one of the features a 15% gain? Only by separating out each change can true causation (not just correlation) be seen. Yammer also follows data-informed decision making in its product development. Again, the feature must be encapsulated both to measure it and to enable the rollback.
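The measurement side can be sketched very simply: record, for each user, whether the flag was on and whether they engaged, then compare the two groups. The event shape and function name below are assumptions for this example, and a real analysis would also need adequate sample sizes and a significance test.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def engagement_by_variation(
    events: Iterable[Tuple[bool, bool]],
) -> Dict[bool, float]:
    """Compute the engaged fraction for flag-on and flag-off users.

    Each event is (flag_on, engaged) for one user.
    """
    totals: Dict[bool, int] = defaultdict(int)
    engaged: Dict[bool, int] = defaultdict(int)
    for flag_on, was_engaged in events:
        totals[flag_on] += 1
        if was_engaged:
            engaged[flag_on] += 1
    return {variation: engaged[variation] / totals[variation] for variation in totals}
```

Because only one feature differs between the two groups, a gap between the flag-on and flag-off rates can be attributed to that feature rather than to everything else in the release.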

Advance countries. We can roll a feature out to a whole country, generate accurate feedback, and roll it back without affecting most of the people using Facebook.

Gatekeeper and feature flags enable canary launches – using an entire country as a “canary in a coal mine” to see if there are issues with a release. Rather than having a worldwide failure, Facebook can iterate quickly and roll back.
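A country-level canary can be expressed as a targeting rule on a user attribute. The sketch below is a hypothetical illustration: the flag name, the `country` attribute, and the choice of "NZ" as the canary country are all invented for the example.

```python
from typing import Dict, Set

# Hypothetical targeting rule: flag name -> country codes where the feature is on.
CANARY_COUNTRIES: Dict[str, Set[str]] = {"new-profile-page": {"NZ"}}


def is_enabled_for(flag_name: str, user: Dict[str, str]) -> bool:
    """Turn the feature on only for users whose country is in the canary set."""
    return user.get("country") in CANARY_COUNTRIES.get(flag_name, set())
```

Emptying the country set rolls the feature back for everyone, and adding more country codes widens the release.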

Soft launches. When we roll out a feature or application with a minimum of fanfare it can be pulled back with a minimum of public attention.

Facebook, after many misfires like Facebook Beacon, now follows Eric Ries (Don’t launch – separate out a marketing launch from a product launch). With feature flags, Facebook can get feedback from their own users, and control the story. Facebook has avoided the flameouts of Google, which has had epic failures with Google Wave, Google Buzz, and most recently Google Plus – all expensively launched, then expensively decommissioned. With feature flags and Gatekeeper, Facebook is always in control of who sees what when.

Want to be as smart as Facebook at developing software? Want to integrate reversibility, dark launches, and data-informed decisions into your own development cycle? The smartest companies like Facebook, Medium, Dropbox, and LinkedIn have in-house feature-flagging systems custom built for them. You can build your own system, or simply use LaunchDarkly, “Gatekeeper for everyone else”.



15 Jul 2015

Feature flags, dark launches, and canary releases for all: LaunchDarkly first year in review

It’s been a year since we officially started full time on LaunchDarkly. Leading up to our official first day on July 14, 2014, John and I had been sharing ideas for years on how continuous delivery, agile and lean startups had changed the game for effective software development. Back when it was state of the art to release more than once a year, I remember having release parties. Now SaaS rules. Packaged, installed software is dying (or on its last gasps). The smart companies like Facebook, LinkedIn, Etsy and Netflix are releasing multiple times per day, and even per hour, directly to their users. By iterating quicker and listening to their customers, these companies are delighting their users with more features. Developers are happier as well – it’s painful to build features for years on end, only to find out you’ve missed the mark.

Flickr first started talking about the feature flag/feature flipper pattern in 2009 as their key to engineering success. Facebook kicked off popularizing dark launches in 2009. Etsy, in 2011, noted how feature flags were helping them scale. After Instagram was acquired by Facebook, they adopted Facebook’s practice of canary releases.

Large companies like Facebook and Dropbox could afford to build and maintain a dedicated framework for their feature flags, dark launch software and canary releases. Facebook calls its feature flag and experimentation framework Gatekeeper; Dropbox calls its own Gandalf (“none shall pass”). But everyone else who wanted the powerful ability to deploy multiple times per day, control who saw what features, and move fast and ship things had three choices: build their own expensive infrastructure in house, ship and hope for the best (the ostrich deploy), or sit out the continuous delivery movement. John and I saw an opportunity to be “Gandalf for everyone” – dark launch software as a service. LaunchDarkly would let everyone feature flag, dark launch, canary release, and use the continuous deployment tools of a big player, at a fraction of the cost.

So on July 8th, 2014, John checked in his first code on what would become LaunchDarkly. Our official first day of work was July 15th, when we went to work together for the first time. The year has gone by so quickly – we have our first customers, we’ve been joined by our engineers, Alexis Georges and Patrick Kaeding, and we even had our first Dark Launch meetup. What’s next? Continuing to iterate on our features, listening to our customers – continuing to Launch Dark!


08 Jul 2015

Dark Launching Meetup: Lessons Learned

We hosted the first Dark Launching meetup in May with a surprisingly large turnout. We’d originally planned the meetup to be a user group of our current LaunchDarkly users sharing how they were using LaunchDarkly for dark launches. We were very pleasantly surprised by how many people joined our meetup because they wanted to learn more about dark launching itself! Dark launching is a best practice used by Facebook to launch new features “dark” (off), then slowly light up (turn on) features for different users. A key component of dark launches is the ability to easily turn a feature off again if issues are found.

Why dark launch? Shouldn’t all releases always be thoroughly tested and production ready? No matter how good or thorough your QA and performance testing is, you will never find all issues before production. Dark launches are a recognition that the real world exists. Rather than exposing your release into the full light of day to get blasted, shouldn’t you control access yourself?

I started the Dark Launch meetup by asking for a show of hands from everyone who’d had a bad release. Every hand went up, way up. I then asked if anyone wanted to share their story of a bad release. Everyone put their hands back down. I think there’s too much shame attached to bad releases, when we should 1) admit that bad releases happen to good teams, 2) learn from bad releases, and 3) put processes like dark launches into place to help mitigate bad releases.

So I’ll share a tale of two releases – one where I used dark launching, and one where I should have used dark launching. At TripIt, our users emailed us their travel confirmations. We’d parse the emails, extract the useful information like flight and hotel details, and make them a beautiful trip itinerary. Our users loved us; we were the #1 travel app in the App Store. I was the product manager on a daring new feature: users would connect their Gmail accounts directly to TripIt, skipping the step of manually forwarding itineraries. It was considered extremely risky to have people authorize us to scan their email inbox. So I found a group of frequent travelers who were willing to give us feedback. We pushed the feature “live” to production, but only granted the opted-in users access, as a dark launch.

Internally, we carefully monitored our speed scanning real world inboxes. I also followed up with all users about whether we were importing the right (or wrong) email items. Some early mistakes were being too aggressive with the word “ticket” and importing a Tiffany order or a Turkish Airlines promotion. When we’d gotten enough real world feedback, we then truly launched, with a TechCrunch article. I was pretty excited both that we’d launched something to help our users and that my picture made it into TechCrunch!

We didn’t dark launch another email feature at TripIt, with very bad results. A persistent user complaint at TripIt was that we would import ALL travel confirmations, whether or not you were the actual traveler. We called this “Other People’s Travel” (OPT). For example, if my sister Margaret Harbaugh emailed me her flight information – bam! Her trip would appear in my TripIt account. Our system recognized that there was a travel plan, but there was no way for our system to tell the difference between Margaret and Edith. Or was there? Our engineers implemented some logic to compare names on the account with names on the itinerary and, if there wasn’t a close enough match, to not import the trip. Sounds easy – “Margaret” is clearly different than “Edith”, so let’s ship it, right? However, we quickly had a flood of complaints. We had over a million users using auto-import (the largest Gmail-authorized company worldwide). It turns out that email itineraries often had very odd concatenations like MsEdith or EdithMs, which TripIt was rejecting as not the same as “Edith”. We were skipping too many emails and our users were very unhappy. They’d relied on us to “automagically” work, and they had tolerated some false positive imports. Now we were ignoring many emails they expected us to import. We had to do an emergency patch and quickly revert the change. If we’d done a dark launch, we could have tested the change with a smaller batch of users and monitored their reaction. It was a lesson learned for me that dark launching is a powerful tool to ensure user satisfaction.
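To show why concatenations like MsEdith trip up a naive comparison, here is a hypothetical Python sketch of a name match that strips a glued-on title before comparing. This is not TripIt’s actual logic; the title list is an assumed, incomplete one, and the approach relies on the itinerary keeping its capitalization, which real emails may not.

```python
import re

# Assumed, incomplete list of titles that may be glued onto a first name.
_TITLE = r"(?:Mrs|Ms|Mr|Dr)"
# A leading title counts only if a capital letter follows it ("MsEdith").
_LEADING = re.compile(rf"^{_TITLE}(?=[A-Z])")
# A trailing title counts only if a lowercase letter precedes it ("EdithMs").
_TRAILING = re.compile(rf"(?<=[a-z]){_TITLE}$")


def normalize(name: str) -> str:
    """Strip one glued-on leading or trailing title, then lowercase."""
    stripped = _TRAILING.sub("", _LEADING.sub("", name.strip()))
    return stripped.lower()


def same_traveler(account_name: str, itinerary_name: str) -> bool:
    """Compare names after normalization."""
    return normalize(account_name) == normalize(itinerary_name)
```

Even a rule this small has edge cases that only show up on real data (all-caps itineraries, for instance, would slip past it), which is the argument for trying it on a small batch of users first.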

We’re looking forward to our next dark launching meetup to share more stories and lessons learned. You can join the Dark Launch meetup group here, and we hope to see you soon!