18 Aug 2017

Risk Reduction and Harm Mitigation

Risk Reduction is trying to make sure bad things happen as rarely as possible. It is anti-lock brakes, vaccinations, clothing irons that turn off by themselves, and all sorts of things that we think of as safety modifications in our life. We are trying to build lives where bad things happen less often.

Harm Mitigation is what we do to make sure that when bad things do happen, they are less catastrophic. Fire sprinklers in buildings, seat belts, and needle exchanges are all about making the consequences of something bad less terrible.

What does that mean for a DevOps world where our risks and harms are very different? Charity Majors says we shouldn’t split the world into developers and operations, but into product and infrastructure—and I think that’s a useful way to think about risk too.

Product risk is problems that users experience. We can usually predict and mitigate the danger by testing and being aware of common failings. For example, we can expect and plan for users that may be on a flaky connection, or that they may try to exit a page without saving information. We can work around these problems because we know they’re out there. But our deployments are not something they should see until we’re ready for prime time. For this kind of risk, we use feature flags to control what content is delivered.

Infrastructure risk is more about the inherent fragility of delivering software. CDNs, fiber, servers, switches, towers…the whole system of getting data to people has failure zones. When we are trying to reduce infrastructure risk, we assume that latency is ever-present, networks are intermittent, and we won’t always be able to count on everything working right the first time. We try to build in robust and flexible ways than can route around failures. This is the place we might use feature flags to control failovers or to create circuit breakers to prevent flooding a fragile sector.

Product harm reduction is about making sure that users can have a positive experience, even if something happens that makes it less than ideal. We want to preserve their blog drafts, keep them from committing errors, only show them things they are allowed to change, and above all, avoid giving them a blank page. Something has gone wrong in the trip to the user, but they shouldn’t have to suffer for it.

Infrastructure harm reduction is the ability to pull back breaking fixes, shunt users away from vulnerabilities, and respond near-immediately to things that have gone very wrong. Harm reduction at this level is the kind of action a pager-responder can take to get things back on track before doing more intensive repairs and investigations in the morning.

In my first week, I spent a lot of time thinking about how to summarize our product for a variety of audiences. “Feature flags as a service” is short and pithy, but only works if you can bring your own definition of “feature flag” and a business understanding of why you would use them. What about “Feature flags allow you to make changes in near real-time instead of waiting for a deployment, and LaunchDarkly helps you manage and track flags across an organization”?

Well, that works, but it still doesn’t get to the core of why an organization would want to use feature flags. What I’ve come up with so far is this:

Feature flags segment the risk of creating a product into manageable parts.

Creating and deploying software is risky. We can accidentally build in errors, we can deliver it badly, or to the wrong people, and it can interact in unfortunate ways with existing software or hardware. As organizations, we want to do our best to do no harm and provide benefit. Using feature flags lets us wrap our features in decision points that we can then use to make life easier for our users.

Here are some types of risks that are reduced by using feature flags:

  • Server falls over from too much traffic
  • Canary launch is not well-tracked, problems are missed
  • Old features and workarounds are invisible and get left in place
  • Feature with vulnerable content is deployed
  • API endpoints are exposed to unauthorized users

Managing your feature flags is a post for another day—today I ask that you take a few minutes and think about how you can reduce risk and minimize harm in your organization, your project, or your code. How can you make things robust enough to resist failure, instrumented sufficiently to identify a failure spot, and flexible enough to reduce harmful consequences on the fly?

08 Feb 2017

Toggle Talk with Slack’s Director of Engineering Josh Wills

I sat down with Josh Wills, Director of Engineering at Slack and unabashed feature flag enthusiast, to get his opinions on the practice, where it’s most useful, and how it has played an important part in his career.  I think my favorite take-away from him was,

“Feature flagging is scary. I get why it’s scary. But to me, not launching your product every five minutes is scary… Launching continuously is how you learn fast– It’s not just about deploying fast, it’s about learning fast. That’s the future of viability.”

Here’s what I heard from Josh in the interview.  


How long have you been feature flagging?

I started feature flagging at Google in 2007, and when I joined the team they were in the middle of rewriting the feature flag system. Our feature flag framework was called the Experiments Framework because it was designed for running A/B tests, and it grew out of that into this very powerful and complicated feature flagging framework that we used for literally everything. It started out as a simple “feature on”/“feature off” system, but it evolved to include richer types like strings and floats and even had some simple conditional logic for modifying flags based on attributes of the request. It became the system through which all of Google’s various machine learning models were combined to make decisions for ad ranking, search ranking, etc.

Imagine that you have dozens of machine learning models active on a given request, doing all kinds of different things, and your job is to figure out an algorithm for deciding which ads should go where and how much each advertiser needs to pay. All of those different signals are combined together through a complicated series of equations which have a bunch of thresholds and weights. There’s very rich logic in not only the machine learning models, obviously, but also within the feature flag framework, to control under different contexts what counts the most.

Trying to do data science and machine learning in production without feature flags is nuts.

The feature flag framework here at Slack was developed in-house and was not initially developed with data science in mind, but we eventually created one that was to support our own search ranking, backend performance, and new team onboarding experiments.

But when I got here, I was so happy that they had a feature flag framework at all– it means you deploy code hundreds of times a day, not once a month, or whatever it is other companies do.


What do you prefer to call it and why?

I like to call it the experiment framework, but that’s just Google/Xoogler nomenclature. Facebook’s system is called Gatekeeper and it’s basically the same idea. Most of these systems have converged to have the same set of features, because it just makes sense and you have to have certain things. Eventually you’ll get there anyway, so why not just skip to the end?  


When do you think feature flagging is most useful?

I’m biased, since I’m a data person. Data science, machine learning is what I do and I think it is absolutely critical for machine learning, for bringing any kind of data driven feedback loops and intelligence into your application. That is when it is absolutely most critical.

You should not be doing machine learning without a feature flag framework.

If you are saying, “Oh I want to get into machine learning, and I’m going to do predictions or personalization or recommendations,” which lots of people do, and you are doing it without a feature flag framework, you are insane and should be fired. Not to put too fine a point on it, but…


Are there any cases where feature flagging is not a good idea?

Well, there is always tech debt that goes along with doing this kind of stuff. We deal with this at Slack, and Google deals with it as well. You end up with code that has a lot of “if” statements in it. Engineers are not always the best about deleting their feature flags once they’re no longer necessary.

So we have a whole archival system, so that when you do the code review to create the feature flag, you also have to specify when you plan to delete the flag, and get alerts if the flag is not removed by such-and-such a date. That is the cost of doing business. I would say the benefits massively outweigh the costs.

I think there are certain other situations where feature flags are not a good idea.  They’re relatively rare, but they do happen. Where you’re switching backend systems there can be times where you have to just go for it. You maybe reach a point where you just can’t quite bring yourself to burn the bridge and just go without the old system anymore, even though it’s causing you pain, and you know it needs to go. You just need to bite the bullet and do it, do the cut over and live with the consequences.

You work super hard to not ever find yourself in that situation. No team is going to go out of their way to corner themselves like that, but if you’re cornered, you’ve got to fight your way out.


Best use of a feature flag – a personal story?

It’s been crucial to the way I’ve worked ever since I moved to San Francisco. Coming from the perspective of doing machine learning, data science, etc. at Google, I ran thousands of experiments all in search of new ways of combining different models together, in order to fundamentally make Google more money, make the user experience better, and generate more ROI for advertisers. I got really good at it. You could say that feature flags made my career.

At the same time, there was one Friday afternoon back in 2009 where I thought it would be a good idea to do one last push, and I launched a bad feature flag configuration that broke the ads systems and cost Google a lot of money. I remember my boss saying at the time, “So Josh, what did you learn from the X million dollars we just spent educating you?”

It’s the kind of thing where you learn that canaries are good, and that 5 pm pushes on a Friday are pretty much always a bad idea, no matter how good your feature flagging framework is.


What do you think is the number one mistake that’s made around feature flagging?

I think my biggest thing is if you are really thinking of feature flags as just “on” “off” switches, you are missing the real power of what they can do.

To me the feature flag space is the parameter space that I get to explore to optimize whatever metrics that I want to optimize. If you are constraining yourself to this very limited boolean on/off space without strings, floats, etc. you’re putting artificial limits on how fast you can explore the space and about how all of the knobs at your disposal work.

This will be the early mistake that I think a lot of people make– they won’t feature flag often enough.


Can you share any tips for better flagging?

I think what was most compelling for me at Google was the configuration language for feature flags was very rich, but not Turing complete. I don’t think that configuration-as-code is a good idea for feature flags, because it becomes harder to test/validate, which slows down how quickly you can roll things out. However, the configuration language was like programming in the sense that you could define what we called “condition functions” that could be evaluated in the context of a request and used to adjust the values of the flags. So the logic was like a long series of “if” statements, where you could modify the resulting value either by overriding it or by addition, multiplication, or other custom operators.

The way it worked when I was at Google and working on the ad system was that the server binary was pushed weekly, so the only changes that you could make during the week were via experiments. Having that sort of richness of programming via feature flags allowed a lot more freedom and a lot rapid safer iteration between those weekly pushes. The same logic applies today for mobile apps, where you can only do releases via an app store, but you want to be learning what works much faster than that.


How do you think feature flags play into the DevOps movement? Continuous Delivery?

How would you do anything without them?

What does it mean to do DevOps without feature flags? It’s one of those things that doesn’t make sense to me. How would you make mayonnaise without eggs? It’s like, not really mayonnaise then. Which is good, because mayonnaise is gross.

I would be fascinated and somewhat horrified to have someone try to explain a feature flag-less DevOps set up. I don’t know what would that look like.


Are you seeing feature flagging evolving? If so how? And how do you expect it to change in the future?

Not as much as I would like, broadly speaking. I’m not at Google anymore, so I don’t know what the current state was when I left. The stuff they had is much richer than anything I’ve seen anywhere else, but to be fair, they were doing a lot of machine learning much earlier than anyone else, too.

I think feature flags need to find a way to strike the balance between configuration as on/off switches and configuration as Turing-complete programming language. That was the thing I felt was most compelling and powerful about the way Google did it that I’ve not seen anywhere else.


26 Jan 2017

Launched: Flag Tagging Management

LaunchDarkly Feature Flag / Feature Toggle Targeting and Management

For better feature flag management, LaunchDarkly allows you to create tags for organizing and grouping your feature flags.  Adding tags (like “Front-End”, “Ops”, “Marketing”, “Restricted”) helps you categorize flags and manage custom permissions.

Here, we have added the tags “mobile”, “marketing”, and “unrestricted” on the Settings tab of our feature flag:

After adding tags, you can filter by tag on the dashboard and link to filters for better feature flag management.

Here, we have clicked on the “marketing” tag, which has created a filter that shows all feature flags tagged “marketing.”

LaunchDarkly Feature Flag Tagging and Management Dashboard

Creating a filter also generates a URL that you can bookmark and share with your teammates.  For example, this URL will show all feature flags tagged “marketing”  https://app.launchdarkly.com/default/production/features?tag=marketing .

In the future, we will add more advanced filtering and sorting for even better flag management.  If you have any suggestions or questions, please feel free to contact us at support@launchdarkly.com

23 Jan 2017

Microsoft VSTS and LaunchDarkly Team Up to Control Releases

LaunchDarkly and Microsoft controlled rollouts and use targeting feature toggles and feature flags

It’s every project manager’s dream to get complete control over their software releases and have a way to continuously integrate and deliver new features, while also controlling the visibility of those features once they’re live.

Microsoft Visual Studio Team Services

Microsoft’s Visual Studio Team Services (VSTS) enables development teams to build, integrate, test, and release new software in a single platform. It has a centralized version control system that allows for continuous integration, package management, and release management. Together, these components enable agile teams to move faster in a more centralized manner while also minimizing the number of third party dependencies needed for deployment.

Traditionally, when you release new features into your Production environment, they are live for all of your users. You could provide code-level access control via a config file or modify database values to superficially control the visibility of your new features. However, these release controls are engineer-dependent, meaning they require a developer to make changes to who gets to see a feature and when. It makes it very difficult to target granular segments of users or perform incremental percentage rollouts to test performance and visibility.

LaunchDarkly VSTS Extension

With LaunchDarkly, you can use feature flags to control the full release lifecycles of your features, perform percentage rollouts, and granularly target user segments. These controls are surfaced via a UI that can be used by non-developers, including designers, project managers, and the business team.

LaunchDarkly and Microsoft Visual Studio Team Services (VSTS) Feature Flags / Toggles for release management

To tie this feature flag UI into your VSTS workflow, the LaunchDarkly Extension  allows you to associate feature flag rollouts with VSTS work items to get complete control over who sees what, and when. This allows teams to:

  • Centralize feature and release lifecycle management
  • Release confidently with less risk
  • Incorporate feature flags into the release process
  • Team support for superior project management
  • Gain better visibility into features and outcomes
  • Achieve next-level continuous delivery
  • Empower your team to do better DevOps

Here’s some of the core functionality when you integrate VSTS and LaunchDarkly:

Associate feature flags with work items

Take full control of the release lifecycle of your work items and manage feature rollout

LaunchDarkly and Microsoft Visual Studio Team Services (VSTS) Feature Flags / Toggles for release management work items

Comprehensive Release Management

Manage percentage rollouts and turn the feature flag on or off

LaunchDarkly and Microsoft Visual Studio Team Services (VSTS) Feature Flags / Toggles for release management controlled percentage rollouts


Incorporate feature flags into your release definitions

Tie feature flags to your Visual Studio Team Services release definitions and perform percentage rollouts:

LaunchDarkly and Microsoft Visual Studio Team Services (VSTS) Feature Flags / Toggles for release management work items and rollouts

Getting Started

If you already use VSTS and have a LaunchDarkly account, then all you have to do is connect the LaunchDarkly extension and you’re ready to feature flag and take total control over your releases.

You can also check out the documentation to learn more about setup and integration.

05 Dec 2016

Cultural Changes of Feature Flagging vs Branching: Defrag X

I was honored to give a talk at Defrag X on “The cultural changes of feature flags vs feature branching”. Feature flags initially enabled developers to separate deployment from release. Now an entire organization (product, marketing, sales, customer success) can shift from a waterfall way of thinking to a more iterative, customer- centric release. Afterwards, I was thrilled by how many people came up and said they appreciated my talk. 

When I first stDenverDefragarted LaunchDarkly, I met with an Engineering Director, who was excited about feature flags but was still on a six month release cycle. That same Engineering Director was at Defrag. He’s now at a different company with a quicker release cycle and is ready to move forward with feature flags. I’m happy that as release cycles get quicker and quicker, so to does the need for feature flag management.

Thanks to Defrag for ten years of great conferences, with even a bonus snow appearance.

29 Nov 2016

Toggle Talk with Damian Brady

I sat down with Damian Brady, Solution Architect at Octopus Deploy for a conversation about his experience with feature toggles.  He shared with me his tips for best practices, philosophies on when to flag and what he thinks the future of feature flagging will look like. 

  • How long have you been feature flagging?

I had to think about this one a bit – about 8 years ago but I probably didn’t know what it was called at the time.

“It’s definitely the case that people are doing this without knowing the name “feature flag” or even giving it a name. They’re just saying it’s a configuration switch or a toggle and but not giving it a more proper name, they’re not identifying it as a first-class citizen really.”   

  • What do you prefer to call it and why?

Now I call it feature flagging or occasionally feature toggles. I think toggles makes a bit more sense as analogy for non-technical people.  

  • When do you think feature flagging is most useful?

There’s a couple – but the one I think it’s most useful for is to use a feature flag when you have a feature that is nearly complete or complete from your point of view. Either way, you are ready to get verification from someone with real data.

“You can test as much as you want with your pretend fake data, or even a dump from production which is being obfuscated, but until it gets used in the wild you’re never really sure that the feature is doing exactly what it needs to do.”  

So hiding that behind a feature flag, and then clicking it on for somebody who is using the product for real in any way gives you that last little test that is ultimately the most useful.  At that point you still have the opportunity to back out. If something was corrupt or your expectations were wrong, it’s really useful for that last-minute check.  

At Octopus, we’ve started using feature flags for big features that a lot of people don’t want to see. So a while ago we introduced the idea of a multi-tenant deployment. And probably most of our users don’t need that feature because it adds a lot of complexity to the UI.  We have a configuration section where you can toggle an “on” and “off” switch, so if you don’t need that feature you can just leave it off.     

Are there any cases where feature flagging is not a good idea?

I think there are two extremes where feature flagging is not a good idea. On one hand, flagging really small changes can be more trouble than it’s worth. It’s introducing an extra level of complexity that maybe for a small change is not critical.  

On the other side, using feature toggles around the architectural changes in the core of your application – that’s kind of hard to test. Do you have a feature flag that when you turn it “on” it completely redirects the way the entire application will run? In that case you bite the bullet and decide that this is a big change and you’re just going to have to test it very thoroughly and not give yourself a way out.

That being said, there are some cases where you still need to give yourself a way out by using a flag. For example, you might deploy some new feature thinking it’s correct, but subsequently learn from a customer or user that it doesn’t really meet their needs. Rather than the user living with a bad feature, you might want to turn the flag off and go back to the drawing board.

If it’s an architectural change, you may only find out that there’s a bug when you use it in production. Test data may not surface the issue properly.

Ultimately, doing core architecture changes in a way where you can back out later can be an extra huge amount of work. It’s probably at that point you know you aren’t going to do it (revert back) anyway.   

  • Best use of a feature flag – a personal story?

When I first started using feature flags, around the 8 years ago timeframe, I was working on a web application that was internal and a big line of business.  And we had just added a new third party provider for providing SMS.  And with this new provider, it meant we had to write a lot of new code.  It was internet banking software so it was a one-time password we were sending out – and it was really really important that it work.

We tested everything rigorously but wanted more insurance.  So we put the new service behind a feature flag. We had a bunch of agents that ran this type of SMS. We enabled a flag for one of the agents and monitored it to make sure it was actually doing the right thing and not failing. And then we started trying other ones. It failed a couple of times because of differences in the sandbox environment between the third-party provider and the real one.

“We thought everything was okay, but when we put it live we turned it on slowly, and it didn’t do what we expected.”

So when that happened we turned it back off again…and went back to the drawing board.

So without the feature flag, we would have dropped every person using the service at that critical point. That client would have not been able to receive SMS’s until we were able to rollback.

  • What do you think is the number one mistake that’s made around feature flagging?

There is one that I keep seeing – when you wrap a new feature you believe to be finished in a flag, the biggest mistake with this is not testing that change with the flag “on” and then “off”. For instance, when you turn it “on” it snaps into new database tables or starts changing the way data is saved. But when you turn it “off” again, you’ve lost that data or data is corrupt. For this you need to test it “on” and “off”.  

“If you have more than one feature flag running at the same time, test the combinations of them being both ‘on’ and ‘off’.

If they’re likely to interact with each other you need to test “one on, one off,” “both on,” “both off” and all possible combinations like this.  

  • How do you think feature flags play into the DevOps movement? What about Continuous Delivery?

I think feature flags play in both continuous delivery and continuous deployment. I think they’re most useful to continuous deployment. You have all of your features pushed out to production as soon as they compile essentially – but they are behind a feature flag so you don’t break anything. That’s the way Facebook does it. They know that any new code they write might end up in production so it’s going to be safe behind a feature flag.  

“The design of the DevOps movement, the aim of it really, is to get real features and real value in the hands of users as quickly as possible.”

So if you have to wait until this “half done piece of work” is actually safe to deploy then that slows you down. So having it behind a feature flag so that it doesn’t get touched until you are ready to test it can be really powerful for increasing velocity and getting things out to production much faster.     

So even for marketing teams, it means they don’t have to tell the developers “hey we worked out the result of this a/b test and we want option b.” If the marketing department can just just flip that switch and say “no, option b is working better so just leave it there” without a new deployment or contacting the developers to remove the old stuff and redirect to the new stuff,  that increases that team effort of getting value to customers which is the whole purpose of DevOps.

  • Can you share any tips for better flagging?

If you’re feature flagging a big change, pair the feature flag with a branch by abstraction pattern. See the clip from my talk from NDC Sydney for more details.

There’s also the concept of transitional deployments – again refer to my video clip here for more. It’s useful for things like database schema changes where you have a midpoint for both the new and old applications that will work with the schema that’s currently there. So you can turn that feature off if you need to.

  • Are you seeing feature flagging evolving? If so how?  And how do you expect it to change in the future?

It’s been around for a long time…but I think it’s becoming much more visible – and partly it’s LaunchDarkly helping with that. I think more people will start using feature flags in their continuous delivery pipeline. And the more continuous delivery becomes mainstream, the more mainstream developers will need feature flags.  

“I think feature flagging is starting to be something that you have to add your deployment cycle because you know it needs to be fast and you know you feature needs to get to production as quickly as possible – and feature flags are the way to do that.”  

So as it becomes more mainstream I think there will be more tools, more frameworks, more awareness of it (feature flags) as it hits more and more companies. I think there will be things coming out like feature flag-aware testing tools – so testing tools that know that they need to test with this flag on and off.  

The summary – more tools around best practices around this thing which is becoming more mainstream.  With DevOps becoming more popular, more people are thinking “yes we need to get to production quicker, we need that cycle time to reduce” so it’s a natural extension I think to start solving some of those problems with feature flags.  

“I think it’s just starting to become more mainstream frankly because it’s a solution to a problem that is starting to become more mainstream.”