18 Aug 2017

Risk Reduction and Harm Mitigation

Risk Reduction is trying to make sure bad things happen as rarely as possible. It is anti-lock brakes, vaccinations, clothing irons that turn off by themselves, and all sorts of things that we think of as safety modifications in our life. We are trying to build lives where bad things happen less often.

Harm Mitigation is what we do to make sure that when bad things do happen, they are less catastrophic. Fire sprinklers in buildings, seat belts, and needle exchanges are all about making the consequences of something bad less terrible.

What does that mean for a DevOps world where our risks and harms are very different? Charity Majors says we shouldn’t split the world into developers and operations, but into product and infrastructure—and I think that’s a useful way to think about risk too.

Product risk covers the problems that users experience. We can usually predict and mitigate the danger by testing and being aware of common failings. For example, we can expect and plan for users who may be on a flaky connection, or who may try to exit a page without saving information. We can work around these problems because we know they’re out there. But our deployments are not something users should see until we’re ready for prime time. For this kind of risk, we use feature flags to control what content is delivered.

Infrastructure risk is more about the inherent fragility of delivering software. CDNs, fiber, servers, switches, towers…the whole system of getting data to people has failure zones. When we are trying to reduce infrastructure risk, we assume that latency is ever-present, networks are intermittent, and we won’t always be able to count on everything working right the first time. We try to build in robust and flexible ways that can route around failures. This is the place we might use feature flags to control failovers or to create circuit breakers to prevent flooding a fragile sector.
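To make the circuit breaker idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how LaunchDarkly implements anything: the `CircuitBreaker` class, its `forced_open` attribute (standing in for a flag an operator could flip), and the `fallback` callable are all hypothetical names.

```python
class CircuitBreaker:
    """Stop calling a fragile dependency after repeated failures.

    Hypothetical sketch: `forced_open` stands in for a feature flag
    that an operator could flip to shunt traffic away on demand.
    """

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.forced_open = False

    @property
    def is_open(self):
        # Open means "do not call the dependency".
        return self.forced_open or self.failures >= self.max_failures

    def call(self, func, fallback):
        if self.is_open:
            # Protect the fragile sector: skip the call entirely.
            return fallback()
        try:
            result = func()
        except Exception:
            self.failures += 1
            return fallback()
        # A success resets the consecutive-failure count.
        self.failures = 0
        return result
```

This version stays open once it trips. A production breaker would usually retry after a cool-down period, which is left out here to keep the sketch short.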

Product harm reduction is about making sure that users can have a positive experience, even if something happens that makes it less than ideal. We want to preserve their blog drafts, keep them from committing errors, only show them things they are allowed to change, and above all, avoid giving them a blank page. Something has gone wrong in the trip to the user, but they shouldn’t have to suffer for it.

Infrastructure harm reduction is the ability to pull back breaking fixes, shunt users away from vulnerabilities, and respond near-immediately to things that have gone very wrong. Harm reduction at this level is the kind of action a pager-responder can take to get things back on track before doing more intensive repairs and investigations in the morning.

In my first week, I spent a lot of time thinking about how to summarize our product for a variety of audiences. “Feature flags as a service” is short and pithy, but only works if you can bring your own definition of “feature flag” and a business understanding of why you would use them. What about “Feature flags allow you to make changes in near real-time instead of waiting for a deployment, and LaunchDarkly helps you manage and track flags across an organization”?

Well, that works, but it still doesn’t get to the core of why an organization would want to use feature flags. What I’ve come up with so far is this:

Feature flags segment the risk of creating a product into manageable parts.

Creating and deploying software is risky. We can accidentally build in errors, we can deliver it badly, or to the wrong people, and it can interact in unfortunate ways with existing software or hardware. As organizations, we want to do our best to do no harm and provide benefit. Using feature flags lets us wrap our features in decision points that we can then use to make life easier for our users.
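A decision point can be as small as one `if` statement. The sketch below assumes a hypothetical in-memory `FLAGS` dictionary and made-up checkout functions. A real flag service would supply the values, and none of these names come from the LaunchDarkly SDK.

```python
# Hypothetical in-memory flag store; a flag service would supply these values.
FLAGS = {"new-checkout": False}


def is_enabled(flag_key, default=False):
    """Return the flag's current value, or a safe default if it is unknown."""
    return FLAGS.get(flag_key, default)


def legacy_checkout(cart):
    # Existing, trusted behavior.
    return sum(cart)


def new_checkout(cart):
    # New behavior under evaluation: apply a 10% discount.
    return round(sum(cart) * 0.9, 2)


def checkout(cart):
    # The decision point: the new code is deployed but only runs when the flag is on.
    if is_enabled("new-checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Because the default is `False`, an unknown or missing flag falls back to the old path, which is usually the safer failure mode.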

Here are some types of risks that are reduced by using feature flags:

  • Server falls over from too much traffic
  • Canary launch is not well-tracked, problems are missed
  • Old features and workarounds are invisible and get left in place
  • Feature with vulnerable content is deployed
  • API endpoints are exposed to unauthorized users

Managing your feature flags is a post for another day—today I ask that you take a few minutes and think about how you can reduce risk and minimize harm in your organization, your project, or your code. How can you make things robust enough to resist failure, instrumented sufficiently to identify a failure spot, and flexible enough to reduce harmful consequences on the fly?

16 Aug 2017

Week 1: How to Put Your SOC On


What does a new engineer do during their first week at a SOC 2 compliant startup? Write code? Maybe. Deploy code? Hopefully. Create accounts? Certainly. Generate passwords? Ad nauseam.

After creating my task tracking and document sharing accounts, half the items I saw on my TODO lists were about creating accounts on more services. Also on my calendar was training for one of LaunchDarkly’s newest initiatives: SOC 2 Compliance.

At LaunchDarkly, we maintain mission-critical services for our customers (feature flags!). And for those who opt for premium services, we also store sensitive data about their clients as part of our analytics features. It is essential to our business that we protect not only access to the controls over customer application behavior, but also all client data we store on behalf of our customers.

After our security training, each member of my incoming class made a commitment to:

  • Create a unique password for every service. Use a password generator and a password manager!
  • Enable 2-factor authentication for every service that offers it.
  • Avoid sharing passwords and accounts with team members to keep a precise audit trail.
  • Restrict browser plugins to the minimum necessary to do your job. Those plugins can read your data.
  • Secure your laptop with FileVault and lock screens.
  • Limit connected applications with access to Gmail, GitHub and other accounts.
  • Secure customer data. (Obfuscated links don’t cut it!)

These are all great practices even if your business doesn’t need SOC 2 certification. Now to deploy some code (if I can just remember where I’ve written down my SSH key…).

08 Aug 2017

Flexible Infrastructure with Continuous Integration and Feature Flagging


I’m incredibly excited to be LaunchDarkly’s first solutions engineer. During my first week I got to learn about some of the clever ways we do feature management. Not only do we use feature flags to control the release of fixes and new features, but we also use them to manage the health of our infrastructure in production. I’ve been a part of a number of teams, and I’ve never seen a more advanced development pipeline.

Normally, dealing with issues in production can be a frightening and time-consuming experience, but adopting a mature continuous delivery pipeline can allow you to react faster and be proactive. Continuous integration and deployment make getting fixes into your production code a trivial task. Feature flagging takes it to the next level: you can put fixes in place for potential future pain points and enable them later without having to do another deploy.

One common problem is handling extreme server load. This is managed easily with LaunchDarkly. Imagine you have a server that is pulling time-sensitive jobs from a queue, but the queue fills faster than the server can drain it, causing all jobs to fail. In situations like these it would be better to at least get some of the jobs done, instead of none of them. This concept is known as “bend-don’t-break”.

I built a proof of concept using Python and RabbitMQ which demonstrates how you could use LaunchDarkly’s dashboard to control what percentage of jobs get done while the rest are thrown away. If the worker takes too long to get to a job, the job fails. As you see the queue grow, you can manage it easily with feature flags.

It consists of two scripts, taskQueuer and taskWorker. The taskQueuer adds imaginary time-sensitive jobs to the queue; the rate is configurable using feature flags.

The taskWorker removes one job from the queue and processes it. One job takes one second. If tasks are queued faster than the worker can process them, the queue fills up and the worker will begin failing. To protect against this, you can use the “skip rate” feature flag to allow the worker to drop a certain percentage of jobs on the floor.
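The skip-rate idea can be sketched without RabbitMQ at all. The snippet below is not the proof of concept itself: it swaps the real queue for an in-memory `deque`, and the `drain` function and `get_skip_rate` callable are hypothetical stand-ins for the worker loop and the “skip rate” flag lookup.

```python
import random
from collections import deque


def drain(queue, get_skip_rate, rng):
    """Work through the queue, dropping a fraction of jobs on the floor.

    `get_skip_rate` is a hypothetical callable returning a value from
    0.0 to 1.0. It is read once per job, so a change to the flag takes
    effect on the next job rather than on the next deploy.
    """
    processed, skipped = [], []
    while queue:
        job = queue.popleft()
        if rng.random() < get_skip_rate():
            skipped.append(job)    # bend: give up on this job
        else:
            processed.append(job)  # the real worker would do the work here
    return processed, skipped
```

For example, `drain(deque(range(100)), lambda: 0.25, random.Random())` would drop roughly a quarter of the jobs. Passing the random generator in makes the behavior easy to test with a fixed seed.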

The concept of using LaunchDarkly as a control panel to manage the operation of your app is really cool and opens up a world of possibilities beyond simply percentage rollouts and canary releases. If you have an interesting implementation, get in touch with us at hello@launchdarkly.com and maybe we’ll feature it!

More about tough devops: https://insights.sei.cmu.edu/devops/2015/04/build-devops-tough.html
More about using Python with RabbitMQ: https://www.rabbitmq.com/tutorials/tutorial-one-python.html
More about LaunchDarkly’s stack: https://stackshare.io/launchdarkly/how-launchdarkly-serves-over-4-billion-feature-flags-daily

07 Aug 2017

Growing at LaunchDarkly: An Intern’s Perspective

When I first visited LaunchDarkly back in March, I knew there was no place I’d rather be. Coming straight out of college, moving fast (and breaking some things) had become my mantra. What I soon realized was that a high-velocity release cycle doesn’t have to mean that code quality should be sacrificed. Who knew that gating your code behind feature toggles would basically annihilate the risk of releasing faulty features to your users? I certainly did not.


Working to refine the Customer Success process at LaunchDarkly has taught me many things. From working with the development team to fight fires to assisting in product design to meet customer needs, being a Dark Launcher has been a great interdisciplinary experience. Customer Success has allowed me to build a web of knowledge on how a SaaS startup works and which requirements need to be satisfied for success on all fronts.

If I had to choose the best part of being at LaunchDarkly, it would be the people here and the atmosphere they create. This summer has been a really exciting time to be a part of this team. I’ve had the chance to watch the team almost double in size. This has meant that we have had many conversations about culture and the type of company that we each want to be a part of. One outcome of these conversations was the addition of a “Buddy Lunch” program, an initiative which aims to introduce peers to each other on a more personal level and enjoy some awesome food!

I’m happy to announce that I’m transitioning to a full-time support role here at LaunchDarkly! It feels like I’m the outcome of a feature experiment that went really well, and now I’m excited to go into production full-time.

27 Jul 2017

To Be Continuous: Celebrating Failure, Founder Guilt, Serial Entrepreneurs, Startup Myths, The Everything Else Person

In the latest episode of To Be Continuous, Edith and Paul discuss a medley of startup-related topics. Paul wonders whether the pendulum has swung too far the other way, with too many startups now glorifying failure.

They also examine ways of coping with founder guilt and dispel some common myths about life in a startup. Finally, they examine the invaluable role of the “Everything Else” person in startups. This is episode #35 in the To Be Continuous podcast series all about continuous delivery and software development.


25 Jul 2017

Launched: LaunchDarkly SOC 2 Certification

Providing an always-on, highly secure feature management service is core to the LaunchDarkly platform. From the beginning we have designed and built our infrastructure and practices with security and availability as a priority.

Today, we are announcing the next level of this commitment to Enterprise readiness and stability and are pleased to have achieved SOC 2 Type 1 certification.

Here are a few examples of what you can read about in the report:

  • LaunchDarkly security policies
  • LaunchDarkly logical and physical access controls
  • LaunchDarkly change management process
  • LaunchDarkly data backup and disaster recovery strategies
  • LaunchDarkly system monitoring, alerts and alarms

Protecting the data and privacy of our customers is a non-negotiable aspect of what we do. Our SOC 2 certification provides you with an additional assurance that we have all the right controls in place to protect your data and ensure the availability of our service and your features.

To request a copy of the LaunchDarkly SOC 2 report, please email trust@launchdarkly.com.