12 Apr

How Spinnaker and Feature Flags Together Power DevOps

It’s very common for customers to be excited about both Spinnaker (continuous delivery platform) as well as feature flags. But wait? Aren’t they both continuous delivery platforms? Yes, they are both trying to solve the same pain points – the ability to quickly get code in a repeatable, non-breaking fashion, from the hands of the developers into the arms of hopefully excited end users, with a minimal amount of pain and heartache for everyone along the toolchain. But they solve different pain points:

  • Spinnaker helps you deploy functionality to clusters of machines.
  • Feature Flags help you connect those functionality to clusters of USERS.  

Spinnaker helps with “cluster management and deployment management”. With Spinnaker, it is possible to push out code changes rapidly, sometimes hundreds (if not thousands) of times a day. As Keanu Reeves would say “Whoa.” That’s great! All code is live in production! Spinnaker even has handy tools to run black/red deployments where traffic can be shunted from cluster to cluster based on benchmarks. Dude! For those who remember the “Release to Manufacturing” days where binaries had to be put on an FTP server (and hope that someone would install and download in the next quarter or so), code being live within a few minutes of being written is amazing. For those who remember “master disks” and packaged software, this is even more amazing.

Nevertheless, with dazzling speed comes another set of problems. All code can be pushed anytime. However, many times you do not want everyone to have access to the code – you want to run a canary release on actual users, not just machines. You might want QA to try your code in production, instead of a test server with partial data. If you’re a SaaS product, you might want your best customers to get access first to get their feedback. For call center software, you want to have an opportunity to test in a few call centers. You might want to have a marketing push in a certain country days (or weeks or months) after another country. You might want to fine-tune the feature with some power users, or see how new users react to a complicated use case. All of these scenarios can not be done at a server level. This is where feature flags come in. By feature flagging, you can gate off a code path, deploy using Spinnaker, and then use a feature flag to control actual access.

Together, Spinnaker and feature flagging make an amazing combination. You can quickly get code to “production”, and from there decide who gets it, when.

14 Mar

My Agile Launch

Starting at a company that helps software teams release faster with less risk has reminded me of my first foray into agile development.

One of my earliest professional experiences was as an intern at HBO, where I reported directly to the VP of Emerging Technologies.  Over the course of the summer, it became clear that there were more promising new technologies to explore than there were engineers.  The undaunted college student that I was, I convinced my boss that I should own one of these projects.  Every day I would demo my work on a screen in his office, and he would provide feedback.  A few weeks later the VP presented to senior management and the company officially green-lit the project.  Though we didn’t refer to it as such, that cycle was my introduction to Agile.  Yet more importantly, it was a daily routine where a non-technical business stakeholder provided direct feedback to a technical resource; an arrangement that I’ve come to realize is quite uncommon.

Working at large and small companies in a variety of engineering roles, there are two overall trends that I’ve identified:

  1. Companies often separate engineering from the business side.
  2. Most development-related failures (e.g. missed deadline, bad or buggy feature, release causing a security vulnerability, etc.) are a result of miscommunication.

Neither of these statements are profound in isolation but when coupled genuinely pique my curiosity.

Over time the broader engineering community has developed numerous tools and processes to mitigate risk.  Missing deadlines?  Use story points to measure team output (velocity) over time.  Releasing buggy features?  Try test-driven development.  Want to avoid downtime during a release?  Setup application performance monitoring in your staging environment.  While these “quality assurance” measures are not guarantees of perfection, they make development more predictable.

Problems arise when we cut corners in response to misaligned expectations.  Let’s say there’s a feature request for a relatively straightforward user enhancement.  The development team has done everything right to this point (features properly designed and scoped), but towards the end of the sprint the team finds a scalability issue with no clear path forward.  Engineering management notifies the business side of this issue and insists that there should be resolution within five business days.  Five days pass and the developers have made no progress.  Engineering rushes to fix the issue through a refactor but skips unit testing to release earlier.  Two new bugs slip into production.  Instead of working on the next sprint the development team now works to get a hotfix release out.  One unexpected event can change everything.

The separation of the business side from its engineering counterpart sets the stage for frustration and missed opportunities.  Any tool that can bridge the gap between these two groups offers immense value to an organization or a working relationship.  Cucumber, a Behavior-Driven Development test tool, empowers non-technical stakeholders to define requirements, in plain English, that double as executable test guidelines for engineers.  By regularly reviewing Cucumber test results, a business stakeholder could easily assess the current status of a project.  Nevertheless, Cucumber facilitates one-way communication and offers no clear guidance on iteration or state.

The daily iterations I had at HBO were extremely effective in moving the project forward quickly. In a perfect world we could repeat this process as often as possible with senior management to win their approval as early as possible.  What if we schedule a daily meeting with the entire senior management team and show up with a buggy build?  It would quickly become apparent that our good intentions are far less valuable than executive time.  Instead, what if we gave each one of the senior managers a version of the technology that we could push updates to over time?  What if I could focus on building features and my boss could choose and deploy the ones that he thought were ready?  What if after realizing that there was a problem with a deployed feature he could hide that feature without involving me?  LaunchDarkly does all of this at the enterprise level.

To me LaunchDarkly is about much more than feature flagging or even quality assurance; the platform empowers companies to reconnect the technical and non-technical departments in order to shorten feedback cycles with customers and make better decisions.

This mission is a game changer.

That its founders and team are all extraordinary yet humble makes the opportunity twice as appealing.

04 Jan

Running Usability Tests in Production

Usability testing in a real-world environment (aka production environment) gives us insight into how users actually use our product in their day to day lives. It is one thing to run a test in a lab setting, but it is another to have users try features while they are walking, running to the airport, stressed, and sleepy.

But, no matter how hard we try, it is very difficult to truly simulate how our apps behave in our production environment. We can run focus groups, beta tests on a beta environment, and test things internally, but how can we truly mimic the real world in an artificial context? In other words, how do we test a feature while also simulating the environment it’s meant to be used in?

LaunchDarkly Usability Testing In Production - Feature Flags and Feature Toggles - Context

A good real-world usability test analyzes features efficiently and accurately collects user feedback to improve the user experience. But, when we test anything in a non-production environment, we are inherently biasing our tests. In a lab-based usability test:

  1. Users are overly cognizant of the feature they are testing.
  2. Users are unnaturally focused on testing that new feature.
  3. Users are using fake data or incomplete data, and don’t sufficiently utilize actual use cases.
  4. It is very hard to test discoverability (i.e can a user find the feature on their own?).
  5. Users tend to ignore distractions, like external notifications (Facebook, Skype, texts) and just focus on the task at hand.
  6. Users are in an overly analytical mode, typically looking to give feedback.
  7. Users are overly forgiving, looking to please!

Does this mean that running usability tests in non-production environments is a waste of time? Not at all! This is absolutely necessary to identify bugs, check for general usability, and solicit feedback quickly.

However, it should just be one of the steps in a comprehensive usability testing process, one that involves internal, staging, and production usability testing.

Benefits of usability testing in production:

The primary purpose of usability testing in production is to gather real-world user behavior while minimizing bias and performance risk. Some more benefits include:

  1. Genuine user feedback in a real-world environment
    • Quantitative insight into your feature’s performance. How well is it scaling? Are users using it? How is it impacting your system? How are your levels of engagement?
    • Contextual insight into your feature’s efficacy. Do users see the new feature? How is it meshing within the context on your existing feature set? Are people using it as intended? Are people using it once and then not using it again?
    • Qualitative feedback. Are users complaining? Are they happy? Are they neutral?
  2. No opt-in bias – users test the feature without knowing they are part of the test. You can assess how well they use it by using a product like Full Story to record the session or by tracking metrics. You therefore get a more representative sample testing the feature, rather than just early adopters.
  3. Measuring actual system performance – there is nothing quite like your production environment, where you can have a complex array of nodes, clusters, CDNs, etc allowing your app to scale. As people start to use the new feature, you can see how it is impacting your actual system (load times, discoverability, caching issues).

Managing a usability test in production:

Of course, testing anything in your production environment is inherently risky and has real-world consequences. If you launch something to everyone just to get feedback and they hate it, then you risk permanently losing those users. Equally bad, you can cripple your entire application with unforeseen scaling and performance issues.

To mitigate this, companies like Facebook, Amazon, and Google collect production feedback by releasing features behind feature flags. While we won’t go into to the specific anatomy of a feature flag, we can go through the methodology behind the release.

LaunchDarkly - Usability Testing Using Feature Flags and Toggles - Betas and Feedback

If a feature is wrapped in a feature flag, then it gives you control over who sees the feature and when. This means that you can perform targeted and controlled releases using a percentage rollout, whereby you can incrementally increase a feature’s visibility to 1%, 5%, and to 100% of your users.

Hence, you can collect production level feedback because you control the level of risk. If a new feature is performing well, then you can keep increasing the percentage rollout. If it is tanking or hurting performance, you can reduce the rollout or kill it completely.

Therefore, feature flags (aka feature toggles) give you full control over the risk of your production releases. You can gather real-world user feedback by separating your feature rollout from your code deployment.

14 Oct

The Future of Feature Flags: Managing Dynamic Applications

LaunchDarkly Multivariate Feature Flags / Toggles

Traditionally, software teams have used feature flags/toggles to control simple rollouts and enable kill switches.  Boolean values were used for “on” or “off”, “true” or “false”.  With the introduction of multivariate flags, developers have been experimenting with serving rich values via these flags: strings, numbers, JSON objects, and JSON arrays. This has opened up a whole new world of dynamic application management.

Use Cases

Using feature flags to manage dynamic applications opens up many powerful and interesting use cases, for example:

  • Manage features in a pricing plan
  • Customize pricing based on user segment Apply coupons and discounts based on user actions or holiday sales
  • Allow users to personalize their own experiences
  • Manage CSS styling to test colors, layouts, and elements
  • Control conditional logic separately from your code base

The Multivariate Flag

A multivariate feature flag has two primary benefits: you can customize the number and type of variations returned.

LaunchDarkly Multivariate Feature Flags / Toggles

Let’s first look at a simple example. This feature flag calls the variation method to determine which variation the user should get for a “checkout.flow” feature flag.

Here, the user “bob@example.com” will either see a one-click, two-click, or original checkout based on the string value returned by the feature flag:

Serving Dynamic Values

Now, let’s use a feature flag to insert prices into a pricing page based on some attributes of a the user. We’ll call this feature flag “plan.price”.

LaunchDarkly Managing Dynamic Content with Feature Flags / Toggles

Here, the user will receive a price of $20, $50, or $75 depending on whether they match a particular targeting rule.

For a more advanced use case, these values (20, 50, 75) could be used to populate other pricing dependencies, like a customer invoice and account limits.

Best Practices

Of course, you should not use feature flags as a secondary database, but they could be useful as a way to personalize feature targeting without tying that logic into your codebase. This allows you to rapidly change pricing models, test color schemes, and configure complex features without having to redeploy.

07 Sep

Why microservices need feature flags

Microservices is the practice of breaking up a huge, monolithic release where all components are tested and released as a whole, into many discrete services that can go on independent release schedules. The benefits of microservices were popularized by Martin Fowler, and put into practice by Amazon and Netflix. However, with microservices, releases become more complicated, by n-factorial. Instead of one monolith that can tested as a whole, if there are even as few as nine different services, each needs to be tested with every other component. Suddenly there are nine potential friction points.

Feature flags to the rescue! With feature flags, engineering teams can have complete control over their various microservices. First, wrap the microservice with a feature flag, with all traffic going to the old version. Then, release a new version. With a feature flag, gradually put whatever traffic you want to the new version, verifying functionality in all the other micro services. The new traffic and be cordoned off however suits your business. A feature flag at it’s simplest can be an “on” “off” switch to quickly flip between versions, similar to a blue green deployment. However, feature flags can serve arbitrarily complex (or simple) variations of traffic to the new microservice. Some example of how to test the new micro service are
– percentage rollout
– by IP address of service
– by version number of other services
or whatever other information. Feature flags allow users to do “microdeployments” of microservices. A microdeployment is breaking a deployment into smaller components of who exactly receives the new service.

Using feature flags and microservices together allows engineering teams to truly unlock the value of decoupling. You can read more about how we use microservices and feature flags at LaunchDarkly here.




26 Aug

3 Ways to Avoid Technical Debt when Feature Flagging

Feature flags are a valuable technique of separating out release (deployment) from visibility. Feature flags allow a software organization to turn features on and off at a high level, as well as segment out their base to allow different users different levels of access. However, feature flags have an (ill-deserved) reputation of “Technical Debt”. Used incorrectly, feature flags can accumulate, add complexity, and even break your system. Used correctly, feature flags can help you move faster. Here’s three easy ways you can avoid technical debt when using feature flags.

  1. Create a central repository for feature flags

Using config files for feature flags is “the junk drawer” of technical debt. If you have seven config files with different flags for different parts of the system, it’s hard to know what flags exist, or how they interact. Have one place where you manage all of your feature flags.

  1. Avoid ambiguously named flags

Give your flags easy to understand, intuitive names. Assume that someone other than you and your flag could potentially be using this flag days, months, and years into the future. Don’t have a name that could cause someone to turn it on when they mean off, or vice versa. For example “FilterUser”, when it’s off – does this mean users are filtered? or not?

  1. Have a plan for flag removal

Some flags are meant for permanent control, for example for an entitlements system. Other flags are temporary, meant for the purpose of a release only. If a flag can be removed (because it’s serving 100% or 0% of traffic), it should be removed, as quickly as possible. To enforce this rigor, when you write the flag, also write the pull request to remove it. That way, when it’s time to remove the flag, it’s a two second task.