08 Sep 2017

Saving Private Instances

You are a new Director of Engineering at an enterprise company. The company has just moved your product to the cloud and is hosting it in Microsoft Azure. You come from a tech startup where you ran all of your software in the cloud and focused on building product. So naturally you have just implemented instrumented monitoring with Honeycomb, adopted Kubernetes to manage your containers, and you’re thinking about leveraging serverless architecture.

You moved to this established enterprise company because they are providing technology to the healthcare industry and you are passionate about this space. They want you because you are good.

Now you face a challenge—how can you bring the good things from your past role and pair them with the security standards and compliance requirements of your new role? Well, let’s first think about what those good things entail. Your previous team:

  • Deployed 5 times a day
  • Used feature branching
  • Practiced feature-flag-first development
  • Implemented continuous integration

You realize you used 3rd party software for most of these good things, and had homegrown solutions for others. You decide your first project is to research what software the new team is using, what it has already built, and what you will need to build or buy.

Security and the cloud.

Now, before we dive in, let’s consider the security standards and compliance you need to think about in your new role:

 

  • Where is my data stored?
  • What data is being sent to the third party—is it PII?
  • Where are connections made to third parties?
  • How can I set different access controls?
  • How will it work when a connection fails? Is there redundancy?
  • Is it HIPAA compliant?
  • How is all of this audited?

 

Like your new company, many companies are just starting to move key operations to the cloud. This is SCARY. Moving into Azure will allow your company to free up floor space, add server redundancy, easily scale up and down, pay only for what you use, and focus resources on revenue-generating activities. And though these are all great things, you are not a security company, and security is still your responsibility.

Now you are tasked with bringing in these good processes, and they involve 3rd party software providers. Products in this space are embedded in your development lifecycle and sit very close to your core application, so you need to ensure they are secure.

As we all know, most software providers in this space are cool tech companies in The Valley, far removed from the realities of your secure enterprise world. You look at on-premise options, and then you remember that you’re driving change in the healthcare space, not looking to manage infrastructure. The cloud options require a significant, time-consuming audit. A multi-tenant SaaS provider carries an ongoing risk because it requires you to store some data on 3rd party servers, which is a non-starter in healthcare.

Is there a middle ground?

There is one more option many software providers offer: private SaaS, also known as a managed instance, dedicated VPC, or private instance. You surmise that if a vendor does not offer on-prem, private SaaS, or have a really, really good security team, their team is probably not accustomed to working on enterprise challenges.

What exactly is a private SaaS offering? It is dedicated, single-tenant cloud software where the vendor manages the infrastructure.

Why choose a private instance?

  • Single Tenant—You will have a dedicated set of infrastructure contained within a VPC. This eliminates the risk of noisy neighbors.
  • Data Storage—Data can be stored in your AWS account or in an isolated section of the vendor’s AWS account. This allows flexibility; if you are in the EU, the vendor could spin up the instance in an EU AWS region.
  • Residency—If you have a preference for where the instance is located, or need to ensure proximity, the vendor can place the instance accordingly.
  • Compliance—If you have additional security or compliance regulations, a private instance can be customized to fit your needs.
  • Change Cadence—If you need to know when the software will be updated, you will have better insight and greater flexibility with a private instance.
  • Integrations—If you have custom tools, integrations can be built.

What’s next?

You have narrowed down a few software providers for each problem you are trying to solve (continuous integration, branching, feature flagging). You understand the value, the costs, and the security implications. You have chosen to stay in the cloud, which makes your infrastructure team happy. You have chosen to use private instances and SaaS where each makes sense, which makes your security team happy. And you have the tools you need to bring the good things into your new role, so you are happy.

Now you can focus on helping your company deliver product faster, reduce risk in your release process, speed up your product feedback loop, and do it all securely.

28 Aug 2017

All the Pretty Ponies

August is always full of security awareness in the wake of DEF CON, Black Hat USA, and their associated satellite security conferences. Las Vegas fills with people excited about digital and physical lockpicking, and breakout talks feature nightmare-inducing security vulnerabilities and trivially simple voting machine hacking. The ultimate in backhanded awards are given out to companies and organizations that made the world less secure, usually because of an overlooked flaw.

The Pwnie Awards (pronounced “pony”) are the security industry’s way of recognizing and mocking organizations that have failed to protect their data and their users. There are categories for:

  • Server-side Bug
  • Client-side Bug
  • Privilege Escalation Bug
  • Cryptographic Attack
  • Best Backdoor
  • Best Branding
  • Most Epic Achievement
  • Most Innovative Research
  • Lamest Vendor Response
  • Most Over-hyped Bug
  • Most Epic Fail
  • Most Epic 0wnage
  • Lifetime Achievement Award

And last but not least:

  • Best Song

This year’s best song is a parody of Adele’s “Hello”. “Hello From The Other Side” comes complete with a demonstration of the exploit and a lyrical summation of what they’re doing.

Across the industry, security vulnerabilities are given a tracking number (a Common Vulnerabilities and Exposures identifier, or CVE) and described in a semi-standard way. This helps everyone understand which category a vulnerability falls into, so they can read and understand vulnerabilities that may be outside their area of expertise. It also helps us talk about vulnerabilities without resorting to sensational names like “Heartbleed”.

Since it is August and the 2017 awards have been announced, I got to wondering whether any of the Pwnie-winning vulnerabilities could have been prevented by feature flags. There are several that could possibly have been mitigated by the ability to turn a feature off, but I chose this one:

Pwnie for Epic 0wnage
0wnage, measured in owws, can be delivered in mass quantities to a single organization or distributed across the wider Internet population. The Epic 0wnage award goes to the hackers responsible for delivering the most damaging, widely publicized, or hilarious 0wnage. This award can also be awarded to the researcher responsible for disclosing the vulnerability or exploit that resulted in delivering the most owws across the Internet.

WannaCry
Credit: North Korea(?)

Shutting down German train systems and infrastructure was child’s play for WannaCry. Take a legacy bug that has patches available, a leaked (“NSA”) 0day that exploits said bug, and let it loose by a country whose offensive cyber units are tasked with bringing in their own revenue to support themselves and, yes, we all do wanna cry.

An Internet worm that makes the worms of the late 1990s and early 2000s blush has it all: ransomware, nation state actors, un-patched MS Windows systems, and copy-cat follow-on worms. Are you not entertained?!?!?

WannaCry was especially interesting because it got “sinkholed” by a security researcher called MalwareTech. (I know, he’s under indictment. Security researchers are interesting people.) He noticed that the ransomware was calling out to a domain that wasn’t actually registered, so he registered the domain himself. That gave lots of people time to patch their systems instead of getting infected.

As an industry, we tend to think of server uptime as a good thing. But is it? Not for any single server, because if it’s been up for a thousand days, it also hasn’t been rebooted for patches in almost three years. You may remember early 2017, when Amazon S3 went down? That outage was prolonged by the restart of subsystems that no one had rebooted in ages.

So how can feature flags help keep production servers current?

  • It’s safer to make a change in production if you know you can revert it instantly using a feature flag kill switch (see the sketch after this list).
  • Small, iterative development means that you can ship changes more quickly with less risk—the more often you reboot a server, the less important that particular server’s uptime is.
  • Many infrastructures today are built with the concept of ‘infrastructure as code’ through the use of automation tooling. These automation configurations can also benefit from feature flags to roll out system or component versions alongside your application updates.
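To make the kill-switch idea concrete, here is a minimal sketch in Python. The flag client, flag key, and code paths are hypothetical stand-ins for whatever feature-flag SDK and risky change you actually have; a real SDK would evaluate targeting rules served remotely instead of reading an in-memory dictionary.

```python
# Hypothetical sketch: a feature-flag kill switch guarding a risky change.
# `FlagClient` stands in for a real feature-flag SDK client.

class FlagClient:
    """Toy in-memory flag store; a real SDK evaluates rules served remotely."""
    def __init__(self, flags=None):
        self._flags = flags or {}

    def is_enabled(self, key, default=False):
        return self._flags.get(key, default)


def serve(request_id, flags):
    if flags.is_enabled("use-new-cache-layer", default=False):
        # Risky new path: can be shut off instantly from the flag dashboard,
        # with no deploy and no server restart.
        return f"new cache path for {request_id}"
    # Known-good path stays in place as the fallback.
    return f"origin path for {request_id}"


if __name__ == "__main__":
    flags = FlagClient({"use-new-cache-layer": True})
    print(serve("req-42", flags))  # flip the flag off to revert instantly
```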

Second we have:

Pwnie for Best Cryptographic Attack (new for 2016!)
Awarded to the researchers who discovered the most impactful cryptographic attack against real-world systems, protocols, or algorithms. This isn’t some academic conference where we care about theoretical minutiae in obscure algorithms, this category requires actual pwnage.

The first collision for full SHA-1
Credit: Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, Yarik Markov

The SHAttered attack team generated the first known collision for full SHA-1. The team produced two different PDF documents that produced the same SHA-1 hash. The techniques used to do this led to a roughly 100,000x speedup over the brute-force attack that relies on the birthday paradox, making this attack practical for a reasonably (Valasek-rich?) well-funded adversary. A practical collision like this moves folks still relying on a deprecated protocol to action.

SHA-1 was a cryptographic standard for years, and you can still select it as the hashing algorithm in a lot of software. This exploit makes it clear that we need to stop letting anyone use it—for their own good.

How a feature flag might have made this better (a sketch follows the list):

  • Turn off the ability to select SHA-1 as a hashing option
  • Force current SHA-1 users to choose a new hashing option
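As a rough illustration, a long-lived flag could gate the algorithm choices an application exposes. The flag name, the algorithm lists, and the plain-dictionary flag store below are all hypothetical; the point is only that removing the option, and failing validation for existing SHA-1 users, are both instantly reversible decisions.

```python
# Hypothetical sketch: a long-lived flag that retires SHA-1 as a selectable option.
# Flag names and the algorithm lists are illustrative, not any real product's API.

SUPPORTED = ["sha256", "sha512"]
LEGACY = ["sha1"]

def available_algorithms(flags):
    # With the flag on, SHA-1 simply disappears as a choice.
    if flags.get("disable-sha1-selection", False):
        return SUPPORTED
    return SUPPORTED + LEGACY

def validate_choice(choice, flags):
    if choice not in available_algorithms(flags):
        # Existing SHA-1 users are forced to migrate to a supported algorithm.
        raise ValueError(f"{choice} is no longer supported; choose one of {SUPPORTED}")
    return choice

if __name__ == "__main__":
    flags = {"disable-sha1-selection": True}
    print(available_algorithms(flags))  # ['sha256', 'sha512']
    try:
        validate_choice("sha1", flags)
    except ValueError as err:
        print(err)  # prompts the user to pick a modern algorithm
```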

Some feature flags are intended to be short-term and will be removed from the code once the feature is fully incorporated. Others exist longer-term, as a way to segment off possibly dangerous sections of code that may need to be removed in a hurry. Feature flags could be combined with detection, like I suggested for the SHA-1 problem, and used to drive updates and vulnerability analysis.

It’s easy for us to think of deployment as something a long way from security exploits that involve physical access to computers and networks, but security exploits are “moving left” at the same time as the rest of our technology. Fewer of us manage physical servers, and fewer security vulnerabilities relate to inserting unknown USB sticks into our laptops. Instead, we’re moving to virtual machines and containers, and so are our vulnerabilities. Constructing code, cookbooks, and scripts that allow us to change the path of execution after deployment gives us more options for staying far, far away from the bright light of the Pwnie Awards.

21 Aug 2017

How to Comply

Everyone loves a little affirmation (Image credit: twitter)

Things we wish we had known and things we were happy that we implemented early. 

At LaunchDarkly we recently embarked on the journey to SOC 2 Type II compliance. While the reasons we chose to pursue this certification were primarily business driven, the tasks and actions incorporated into the process most directly impacted our engineering operations and development teams. And due to the experience and philosophy of the founding engineering team, the actual impact of the process was minimal.

Once you decide to sally forth on SOC certification (or any security and compliance certification, for that matter), you will be in a much better position if you view the criteria of the certification as a benefit of (or an imperative for) good business practices. If you are in search of the certification primarily as a way to sell into certain customers or verticals, you will likely be frustrated with the entire process. Security is like diversity—you need to inherently believe in the value to be successful in implementation and outcome.

“So, what are these criteria you speak of?”

Need vs. want

The principle of least privilege is a well-known concept: you only provide access to the people who actually need it. This also extends to the level of access that is given. Ultimately this distills to view vs. control. I like to start here because it is a forcing function for so many of the other criteria. When you are thinking about who should have access to each system or service, you are inclined to do the following (a toy sketch follows the list):

  • Define roles
  • Log activity by user/account
  • Build a review process for accounts
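Here is a toy sketch of what “view vs. control” roles and account-level logging might look like. The role names, permissions, and log format are hypothetical; a real system would load these from configuration and write to a proper audit log.

```python
# Illustrative sketch of "view vs. control" least-privilege roles with an audit trail.
import logging

logging.basicConfig(level=logging.INFO)

ROLES = {
    "reader":   {"view"},             # can look, cannot touch
    "operator": {"view", "control"},  # can change production settings
}

def authorize(account, role, action, resource):
    allowed = action in ROLES.get(role, set())
    # Record every decision against the account so later review has context.
    logging.info("account=%s role=%s action=%s resource=%s allowed=%s",
                 account, role, action, resource, allowed)
    if not allowed:
        raise PermissionError(f"{account} ({role}) may not {action} {resource}")

if __name__ == "__main__":
    authorize("alice", "operator", "control", "prod/db-config")  # allowed
    try:
        authorize("bob", "reader", "control", "prod/db-config")
    except PermissionError as err:
        print(err)  # bob can view, but not control
```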

On-boarding each new employee gives you an opportunity to review the tools you use to run your business, question who needs access, and to what extent.

This is also a great time to introduce new employees to another key security maxim: Trust, but verify. This concept is relevant in the context of the access that accounts or component services assert are needed, as well as the way that you validate user accounts.

For services—either 3rd party or just components of a larger service—you should know what information is being stored, and where. Ideally, you are being thoughtful of the first principle of least privilege.

This is especially true of customer data. The easiest way to protect data is to limit what you collect. Second is to make sure that you are being intentional and explicit with where you store the data. Finally, you need to look at the access policy that you put around that data.

For user accounts, multi-factor authentication (MFA) or two-factor authentication (2FA) significantly increases your ability to validate that accounts are only being accessed by their owners.
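The most common second factor is a time-based one-time password (TOTP), which is what most authenticator apps implement. A minimal sketch with the pyotp library (assuming `pip install pyotp`; in a real system the secret is generated at enrollment and stored per account):

```python
# Minimal TOTP sketch using pyotp. The secret would normally be created when a
# user enrolls in MFA and stored with their account; the user's authenticator
# app generates the codes we verify here.
import pyotp

secret = pyotp.random_base32()      # stored alongside the user account
totp = pyotp.TOTP(secret)

code_from_user = totp.now()         # simulate the code from the user's app
print(totp.verify(code_from_user))  # True: second factor checks out
print(totp.verify("000000"))        # almost certainly False: reject the login
```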

Another time to plan for is employee off-boarding—ideally, before you off-board your first employee. This is also a good time to review your tools and access privileges.

Context, not blame

When a single developer is working on building a service, logs are primarily useful for understanding how well things are working (or where they broke). As the number of individuals working on the system becomes larger, the ability to know who changes what becomes increasingly important.

One caveat: as you incorporate the ability to know who, make sure you leverage this data to build context around why things changed, rather than simply using it as a means to place blame. If one person can bring down your service, then you should probably direct blame at the architect of the system. (Unless that destructive individual and the architect are the same person ¯\_(ツ)_/¯.)

A log without account context is like a novel without characters. You can build a picture of what happened, but likely will miss why things happened. If you don’t know why, you’re unlikely to prevent it from happening again.

Built for toddlers… or failure

Failure is trivial compared to proactive destruction (image credit: Lego City)

The stability of a service is often a strong indicator of its inherent security. After all, many of the most common exploits are based on overloading some resource. Continuing the theme of eliminating blame while building a secure and stable service, failures should be viewed as opportunities for increased robustness. This is where building for toddlers comes in.

Toddlers are the ultimate destroyers. It is the developmental stage where everyone starts to experimentally test the laws of physics. Gravity, entropy, projectile motion, harmonic oscillation—they’re all put through a battery of tests.

Ideally, you are thinking about your service from the perspective of a parent (or guardian) that is toddler-proofing their home. Bolt things down, put breakable things out of reach, lock up the flamethrower, and embrace the fact that you will miss things. For the items you miss, have an emergency procedure in place and appropriate medical supplies on hand.

Back in the context of your service…

Write it down… or it never happened

Your code or feature is not ‘done’ until the docs are written or updated. Services require constant supervision and are never ‘done’. But if a developer builds or changes something and doesn’t write it down, they effectively become the only individual able to monitor or operate the entire service (at least with full context).

So, what should you write down? “Everything,” is the easy answer, but often not the complete one. You want to write down enough to provide context if any component starts doing something unexpected.

If you don’t know where to start, a good approach is to write down what would need to happen if your service was deleted. How would you rebuild and restart your service? How would you restore your data?

Next you can look at the impact and recovery process for the loss of each individual component service. The important part is to incorporate the documentation into the development process, so that as your service evolves your documentation is always up to date—otherwise, when a failure comes, it is as if the change never happened.

Great, now you know what to do next time AWS S3 needs to reboot. But what about your customers? The next step is to write down an action plan for service interruptions. Make sure you have a process and plan in place for keeping folks in the loop.

Security is not the french fries

If you are in a situation where you are looking to “add” security, you are likely going to be in good company with Sisyphus. Security needs to be a part of your foundation—it is not an “add-on”. But if you are only now realizing that security is a requirement for your business, you can do more than wish you had considered it sooner. It is not too late, but it is not a quick fix that you can solve with a certification.

First you need to build security into your foundation and your process. Make sure that it is part of your culture. Once you have a culture of security and process, compliance is just providing proof of that culture.

Now… about that certificate

You don’t show up to your official Guinness World Records judging day having never practiced juggling 9 clubs. The same goes for when you decide to get your SOC certification.

However, when you build a strong foundation and culture of security and compliance, the steps to certification are rather straightforward. You call up your friendly neighborhood SOC auditor and get a copy of their checklist.

In the case of LaunchDarkly, we worked with the fine folks at A-LIGN. After an initial conversation we retained their services to conduct an independent audit of our systems and practices.

A few months prior to the audit, A-LIGN provided our team with a checklist of all the documentation and proof they would need to see when they came onsite for our assessment. This afforded us the opportunity to ensure that all of our practices were organized and in a state that could be easily evaluated.

When the time came for the audit, the auditor spent three days* on-site interviewing members of the team and reviewing our practices. After the on-site visit, we were informed that we had passed the initial compliance certification.

Of course, now that we have gone through the validation process for one certification, it seems like a good time to keep going for a few more. Many of the certifications have a significant overlap in requirements. They are all looking to establish trust and ensure the service provider is operating in the best interest of the customer. And it turns out, most customers define trust in a very similar way.

25 May 2017

Launched: Single sign-on

Spend some time at a software shop, and you’ll inevitably collect a pile of accounts for services, internal and external. Since you value security, each of your passwords is long, unique, and safeguarded in a password manager. You imagine a world where you don’t need to manage passwords for each and every service you use.

That’s why we are excited to announce support for single sign-on via the industry-standard Security Assertion Markup Language 2.0 (SAML 2.0). Knowing that SAML integrations can be cumbersome and complicated, we refined the administrator experience to be simple and clear. We built a test-drive mode so administrators can verify their SAML configuration end-to-end before enabling single sign-on in LaunchDarkly for the entire team.

Our single sign-on implementation is accompanied by a couple other benefits. With LaunchDarkly’s just-in-time user provisioning, administrators can onboard new employees from their identity provider without having to also create accounts for them in LaunchDarkly. Simply grant the new employee access to LaunchDarkly via your identity provider. Then LaunchDarkly will automatically create a new account when the member visits LaunchDarkly for the first time. Additionally, any changes to the member’s profile or assigned roles will be propagated from your identity provider as soon as the member signs into LaunchDarkly.
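To make the just-in-time flow easier to picture, here is a hypothetical sketch of what an application does when it receives a SAML assertion. The attribute names and the in-memory account store are illustrative only; this is not LaunchDarkly’s actual implementation.

```python
# Hypothetical sketch of just-in-time (JIT) provisioning from a SAML assertion.
# A real implementation would validate the assertion's signature first and
# persist accounts in a database rather than a dictionary.

users = {}  # stand-in account store, keyed by email

def handle_sso_login(assertion_attributes):
    email = assertion_attributes["email"]
    account = users.get(email)
    if account is None:
        # First sign-in: create the account on the fly, no manual invite needed.
        account = {"email": email}
        users[email] = account
    # Every sign-in: sync profile and role from the identity provider.
    account["name"] = assertion_attributes.get("name", account.get("name"))
    account["role"] = assertion_attributes.get("role", account.get("role", "reader"))
    return account

if __name__ == "__main__":
    print(handle_sso_login({"email": "new.hire@example.com",
                            "name": "New Hire", "role": "writer"}))
```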

We currently support Okta and OneLogin, with support for additional identity providers on the way.

Single sign-on is available to customers on our enterprise plans. If you’re interested in learning more about our enterprise plans, contact sales@launchdarkly.com.

Behind the curtain

Alexis and I collaborated on the single sign-on feature. The very first step we took was creating a feature flag for SSO in LaunchDarkly. With our feature flag seatbelt on, we didn’t need to maintain a long-running branch for the feature, which meant we thankfully didn’t have to suffer from massive merge conflicts. Every optional change that could be hidden behind that feature flag could be released incrementally and without extensive manual QA review.

When we demonstrated the feature in progress to a customer, we didn’t need to use a staging system; we could demo on production because the feature was hidden behind a feature flag. When we were ready for the feature to be beta-tested, it was very easy to enable it for one customer and then another. The SSO feature flag remains today, and now our sales team uses the flag to enable the feature for their customers.
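A rough sketch of the per-customer rollout described above: the beta flag is evaluated per account, so turning the feature on for one more customer is a targeting change rather than a deploy. The allow-list below is purely illustrative; in a real flag platform it lives in the dashboard, not in code.

```python
# Hypothetical sketch of per-customer targeting on a beta flag.

beta_customers = {"acme-corp", "globex"}  # updated per customer, no deploy needed

def sso_enabled(account_key):
    return account_key in beta_customers

print(sso_enabled("acme-corp"))  # True: this customer sees the SSO settings
print(sso_enabled("initech"))    # False: everyone else sees the existing flow
```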

02 May 2017

To Be Continuous: When Toasters Broke The Internet

In the latest episode of To Be Continuous, Edith and Paul discuss the massive DDoS attack against Dyn, a major DNS provider, in late 2016. They examine the security flaw that enabled millions of IoT devices to be hijacked and discuss how it could have been avoided.

They conclude that to minimize such attacks, all devices that connect to the internet should have easily updatable firmware, and that the IoT industry needs greater regulation. This is episode #31 in the To Be Continuous podcast series all about continuous delivery and software development.


04 Apr 2017

We got your RBAC

How LaunchDarkly gives teams granular access and security control for their feature flag management

Enterprise companies take security and privacy very seriously: risk must be mitigated, customer privacy must be protected, and software releases must be controlled.  Feature flags are essential tools to granularly control software releases, but with great power comes great responsibility.  When you have hundreds of stakeholders using a product, you need to make sure that every team member has the exact permissions they need: no more, no less.

Powerful tools demand powerful access controls.  This means that your demoing account executive should not be able to toggle off live production features unless they are explicitly allowed to enable them for a customer.  Likewise, your developers should be able to use feature flags in their own environments, but might not have access to disable functionality that customers depend on.

Custom Roles

To make this a reality, LaunchDarkly has built an extremely powerful and granular access control system that we call custom roles.

Custom roles let you control access for every team member and every feature in LaunchDarkly, from a particular flag’s percentage rollout to the ability to toggle a flag on or off.  You can create a role using our custom roles builder.

Here are some possible custom roles (one of these is sketched as a policy after the list):

  • Lock your production environment down to a small set of trusted users
  • Distinguish infrastructure-level feature flags (controlled by your DevOps team) from experiments (controlled by product management or marketing)
  • Allow QA members to control feature flags on designated QA environments only
  • Allow your designers to add users to betas
  • Allow sales to turn a feature “on” for a user
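As an illustration of the QA example above, a custom role can be expressed as a small set of policy statements. The statement shape below (effect, actions, resources) is a simplified approximation written as Python data; consult the custom roles documentation for the exact resource and action specifiers LaunchDarkly supports.

```python
# Illustrative policy for a "QA" custom role: full control over flags in QA
# environments, no access to production flags. The resource strings are
# simplified stand-ins, not guaranteed to match the real syntax.

qa_role_policy = [
    {
        "effect": "deny",
        "actions": ["*"],
        "resources": ["proj/*:env/production:flag/*"],  # hands off production
    },
    {
        "effect": "allow",
        "actions": ["*"],
        "resources": ["proj/*:env/qa:flag/*"],          # free rein in QA environments
    },
]
```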

Security

Equally essential for security is the ability to prevent nefarious access and brute force attacks.  Companies want to make sure that the platform controlling their feature releases conforms to security best practices.

As such, LaunchDarkly provides multi-factor authentication and session control for all customers.  Multi-factor authentication (MFA) improves the security of your account by requiring a second verification step, in addition to your password, when you log in. In LaunchDarkly, you can enable multi-factor authentication for your team’s account, which requires you to enter a verification passcode from a free authenticator application installed on your mobile device.  You can also require all team members to enable MFA before accessing their accounts.

Moreover, LaunchDarkly’s session control offers administrators a set of controls to manage how long users stay logged in to their account, and how often they need to re-authenticate.  This allows admins to take proactive measures when an account is compromised or a laptop is lost, providing full control over LaunchDarkly account access.

Summary

Feature flagging is increasingly becoming central to a company’s software development and release lifecycle.  As part of a company’s critical infrastructure, feature flag management platforms must have enterprise-grade security to ensure that customer data is safe and that every team member has exactly the access they need.