16 Mar 2018

Hypothesis Driven Development for Software Engineers

Last week I attended the QCon London conference from Monday to Wednesday. It was a thoroughly interesting and informative three days. The sessions I heard ranged from microservice architectures to Chaos Engineering, and from how to retain the best people to new features of the Windows console. But there was one talk that really stood out to me—it took a hard look at whether we are Software Developers or Software Engineers.

“QCon London is a conference for senior software engineers and architects on the patterns, practices, and use cases leveraged by the world’s most innovative software shops.”

QCon London describes itself as a software conference, not necessarily a developer conference. It focuses more on the practices of creating software as opposed to showing off the latest and greatest frameworks and languages, and how to work with them. This came through in the talks I attended where most showed very little code and focused on how we work as teams, how we can interact with tools, and the idea that how we treat our code can have a huge impact on what we ultimately deliver.

With that in mind I went to a talk titled “Taking Back ‘Software Engineering’” by Dave Farley. I wanted to understand the differences he sees between being a software developer and an engineer, and learn how those differences can help create better code. During his 50-minute presentation, Farley outlined three main phases of production we have gone through. The first was craft, where one person builds one thing. The next was mass production, which produced a lot of things but wastefully resulted in stockpiles of products that weren’t necessarily used. The final phase was lean mass production and Just In Time (JIT) production. This is the most common form of production today and is possible because of tried and tested methodologies ensuring high quality and efficient production lines. JIT production relies on applying the Scientific Method well, so that incremental and iterative improvements result in a high-quality product at the end.

Without this JIT production approach and the Scientific Method, Farley pointed out that NASA would never have taken humans to the moon and back. It is only through robust and repeatable experiments that NASA could understand the physics and engineering required to build a rocket that could enter Earth’s orbit, then the moon’s, land humans on the moon, and then bring them back to Earth. It’s worth noting that NASA achieved this feat within 10 years of when President John F. Kennedy declared the US would do it—at which point NASA had not yet even launched an orbital rocket successfully.

Farley surmised that engineering and the application of the Scientific Method have led to some of humanity’s greatest achievements, and yet when it comes to software there is a tendency to shy away from the title of “Engineering”. For many the title brings with it ideas of strict regulations and standards that hinder or slow creativity and progress rather than enable them. To demonstrate his point Farley asked the audience, “How many of you studied Computer Science at university?” Most of the room raised their hands. He followed up with, “How many of you were taught the Scientific Method when studying Computer Science?” There were now very few hands up.

Without a scientific approach, software development is perhaps an industry that follows a craft-like approach to production, where something is considered good enough because it works. For example, if a software specification was given to several organisations to build, one could expect widely different levels of quality in the resulting products. However, the same cannot be said of giving a specification for a building, car or rocket to be built: the results could appear different but would all be quality products based on rigorous tests and standards.

Farley went on to talk about how Test-Driven Development and Continuous Delivery are great at moving the software industry to be more scientific and rigorous in its testing standards. Though they are helping the industry to be better at engineering, there is perhaps another step needed—Hypothesis-Driven Development (HDD)—to truly move the industry to being one of engineers instead of developers.

Through HDD, theories would be created with expected outcomes before the software development aspect was even considered. This allows robust testing to be done further down the line: if the hypothesis stands up to the testing, it can be concluded that it appears to be correct. Further testing of the same hypothesis could be done too, allowing for repeatable tests that demonstrate the theory to be correct. The theories could be approached on a highly iterative basis, following an MVP-like approach. If at any point the theory no longer holds up, work on that feature could be stopped.
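To make the idea concrete, here is a minimal sketch of what writing a hypothesis down before building anything could look like. It is my own illustration rather than anything Farley showed: the `Hypothesis` class, its field names, and the sign-up numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A falsifiable statement recorded before any feature code exists."""
    statement: str          # what we believe will happen
    metric: str             # the measurement that will support or refute it
    baseline: float         # value of the metric before the change
    minimum_uplift: float   # smallest relative improvement that counts as success

    def is_supported(self, observed: float) -> bool:
        """Return True when the observed metric meets the expected outcome."""
        target = self.baseline * (1 + self.minimum_uplift)
        return observed >= target


# Hypothetical example values, invented purely for illustration.
h = Hypothesis(
    statement="A shorter sign-up form increases completed sign-ups",
    metric="signup_completion_rate",
    baseline=0.40,
    minimum_uplift=0.05,
)

print(h.is_supported(0.43))  # True: above the 0.42 target
print(h.is_supported(0.41))  # False: below the target, so stop or rethink
```

Because the success criterion is fixed up front, the same check can be rerun on later data, which is what makes the test repeatable.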

The theories wouldn’t need to come from developers and engineers themselves, although they could, but could come from other aspects of the business and stakeholders who request work to be done on the products being built. This would result in more accountability for what is being requested with a clear expectation around the success criteria.

Whilst I and my colleagues apply some of these aspects to the way we work, we don’t do everything and don’t approach working with software with such a scientific view. I can see clear benefits to using the Scientific Method when working with software. When I think about how we might better adopt this way of working I am drawn to LaunchDarkly.

We use LaunchDarkly at work for feature rollouts, changing user journeys and for A/B testing. The ease and speed of use make it a great tool for running experiments, both big and small. When I think about how we could be highly iterative with running experiments to test a hypothesis, LaunchDarkly would be an excellent way to control that test. A feature flag could be set up for a very small test with strict targeting rules, and if the results match or exceed the hypothesis then the targeting could be expanded. However, if the results are not matching what was expected, then the flag could be turned off. This approach allows for small changes to be made, with a minimal amount of time and effort being spent, and for useful results to be collected before any major investment is made in developing a feature.
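The general mechanism behind a flag that starts small and widens can be sketched in a few lines. To be clear, this is not the LaunchDarkly SDK or how LaunchDarkly is implemented. It is a generic, hypothetical illustration of a percentage rollout, and the function names, flag key and user ids are all made up.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> float:
    """Map a user to a stable value in [0, 100) for the given flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return (int(digest[:8], 16) % 10_000) / 100.0


def is_enabled(flag_key: str, user_id: str, rollout_percent: float,
               flag_on: bool = True) -> bool:
    """Return True when the user falls inside the current rollout."""
    if not flag_on:  # kill switch: the experiment is stopped for everyone
        return False
    return bucket(flag_key, user_id) < rollout_percent


users = [f"user-{i}" for i in range(1000)]
small = {u for u in users if is_enabled("new-checkout", u, 5)}
wider = {u for u in users if is_enabled("new-checkout", u, 25)}

print(len(small), len(wider))
print(small <= wider)  # True: users in the small test stay in when it widens
```

Hashing the flag key together with the user id keeps each user's assignment stable between requests, so widening the rollout only adds users and turning the flag off removes the feature for everyone at once.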

I found Farley’s talk at QCon London to be an interesting and thought-provoking look at how I could change the way I work with software. I’m now thinking about ways to be more scientific in how I approach my work, and I think LaunchDarkly will be a very useful tool when working with a Hypothesis Driven Development approach to software engineering.

23 Jul 2015

Hypothesis Driven Development: Yammer case study

How Yammer does hypothesis driven development, guest post by Ron Blanford, Yammer Product Manager

Recently I kicked off a project to overhaul our iPhone publisher in order to make it easier for users to post photos to Yammer. We didn’t start this project with the intention of overhauling the entire publisher, but when we took a closer look at the overall experience, we knew we needed to make big changes.

We still maintain a lean startup mentality at Yammer, which means we develop a hypothesis and build the most minimal thing we can to test that hypothesis and validate our decisions with data incrementally. As you might imagine, overhauls that change many variables at once are not too common around here but sometimes we know they are necessary to drive the product forward. According to Mary Meeker’s Internet Trends 2014, 1.8 billion photos were being posted to social media sites on a daily basis. So we generally know that people are accustomed to taking pics from their phone and posting them to social media sites. People have pictures on their phone. We just weren’t making it easy for them to post those photos to Yammer.

Go Big or Go Home

Why was it necessary to overhaul the publisher? I didn’t need an analyst or a user researcher to tell me that the experience of posting photos to Yammer was terribly outdated. Just using the feature made it obvious that we hadn’t invested in this part of the app in years. You’d tap the camera icon, which would then prompt you to choose to take a new photo or upload an existing one, at which point you’d get dropped into your photo roll or the camera. If you wanted to post multiple photos, you’d have to go through the flow again, and again, and again. As a general rule, we want to minimize the opportunity for bad experiences in the product, but we’d also been hearing from our customers through our user researchers that they were having difficulty with the photo posting process. This is especially true for retailers, for example, who employ thousands of workers who don’t sit in front of a computer every day. These users rely primarily on the mobile experience to communicate with their coworkers, and sharing photos is an important use case.


My hypothesis for this project was that if we made it easier to post images, people would indeed post more images, and as a result, the number of days our users engage with Yammer would go up. Why? We know that posts with images are more engaging than those without. Posts with a photo get on average 17% more responses and nearly four times as many likes. Why are replies and likes important? Likes provide validation, acknowledgement, and support from the network. They encourage people to post more, which in turn encourages more replies, likes and eyeballs. It’s a nice reciprocal engagement loop that ultimately leads to more content on the network, more people having conversations, more people getting work done, more people discovering things, etc.


Build It

This was the easy part for me since the vast majority of my heavy lifting was done prior to any developers writing a line of code. From here on out, our publisher was mostly in the hands of the designer and developers.

What made this project different from so many others was that it required really close attention to what would otherwise be thought of as small details. Whereas transitions and animations are often afterthoughts to the core parts of an app or feature, in this case, they were core to our success. If the transition was jumpy or unnatural, people would find the new experience jarring and painful. If we didn’t nail the experience, people would get frustrated and find other ways to share their photos. Many hours were spent dealing with how the keyboard slid out, how the gallery slid in, how the full-size gallery took over the whole screen, and more. We’ve always had amazing talent at Yammer; in this project, I believe the skill of our designers and developers allowed us to deliver a product of exceedingly high quality in a very short period of time.

Test It

In general, we show an experiment to the smallest number of users that still gives us statistically significant data. In the event the feature is a bad experience or just doesn’t test well, we will have disrupted fewer people than we otherwise would if we tested everything at 50/50. On mobile, however, because our usage is so much less than web, the smallest group of users is invariably 50%. So we ran this as a standard 50/50 A/B test.
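As a rough illustration of why lower usage pushes the test group towards 50%, here is a sketch of a standard normal-approximation sample size calculation for comparing two proportions. It is not Yammer's actual tooling. The function name, the default significance level and power, and the example rates are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def users_per_group(p_control: float, p_treatment: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each group to detect the given difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical rates: 10% of users post a photo today, and we hope for 12%.
needed = users_per_group(0.10, 0.12)
print(needed)

# A subtler expected effect needs far more users in each group.
print(users_per_group(0.10, 0.11))
```

When the platform has fewer active users, the number needed per group is a larger share of the total, which is how an even split can end up being the smallest workable test.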

Analyze the Results

Initial results were showing no significant effects. So we decided to give it a few more weeks to see if things would change, but they didn’t. Even after seven weeks, the results were disappointing: this was as flat as flat gets. Our core metrics — those we value most highly — didn’t move at all. These include days engaged, the number of people posting, the number of messages, new user retention, etc. In a lot of cases, the job of a Yammer PM is made more difficult when local metrics (metrics that tell us how people use something) go up, but core metrics are either flat or negative. In this case, our overhaul didn’t have any real impact on local metrics either. And that’s very disconcerting because it’s far easier to move local metrics than global metrics.

  • The number of people posting images didn’t go up.
  • The number of people posting multiple images didn’t go up.
  • The number of posts with multiple images didn’t move.

In short, we did not validate our hypothesis. Often when you read blogs about feature overhauls, they are either massive successes or massive failures. But these kinds of results are the hardest when it comes to product tradeoffs, analysis, and vision.
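For readers who want to see what a flat result looks like numerically, the following sketch runs a two-proportion z-test, one common way to compare a rate between the two halves of an A/B test. It is not necessarily the analysis Yammer used, and the counts are invented for illustration only.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: users who posted an image, out of users in each group.
z_flat, p_flat = two_proportion_z_test(1200, 10_000, 1215, 10_000)
print(p_flat > 0.05)   # True: no detectable difference at the 5% level

z_lift, p_lift = two_proportion_z_test(1200, 10_000, 1400, 10_000)
print(p_lift < 0.05)   # True: a difference this large would be detected
```

A high p-value does not prove the feature had no effect. It only says the data cannot distinguish the two groups, which is why letting the test run for more weeks was a reasonable next step.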

Ship, iterate, or kill it

At the end of the day, we shipped this feature. But it was a difficult and long-debated decision. In the end, it came down to three things:
  • Without a doubt, we made it easier to attach multiple images.
  • We believe we created a better experience.
  • We refactored some very old code.

That last point actually made it a much easier decision. Even if it were not true, I would imagine that the first two bullet points would have been compelling enough to base our decision on.

The obvious question is why didn’t this test well? For users, it’s about desire to post a photo rather than ease of posting. I believe our results were flat because people who really wanted or needed to post photos overcame the friction of doing so in the old experience. Making it easier to post photos apparently does not influence someone’s desire to post a photo. For that, we’d have to think about something that is much more top of the funnel. Expecting every feature to address both problems would be unrealistic. Overall, I think this project is a good example of being data-informed and not complete slaves to data.