KonMari is a new name for an old concept—we shouldn’t live with STUFF that isn’t serving a purpose in our lives. It’s championed by Marie Kondo, a woman passionate about throwing things away and organizing what’s left. Her breakout book was titled “The Life-Changing Magic of Tidying Up.”
Though Kondo’s focus leans heavily on physical things, I think her concepts are important for our digital lives as well. Because storage is effectively free, we never feel the pain of being hoarders, at least not directly. Never throwing things away is a personal choice when it’s about blurry pictures of our cats in our own folders, but it’s a different problem when it’s about our code base.
How do you know when you should remove flags from your code base? How do you clean up after all your experiments?
Build for Future Ease
The easiest way to avoid having big messes to clean up is to not make a mess in the first place. We all know that sounds simpler than it is, but I also believe that any improvement is a help. Just as it’s better to put your dirty clothes in a laundry hamper than on the floor, it’s better to tag new features with meaningful identifiers. Here are some best practices for future ease. You may already know them, but just in case:
If a feature depends on another function or flag, mention that in the comments
Add dates for when something was added; don’t depend on commit messages alone to record that
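To make those habits concrete, a flag’s metadata can live next to the flag itself. This is a minimal sketch; the record fields and the 90-day staleness window are assumptions for illustration, not any particular tool’s schema:

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical flag-registry entry; field names are illustrative.
@dataclass
class FlagRecord:
    key: str
    added: date                # when the flag was introduced
    owner: str                 # who to ask about it
    depends_on: list = field(default_factory=list)  # related flags/functions

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """A flag older than max_age_days is a cleanup candidate."""
        return (today - self.added).days > max_age_days

flag = FlagRecord(
    key="new-checkout-flow",
    added=date(2018, 1, 15),
    owner="traci",
    depends_on=["payments-v2"],
)
print(flag.is_stale(today=date(2018, 6, 1)))  # → True
```

With a record like this, a periodic job can list every stale flag along with its owner, which is exactly the context you need when cleanup day arrives.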
We have emotions about deleting things. We have to accept that at some point the feature (or picture or paragraph) will no longer serve a purpose. Since the things we delete always represent memory, potential, or work, we are often reluctant to delete them—we know that moment in time will never happen again or that work would have to be repeated.
The solution to this is not to avoid deleting things, but to avoid the emotional impact of making the decision. Automate the deletion of anything that you can. Meeting notices vanish when the meeting is past; deployment flags should evaporate when the feature is fully adopted. By configuring your code to delete outdated elements, you keep the clutter from accumulating in your codebase.
Consider being even more radical—if you have a policy of only supporting a certain number of versions, automatically roll the oldest version off support when you release a new version. If you make it a policy, everyone will know that they must upgrade when a new version is coming out.
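That rolloff policy is simple enough to automate. Here is a sketch, assuming versions are recorded in release order and using a made-up support window of three:

```python
# Sketch: keep only the N most recent versions under support.
# The version strings and window size are invented for illustration.
def supported_versions(released, window=3):
    """Given versions in release order, return the ones still supported."""
    return released[-window:]

released = ["1.0", "1.1", "2.0", "2.1"]
print(supported_versions(released))  # → ['1.1', '2.0', '2.1']

# Releasing 3.0 automatically rolls 1.1 off support:
released.append("3.0")
print(supported_versions(released))  # → ['2.0', '2.1', '3.0']
```

Because the cutoff is a pure function of the release list, nobody has to make (or defend) the decision to drop an old version; the policy makes it for them.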
Cultivate Tidy Habits
Wash on Monday. Iron on Tuesday. Bake on Wednesday…
This was an old expectation about when housework happened. Have some set times in your development cycle for cleaning up after yourself. What about that hour at the end of the workweek when you don’t want to start anything? Or the day you have mostly blocked off for sprint exit/entry? Those are good natural break points to go in and clean up the things you have been meaning to get to. When the cadence is regular, you’ll better remember what you were doing. Making cleanup a regular chore instead of a “when I get to it” behavior means it’s much less annoying when you do it.
Reward good behavior
Are you the sort of person who internally yells at yourself (or others) for making mistakes? Consider a new way of training yourself to do the right thing. Instead of being angry or frustrated about mistakes, catch yourself doing the right thing and reward it. Deleted some old code? Stand up and look at the sun, or have a hard candy, or trigger a micropayment to a fun account—whatever you find motivational. Associating good behavior with something you enjoy makes it easier to keep doing the good behavior. Even if you know what you OUGHT to do, you probably won’t get a dopamine hit just from fulfilling an obligation.
In March we visited Atlassian’s new Mountain View offices for our monthly Test in Production Meetup. One of our own software engineers, Traci Lopez, opened the session with a talk on how helpful comments can be for getting visibility into why an event happened the way it did, especially when troubleshooting.
“Maybe you have a clear understanding of where to go, but if someone’s coming onto your team you get a lot more context of what the state of the world is and where you should go.”
Watch her talk below. If you’re interested in joining us at a future Meetup, you can sign up here.
I’m going to talk about comments and this new feature I worked on that we just rolled out and why we think it’s so great.
So, a little bit about myself. Like she said, I’m a software engineer at LaunchDarkly and we’re always looking for great people. If you know anyone who wants to work on development tools and wants to work in Oakland, come talk to me.
Alright, so for those of you who don’t know, this is flipping the table. So, what happened last week was I received a notification in our change log Slack channel that someone had turned on the firehose flag for one of our customers that has a bunch of those events that Erin’s mentioning in the dev console. So, what this means is basically, we’re sending all of these events to a third party, and that can cost us a lot of money and be a big problem. But, what was great was the person who did this wrote a comment. So, “Hey, I turned on this flag to send all this data because I’m trying to troubleshoot something.” And this immediately gives us context into what’s going on, so I know who turned on the flag, why they turned on the flag importantly, and we were able to immediately respond to him and be like, “Hey wait, don’t do that. That’s a lot of data.” And he was able to immediately turn it off.
So, okay, I’m done troubleshooting. Great, so this happened within five seconds, you can kind of see the log up there. He could’ve provided a little more context here, like, “Hey, I collected over 30,000 events, got the data I needed.” But that’s kind of beside the point. What was great about this was it’s immediately giving us some context into what’s happening within our application, which is super useful for collaboration. So, I’m a big climber. There’s gonna be some climbing pictures coming through. But I think it just lends more into working as a team. You have somebody lead climbing and you’re really trusting that person that’s belaying you.
And this comes with visibility. So, why did he turn on that feature flag? Alright, he’s troubleshooting something. So immediately, we know what’s happening. And we also use comments. It’s another way of keeping a good paper trail to understand why we’ve made all the different decisions to get to where we wanna go. And in this example, you can see all the different routes that could be taken and why we decided to take a certain route to get there.
So, another thing we have is the audit log, and within the audit log … so all of these changes are also captured within our change log channel in Slack, but we also have this audit log to see everything that’s been going on in the application. So, for example this guy said “Flip it.” Not that useful of a comment. It’s like if I’m belaying somebody and they’re saying, “Pull, pull, pull” or something, these aren’t really useful comments, either. We kind of have this understanding like, “Take” means, “Hey, pull up the slack.” And “slack” means “Give me more slack so I can clip the next bolt” or something.
So what might be a better comment is “initial launch.” So, this is Alexis, he’s our Front End Team Lead. So he’s turning on this new feature, rolling it out to everybody, and he’s like “Hey, this is our initial launch.” I’m like, “Great, you’re probably the person in charge of this feature, so if things initially go south, if anything goes wrong or I have any comments, I know to talk to you most likely.”
And that’s not to say that he’s not delegating this to other people, because there’s a lot of people that worked on it, but with that responsibility of, “If something happens with that flag, I know who to talk to.” And maybe if he wasn’t the one even working on it, he’ll know who to point me to, in the right direction. And that is extremely useful when you’re working on a big team of people.
So, to get back to the new feature we rolled out, that I worked on, that we’re so excited about: comments are the context to understand what’s been going on in your application. So, looking back, an extreme example is that firehose flag. I can understand “Why the heck did we turn on this flag?” “Oh, he was troubleshooting something and immediately afterwards, oh, this is done, resolved. I don’t have to think about this anymore.” I immediately know what was the problem and what was happening. And an easy example is “Hey, I’m rolling out this new feature. We’re so excited about it.” Great.
Comments can kind of be like a climbing ground, where you have an understanding. If you just show up to the climbing wall, like “Where should I go?” You see all these bolts and you can kinda get like a picture of where you’re trying to go, versus if you’re already on a team and you’re with everybody climbing, maybe you have a clear understanding of where to go, but if someone’s coming onto your team, you get a lot more context of what the state of the world is, and maybe where you should go.
So, another example I wanted to bring up, is we do AB testing, and often times if you have some hypothesis of how you think your test might go and that turns out to be really wrong, you’re like, “Wait, what happened? What changed?” So, I can go back into my audit log and see if somebody played with targeting goals. Like, maybe we were changing our testing purposes because of something, and that can all be documented right there. So, if I think the testing isn’t going how it should have gone, I can immediately see why that happened, who made those changes, so I can talk to them and be like, “Hey, I don’t think we’re supposed to do that.” Or “Maybe that’s what we were supposed to do,” et cetera.
And this also depends on what environment you’re in, so we use LaunchDarkly internally, and when I’m in my dev environment, I’m basically never putting comments in any flag updates in Twig, because I’m probably debugging locally, just turning things on and off, messing around, doesn’t matter. And staging, still not really writing comments. We have a change log for staging, but still not as much, whereas in production, every change we make is pretty clearly documented why we’re making changes. Besides that flippant comment, but we just rolled this out, so… we’re still kind of developing our new best practices on how to use this new functionality.
Guest post by Isaac Mosquera, Armory CTO and Co-Founder.
My co-founder DROdio likes to say, “Business runs on relationships, and relationships run on feelings.” It’s easy to misjudge the unseen force of human emotions when changing the culture of your organization. Culture is the operating system of the company; it’s how people behave when leadership isn’t in the room.
The last time I was part of driving a cultural change at a large organization, it took over three years to accomplish, with many employees lost—both voluntarily and involuntarily. It’s hard work and not for the faint of heart. Most importantly, cultural change isn’t something that happens by introducing new technology or process; containers and microservices aren’t the solution to all your problems. In fact, introducing those concepts to a team that isn’t ready will just amplify your problems.
Change always starts with people. When I was at that large organization, we used deployment velocity as our guiding metric. The chart below illustrates our journey. The story of how we got there was riddled with missteps and learning opportunities, but in the end we surpassed our highest expectations by following the People → Process → Technology framework.
Does your organization really want to change? It’s likely that most people in your organization don’t notice any major issues. If they did, they probably would’ve done something about it.
In that big change experience, I observed engineering teams that had no automated testing and no code reviews, suffered from outages and infrequent deployments, and, worst of all, had an ops-vs-devs mentality. There was a complete absence of trust, masked by processes that would make the DMV look innovative. In order to deploy anything into production, it took 2-3 approvals from Directors and VPs who were distant from the code.
This culture resulted in missed deadlines; creating new services in AWS took months due to unnecessary approvals. We saw a revolving door of engineers, losing 1-2 engineers every month on a team of ~20. There was a lack of ownership—when services crashed, the ops team would blame engineers. And we experienced a “Not Invented Here” mentality that resulted in a myriad of homegrown tools with Titanic levels of reliability.
In my attempt to address these issues, I made the common mistake of trying to solve them before I got consensus. And I got what I deserved—I was attacked by upper management. From their perspective, all of this was a waste of their time since nothing was wrong. And by extension, in asking to change any of these processes and tools, I was attacking their leadership skills.
Giving Your Team A Voice
The better approach was to fix what the team thought was broken. While upper management was completely ignorant of these issues, engineers in the trenches were experiencing the pain of missed deadlines, pressure from product, and disappointed customers. For upper management, ignorance was bliss.
So we asked the team what they thought. By surveying them, we were able to quantify the problems. It was no longer about me vs upper management. It was the team identifying their biggest problem and asking for change—the Ops vs Devs mentality. We were one team, and we should act like it.
So what are some of the questions you should ask in your survey? Each organization’s surveys should be tailored to their needs. But we recommend open-ended questions, since multiple choice questions typically result in “leading the witness”. Some questions you might want to consider include:
What are the biggest bottlenecks taking up the most of your time?
What is stopping you from collaborating with your teammates?
Are you clear on your weekly, monthly and yearly objectives?
If you were the VP of Engineering what would change?
What deployment tools are you using to deploy?
After summarizing your survey feedback, you’ll have a rich set of data points that nobody can argue with because they would just be arguing with the team.
Picking Your One Metric
Now comes the hard part: selecting a single metric that represents what you’re trying to change. The actual metric doesn’t matter; what matters more is that it’s one your team can rally around.
For us, the “Ops vs Devs” mentality was causing significant bottlenecks, so we chose number of deployments as our guiding metric—or as we called it, “deployment velocity”. When we started measuring this, it was five deployments per month. After a year of focusing on that single metric we increased it to an astounding 2,400 deployments per month.
If culture is a set of shared values by your organization, then your processes are a manifestation of those values. Once we understood that the “Ops vs Devs” culture was the focus, we then wanted to understand why it was there in the first place.
As it turns out, years earlier there were even more frequent outages, so management decided to put up gates, a series of manual checklists before going to production. This resulted in developers having their access to production systems revoked because they were deemed untrustworthy.
Trusting Your Developers
I don’t understand why engineering leaders insist on saying “I trust my developers to make the right decisions,” while at the same time creating processes that prevent them from doing so and making talented engineers unhappy. To that end, we began by reinstating production access for all developers. But this also came with a new responsibility: on-call rotation. No longer was the operations team responsible for developer mishaps. Everyone was held responsible for their own code. We immediately saw an uptick in both deployments and new services created by making that change in process.
Buy First, Then Build
We also made the decision to buy all the technology we could so development teams could focus on building differentiated software for the company. If anyone wanted to build in-house, the decision and process had to be dramatically cheaper than buying. This process change not only had an impact on our ability to add value to the company but it actually made developers much happier in their jobs.
A True DevOps Culture
What formed was a new team which was truly DevOps. The team was created by developers who truly were interested in building software to help operationalize our infrastructure and software delivery. New processes were created to get the DevOps team involved whenever necessary, but they alone were not responsible for SLAs or uptime of developer applications. A partnership was born.
Too many engineers like to start a process like this by finding a set of new and interesting technologies, and then forcing them onto their organization. And why not? As engineers we love to tinker. In fact, I made this very mistake at the beginning. I started with technical solutions to problems only I could see, and that got me nowhere. But when I started with people and process first, the technology part became easier.
Rewriting Our Entire Infrastructure
Though things were getting better, we weren’t out of the woods yet. We lost a core operations engineer who didn’t share the process to deploy the core website! We literally couldn’t deploy our own website. This was a catastrophic failure on the engineering team, but it was also a blessing in disguise. We were able to ask the business for 6 weeks to fix the infrastructure so we would never be in this position again. We stopped all new product development. It was a major hit to the organization, but we knew it had to get done. We ultimately moved everything to Kubernetes and had massive productivity gains and cost reduction.
Replace Homegrown with Off-the-Shelf
In this period of time we also moved everything we could into managed services by AWS or other vendors. The prediction was the bill would be incrementally larger, but in the end we actually saved money on infrastructure. This move allowed the team to focus on building interesting and valuable software instead of supporting databases like Cassandra and Kafka.
Continuous Deployment and Feature Flagging
We also decided to rewrite our software delivery pipeline and depend heavily on Jenkins 2.0—mostly because a suitable off-the-shelf solution like Spinnaker wasn’t yet available. We still got to throw away much of our old homegrown tooling, since the original developer was no longer there to support it. And while this helped us gain velocity, ultimately we needed safer deployments—when our SLAs started slipping, we were exposing our customers to too much risk. To address that issue, we built our own homegrown feature flagging solution. This, too, was because off-the-shelf tools like LaunchDarkly didn’t exist at the time (recall our preferred approach to build vs buy). The combination of the tooling and process allowed us to reach a deployment velocity that surprised even ourselves.
The chart below speaks for itself. Each new color is a new micro-service. You’ll notice we started with a few large monoliths and then quickly moved to more and more microservices because the friction to create them was so small. In that time, deployment velocity went way up! Most importantly the business benefited too. We built a whole new revenue stream due to our newfound velocity.
Approaching these changes from the people first made the rest of this transition easier and enjoyable—our team was motivated throughout the entire journey. In the end, the people, process and tools were all aligned to achieve a single goal: Deployment Velocity.
Isaac Mosquera is the CTO and Co-Founder of Armory.io. He likes building happy & productive teams.
I’m Jonathan, and I’ve just joined LaunchDarkly as VP of Engineering.
I spent the last twelve years at Atlassian, where I had the privilege of working with John Kodumal and many of the folks here at LaunchDarkly. I joined Atlassian in 2005 when it was just about the size that LaunchDarkly is now. I was there as Atlassian grew from 25 people to over 2,500.
In 2012 I worked with John and many of the current LaunchDarkly team to build the Atlassian Marketplace, which has helped developers all over the world build businesses on top of Atlassian products. It has paid out over $200M to those developers over the years and brought great success to a lot of people who put their faith in Atlassian quite early on.
In 2014, I joined the HipChat team, and over the next two years we grew the product and the team massively. I then began working on Stride, Atlassian’s brand new team communication tool. Team communication is a massive strategic opportunity as well as an incredibly complex technical and operational challenge.
We used LaunchDarkly in several of my products at Atlassian, but with Stride we built LaunchDarkly into our product and process from the beginning. Everything we built we wrapped in a feature flag, and we established a regular cadence for how new functionality would reach our customers. This allowed us to keep the pipeline moving smoothly, roll out new functionality gradually, and adjust based on the feedback we got, both from our metrics and from our users. It gave the whole team, engineering and product, enormous confidence in what we were building.
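A gradual rollout like the one described above is typically implemented by bucketing each user deterministically, so the same user always gets the same answer as the percentage grows. This standalone sketch illustrates the idea only; it is not the LaunchDarkly SDK, and the flag key and user IDs are made up:

```python
import hashlib

def in_rollout(user_id: str, flag_key: str, percent: int) -> bool:
    """Deterministically place a user in a bucket from 0-99 and compare
    it against the rollout percentage. Hashing the flag key together
    with the user ID means different flags roll out to different
    (but stable) slices of the user base."""
    digest = hashlib.sha1(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# At 0% nobody sees the feature; at 100% everybody does, and each
# user's answer at a given percentage never changes between calls.
print(in_rollout("user-123", "new-editor", 0))    # → False
print(in_rollout("user-123", "new-editor", 100))  # → True
```

Raising the percentage only ever adds users to the rollout, which is what makes it safe to widen gradually while watching metrics and feedback.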
A Product I Believe In
At Atlassian, I learned firsthand just how much better feature management is for your team, in ways both tangible and intangible. It improves your team operationally and culturally. It makes both devs and product people happier. And after seeing the benefits, I would never build a service again that didn’t use feature management.
A Rare Opportunity
What’s more, I’m firmly convinced that every dev team worth their salt will eventually use feature management as part of their delivery process. And I know that LaunchDarkly does a far better job delivering that capability. There’s no reason any team should spend time building a homegrown feature management system.
LaunchDarkly will be one of the foundational tools in every developer’s toolkit, and support a huge number of software projects. Software is transformative—we can help millions of developers deliver software better, with less stress and greater results. When I think about the impact that can have on the world, it’s massive.
An Amazing Team
LaunchDarkly has already built an incredible team. Just look at what this team has accomplished so far! Some of the best people I’ve worked with over the years have all ended up here. As I have begun to meet the rest of the team, I see the same values and qualities.
I am confident that this is a team I want to be part of—a team that values transparency, humility, growth, collaboration, and openness. One that leads with a great product. And one that has a mission to help our customers grow.
I’m so excited to be here, and so grateful to have been invited to join this small but growing rocket ship of a company. We have a huge opportunity in front of us. I’m excited to create a product people love and depend on, to help LaunchDarkly grow, and to make a huge impact on the world of software development.
Last week I attended the QCon London conference from Monday to Wednesday. It was a thoroughly interesting and informative three days. The sessions I heard ranged from microservice architectures to Chaos Engineering, and from how to retain the best people to new features of the Windows console. But there was one talk that really stood out to me—it took a hard look at whether we are Software Developers or Software Engineers.
“QCon London is a conference for senior software engineers and architects on the patterns, practices, and use cases leveraged by the world’s most innovative software shops.”
QCon London describes itself as a software conference, not necessarily a developer conference. It focuses more on the practices of creating software as opposed to showing off the latest and greatest frameworks and languages, and how to work with them. This came through in the talks I attended where most showed very little code and focused on how we work as teams, how we can interact with tools, and the idea that how we treat our code can have a huge impact on what we ultimately deliver.
With that in mind I went to a talk titled “Taking Back ‘Software Engineering’” by Dave Farley. I wanted to understand the differences he sees between being a software developer and an engineer, and learn how those differences can help create better code. During his 50-minute presentation, Farley outlined three main phases of production we have gone through. The first was craft, where one person builds one thing. The next was mass production, which produced a lot of things but wastefully resulted in stockpiles of products that weren’t necessarily used. The final phase was lean mass production and Just In Time (JIT) production. This is the most common form of production today and is possible because of tried and tested methodologies ensuring high quality and efficient production lines. JIT production requires a good application of the Scientific Method to enable incremental and iterative improvements that result in a high-quality product at the end.
Without this JIT production approach and the Scientific Method, Farley pointed out that NASA would never have taken humans to the moon and back. It is only through robust and repeatable experiments that NASA could understand the physics and engineering required to build a rocket that could enter Earth’s orbit, then the moon’s, land humans on the moon, and then bring them back to Earth. It’s worth noting that NASA achieved this feat within 10 years of when President John F. Kennedy declared the US would do it—at which point NASA had not yet even launched an orbital rocket successfully.
Farley surmised that engineering and the application of the Scientific Method has led to some of humanity’s greatest achievements, and yet when it comes to software there is a tendency to shy away from the title of “Engineering”. For many the title brings with it ideas of strict regulations and standards that hinder or slow creativity and progress rather than enable them. To demonstrate his point Farley asked the audience, “How many of you studied Computer Science at university?” Most of the room raised their hand. He followed up with, “How many of you were taught the Scientific Method when studying Computer Science?” There were now very few hands up.
Without a scientific approach to software development, it is perhaps an industry that follows a craft-like approach to production, where “it works” is good enough. For example, if a software specification were given to several organisations to build, one could expect widely different levels of quality in the product. However, the same cannot be said of giving out a specification for a building, car or rocket to be built—the results could appear different but would be quality products based on rigorous tests and standards.
Farley went on to talk about how Test-Driven Development and Continuous Delivery are great at moving the software industry to be more scientific and rigorous in its testing standards. Though they are helping the industry to be better at engineering, there is perhaps another step needed—Hypothesis-Driven Development (HDD)—to truly move the industry to being one of engineers instead of developers.
Through HDD, theories would be created with expected outcomes before the software development aspect was even considered. This allows robust testing to be done further down the line: if the hypothesis stands up to the testing, then it can be concluded that it appears to be correct. Further testing of the same hypothesis could be done too, allowing for repeatable tests that demonstrate the theory to be correct. The theories could be approached on a highly iterative basis, following an MVP-like approach; if at any point the theory no longer holds up, then the work on that feature could be stopped.
The theories wouldn’t need to come from developers and engineers themselves, although they could, but could come from other aspects of the business and stakeholders who request work to be done on the products being built. This would result in more accountability for what is being requested with a clear expectation around the success criteria.
Whilst I and my colleagues apply some of these aspects to the way we work, we don’t do everything and don’t approach working with software with such a scientific view. I can see clear benefits to using the Scientific Method when working with software. When I think about how we might better adopt this way of working I am drawn to LaunchDarkly.
We use LaunchDarkly at work for feature rollouts, changing user journeys, and for A/B testing. The ease and speed of use make it a great tool for running experiments, both big and small. When I think about how we could be highly iterative with running experiments to test a hypothesis, LaunchDarkly would be an excellent way to control that test. A feature flag could be set up for a very small test with strict targeting rules, and if the results match or exceed the hypothesis then the targeting could be expanded. However, if the results are not matching what was expected, then the flag could be turned off. This approach allows small changes to be made, with a minimal amount of time and effort spent, but useful results to be collected before any major investment is made into developing a feature.
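The strict-then-expanded targeting described here can be pictured as a set of rules a user must match before the flag turns on for them. The rule shapes and attribute names below are invented for illustration, not LaunchDarkly’s actual API:

```python
# Sketch: strict targeting for a small hypothesis test, then a wider
# rule set once the early results support the hypothesis.
def flag_enabled(user, rules):
    """Enable the flag only for users matching every targeting rule."""
    return all(user.get(attr) in allowed for attr, allowed in rules.items())

# Start narrow: only beta-plan users in one country see the experiment.
strict = {"country": {"GB"}, "plan": {"beta"}}
# If the hypothesis holds, widen the audience instead of rewriting code.
wider = {"country": {"GB", "US", "DE"}, "plan": {"beta", "pro"}}

user = {"country": "US", "plan": "beta"}
print(flag_enabled(user, strict))  # → False
print(flag_enabled(user, wider))   # → True
```

The key property is that abandoning the experiment is just turning the flag off; no rollback or redeploy is needed, which keeps the cost of a failed hypothesis low.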
I found Farley’s talk at QCon London to be an interesting and thought provoking look at how I could change the way I work with software. I’m now thinking about ways to be more scientific in how I approach my work, and I think LaunchDarkly will be a very useful tool when working with a Hypothesis Driven Development approach to software engineering.
In January at our Test in Production MeetUp, we invited Maria Verba, QA Automation Engineer at Handshake, to talk about why and how her team tests in production. (If you’re interested in joining us, sign up here.)
At Handshake, Maria is focused on making sure her organization ships a product that customers love to use, as well as supporting the engineers with anything test related. In her presentation, she discusses the requirements her team has put in place to test safely. She also covers some specific examples of when testing in production is better than in pre-production.
“Whenever I hear test in production, I definitely don’t cringe anymore. But I know that there are right ways to do this, and wrong ways to do it. Right ways will always include a healthy ecosystem. A healthy ecosystem implies you have good thorough tests. You have feature toggles. You have monitoring in place. We have to consider our trade-offs. Sometimes it does make more sense to create the mock production environment, sometimes it doesn’t. There are definitely things that cannot be tested in pre-production environments.” -Maria
Watch her talk below.
Hello everybody! Again, good evening. Thank you for having me. My name is Maria Verba. I work at a company called Handshake here in San Francisco.
Today, I’m going to talk to you about how and why we do testing in production: my own personal journey from thinking about it as a necessary evil, something you have to do but don’t want to do, to thinking about it as an indispensable, quite valuable tool.
To start off, let’s talk a little bit about Handshake. Most of you have probably never heard of it. It’s a small company, it’s understandable. Handshake is a career platform for students that helps connect students with employers, and helps students find meaningful jobs, and jump-start their careers. At Handshake, we believe that talent is distributed equally, but opportunity is not, so we work very hard to bridge that gap, and to help students find their dream jobs, and make a career out of it.
My role at Handshake is quality assurance engineer. Besides making sure that we ship a very good product that our customers love to use, I think my responsibility as a QA engineer is also supporting the engineering team with everything and anything test-related: test infrastructure, test frameworks, as well as keeping their productivity and happiness high.
With my background of being in QA for about six years, whenever I heard “Oh, let’s just test this in production,” this is pretty much the reaction I would get. No! No way! Let’s not do that. Why don’t we want to do this? First of all, I do not want customers to find any issues. I do not want to expose them to any potential bugs, even small ones. Yeah, let’s not test in production.
The second thing is that I personally create a lot of gibberish records when I test. Sometimes they make sense, sometimes they don’t. I wouldn’t want anybody to see those records. Besides, they have the potential to mess up reporting and analytics. It’s also very, very easy to accidentally delete or modify records, so making that kind of mistake is very costly. It’s very expensive, and engineers will probably spend a couple of hours trying to dig out a recent database backup and restore it. So yeah, let’s not do that in production. The last one is data security: we could actually be exposing records to people who shouldn’t see them.
Then I started realizing that maybe it’s not actually that bad. Sometimes we have to make the sacrifice, and sometimes we have to test in production. I thought, “Yeah, maybe it’s okay as long as we have automation testing.” Before deploying, we have something that verifies that our stuff works. We also want to make sure that we have a very good early detection system, with monitoring and alerting, which will find the mistakes that we make before our customers do. We absolutely have to put it behind a feature toggle. There is absolutely no way around it. Even if there are no issues with our new feature, we do not want to surprise our customers. We do not want to throw a new feature at them and say, “Deal with it.” We want to have that material ready for them. We want to prepare them for it before they see it.
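The toggle requirement Maria describes can be sketched in a few lines. This is a minimal illustration, not Handshake's real code: the `FeatureToggles` class, the `search` helper, and the `"new-search-syntax"` flag name are all hypothetical, and in practice the lookup would hit a flag service such as LaunchDarkly.

```ruby
# Minimal sketch of gating an in-progress feature behind a toggle.
# All names here are hypothetical.
class FeatureToggles
  def initialize(flags = {})
    @flags = flags # flag name => enabled?
  end

  def enabled?(flag_name)
    # Default to off, so customers never see an unfinished feature early.
    @flags.fetch(flag_name, false)
  end
end

def search(query, toggles)
  if toggles.enabled?("new-search-syntax")
    "new-engine:#{query}" # code path being tested in production
  else
    "old-engine:#{query}" # stable path customers keep seeing
  end
end

toggles = FeatureToggles.new("new-search-syntax" => false)
search("data science jobs", toggles) # => "old-engine:data science jobs"
```

The key property is the safe default: until someone flips the flag, production traffic stays on the old path even though the new code has shipped.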
The way that we’ve done it recently: it historically happened that most of our features were read-only. We would not create any new records. We would not modify any records. We also made some trade-offs. We thought about it, and we realized that yes, it is possible to create those separate production-like environments and test there, but for us, it’s not realistic. We move very fast. We don’t have that many engineers to spend time creating new environments. Basically, it’s a very expensive process for us, and we really don’t want to do it. Besides that, we are very careful about our data records, and we take security very seriously. We constantly do data privacy training, and we have multiple checks in place in the code. This helps us safeguard against any issues.
In the past, when we did test in production, these were a couple of examples. We needed real-life traffic: general performance testing and performance monitoring on a daily basis. We also needed our production data; sometimes it’s very hard to reproduce and replicate the same data that we have in production. Examples would be the Elasticsearch 5 upgrade and the job recommendation service. Lastly, we do a lot of A/B testing and experimentation at Handshake to validate our product ideas.
I’m going to talk a little bit more in detail about a couple of these examples here. The Elasticsearch 5 upgrade was a pretty good example. First of all, why did we need to do this? We rely heavily on our search engine; search-based features are pretty much our core. Whenever students log in, most of the time what they want to do is find a job. Our search service cannot be down. However, we were using Elasticsearch 2, which became deprecated. That put us in a position where we had to switch to Elasticsearch 5 very quickly.
Let’s talk about why we couldn’t really test this anywhere but production. The update is too large. Basically, we cannot afford to make any mistakes here. It touches everything. It has the potential to affect every single customer we have, every single user we have. It’s extremely important for us to keep the features functionally working.
How did we do this? First of all, it took us a while to figure out where Elasticsearch versions 3 and 4 went. Then we created new classes, the classes for the new search syntax. We started the upgrade by changing the syntax for our… We put the classes corresponding to each index behind a feature toggle, and we thoroughly tested with our unit and integration framework. We rolled it out to production incrementally, index by index, and we tested it in production, because production has the amount of data that we needed. We were able to verify that we get correct results after the update.
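The index-by-index rollout above can be sketched as a router that picks between the old and new query classes per index. This is an illustrative sketch under stated assumptions: the class names, the string-valued query results, and the idea that a list of upgraded indexes is derived from per-index feature toggles are all hypothetical, not Handshake's actual implementation.

```ruby
# Sketch of an incremental, per-index search rollout. Hypothetical names.
class LegacySearch # old Elasticsearch 2 query syntax
  def query(index, term)
    "es2:#{index}:#{term}"
  end
end

class UpgradedSearch # new Elasticsearch 5 query syntax
  def query(index, term)
    "es5:#{index}:#{term}"
  end
end

class SearchRouter
  # upgraded_indexes would come from per-index feature toggles.
  def initialize(upgraded_indexes)
    @upgraded = upgraded_indexes
    @legacy = LegacySearch.new
    @new = UpgradedSearch.new
  end

  def query(index, term)
    engine = @upgraded.include?(index) ? @new : @legacy
    engine.query(index, term)
  end
end

router = SearchRouter.new(["jobs"])   # the "jobs" index rolled out first
router.query("jobs", "engineer")      # => "es5:jobs:engineer"
router.query("events", "career fair") # => "es2:events:career fair"
```

Because each index flips independently, a bad result on one index can be rolled back by removing it from the toggle list without touching the others.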
The second case study is slightly different: the job recommendation service. At Handshake, we use machine learning to recommend jobs to students based on student behavior: a student’s major, skills, maybe previous job history, job viewing history, location interests, etc. We take that whole melting pot of information, and we suggest a job to the student based on it. The question that we were trying to answer here is: “Does this job make sense to this particular student?” A lot of times, you cannot go through all of the possible combinations and computations to answer that question.
To recreate that environment would take us, again, a lot of time and effort. So we, again, rolled it out incrementally. We heavily used our interns, because they’re still in school. After verifying that the jobs made sense, we started rolling it out to other schools as well.
Talking about these two examples, I can hear the question in the audience boiling: you could still do this. You could still create a mock environment, and do the same kind of testing not in production. You could have done it in pre-production. And yes, that’s true. I agree with that. We had to make those choices based on what our business needs are, based on our speed. We could not afford to spend time and effort setting up, maintaining, and securing those environments. But yeah, it is possible.
However, what if we changed the algorithm behind the job recommendation service? How do we make sure that those jobs are still relevant to the students? Do we ask the students after we change everything? I don’t know.
Here is where our experimentation framework came in. We use experiments and A/B testing a lot at Handshake. We pretty much test every major product hypothesis that we have. For us, it’s a two-step process: we have impressions and conversions. An impression is a call to action, a button, or a prompt, something that the user sees. A conversion is an action taken based on seeing that prompt. An example would be showing a profile completion prompt to a student on a dashboard; that would be an impression. A conversion would be the student filling out their profile.
The way that we do it is, first of all, we determine which users, which students, we want to target. We separate them into a different pool. Sometimes we want our experiment to target only seniors, for example. Based on that, we remove other students from the test group. Then we split it into treatment and control groups, and we track impressions and conversions for both the control and the treatment group. This allows us to get very accurate results.
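The tracking half of that process can be sketched as a small counter per group. This is a hedged illustration, not Handshake's real pipeline; the `ExperimentTracker` class and its method names are made up for the example.

```ruby
# Sketch of tracking impressions and conversions per experiment group,
# so treatment and control conversion rates can be compared.
# Class and method names are hypothetical.
class ExperimentTracker
  def initialize
    @impressions = Hash.new(0)
    @conversions = Hash.new(0)
  end

  def record_impression(group) # group is :treatment or :control
    @impressions[group] += 1
  end

  def record_conversion(group)
    @conversions[group] += 1
  end

  def conversion_rate(group)
    return 0.0 if @impressions[group].zero?
    @conversions[group].fdiv(@impressions[group])
  end
end

tracker = ExperimentTracker.new
tracker.record_impression(:treatment) # student saw the profile prompt
tracker.record_conversion(:treatment) # student filled out their profile
tracker.record_impression(:control)   # control student saw no change
```

Recording both groups, not just the treatment, is what makes the comparison meaningful: the control rate is the baseline the treatment rate is judged against.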
We are using LaunchDarkly to do this, with targeting rules. As you can see here (maybe it’s hard to see), we’re specifying the user type as student, thus removing every other user type; maybe there are administrators or something like that. And then we say we want 5% of students to be on variation A of the feature, and 95% on variation B.
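The logic behind a rule like that can be sketched as a filter plus a deterministic percentage split. To be clear, this is not LaunchDarkly's implementation; it is a toy stand-in with hypothetical field names (`:type`, `:key`) showing the behavior such a rule produces, using an MD5 hash so the same student always lands in the same variation.

```ruby
# Toy stand-in for a "student only, 5% / 95%" targeting rule.
# Not the real LaunchDarkly evaluation logic; names are hypothetical.
require 'digest'

def variation_for(user)
  # Non-students (e.g. administrators) never enter the experiment.
  return :excluded unless user[:type] == "student"

  # Hash the user key so assignment is stable across requests.
  bucket = Digest::MD5.hexdigest(user[:key]).to_i(16) % 100
  bucket < 5 ? :variation_a : :variation_b
end
```

Stability matters here: if a student flipped between variations on every page load, their impressions and conversions could not be attributed to one group.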
Unfortunately, we can’t do all of it in LaunchDarkly. Some of our experiments, maybe a good chunk of them, rely on more dynamic information, things that change over time. For example, a student has seen a job recently. We have to define “recently.” We have to define what exactly we want to track about that job. In order to solve this problem, we create records for LaunchDarkly, and this allows us to take in additional conditions and configurations, and split our user groups into more detailed cohorts.
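One way to picture that step is computing a cohort label from the dynamic data before targeting happens. This is only a sketch of the idea: the field name `:last_job_view`, the helper name, and the choice of 7 days as "recently" are all assumptions for illustration, not Handshake's actual definitions.

```ruby
# Sketch: derive a cohort from dynamic data (hypothetical names; "recently"
# arbitrarily defined as the last 7 days for this example).
require 'date'

def experiment_cohort(student, today: Date.today)
  last_view = student[:last_job_view]
  seen_recently = last_view && (today - last_view) <= 7
  seen_recently ? :recent_job_viewers : :other_students
end
```

The resulting label can then be handed to the flag service as one more attribute for its targeting rules to match on, which is how a time-varying condition becomes a static cohort at evaluation time.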
As a result, we get a very, very powerful experimentation framework, and we can only do this in production. There is no way around it.
Going back to my personal QA perspective and key takeaways: whenever I hear “test in production,” I definitely don’t cringe anymore. But I know that there are right ways to do this, and wrong ways to do it. The right ways will always include a healthy ecosystem. A healthy ecosystem implies you have good, thorough tests. You have feature toggles. You have monitoring in place. We have to consider our trade-offs. Sometimes it does make more sense to create a mock production environment, sometimes it doesn’t. There are definitely things that cannot be tested in pre-production environments.
That is also in line with my personal goals: first of all, keeping our customers happy and the product stable. Our customers would love to use a product that is driven by data, and not just our own hypotheses. I also think that it keeps the engineers on the team productive and happy, because they don’t have to spend time creating and maintaining all of these different environments.
What’s next for us? Canary testing. We want to be able to divert some portion of our traffic to new features, for example, Rails 5 versus Rails 4. Right now, we are working on moving to Kubernetes to achieve that.
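The canary idea can be sketched the same way as the experiment splits above: deterministically route a small, fixed share of traffic to the new stack. In reality this split would happen at the load balancer or Kubernetes layer rather than in application code, and the names here (`:rails5_canary`, `:rails4_stable`, `backend_for`) are purely illustrative.

```ruby
# Rough sketch of a weighted canary split; hypothetical names, and a real
# deployment would do this in the routing layer, not in app code.
require 'digest'

def backend_for(request_id, canary_pct: 10)
  # Hash the request (or user) ID so routing is stable and reproducible.
  bucket = Digest::MD5.hexdigest(request_id).to_i(16) % 100
  bucket < canary_pct ? :rails5_canary : :rails4_stable
end
```

Raising `canary_pct` gradually from a few percent toward 100 gives the same incremental, reversible rollout pattern as the feature toggles, applied to infrastructure instead of features.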
Thank you so much for your time. If you have any questions, find me, I’m going to be around, and reach out. Thank you.