4 Signs Software Reliability Should be Your Top Priority

You know the companies that break away from the pack. You buy their products with Prime shipping, you ride in their cars. You’ve seen them disrupt entire industries. It might seem like giants such as Amazon and Uber have always existed as towering pillars of profit, but that’s not so. What sets companies like these apart is a crucial piece of knowledge. They spotted the tipping point at which reliability becomes the top priority for a software company’s success.

Pinpointing this tipping point is hard. After all, many companies can’t afford to stop shipping new features to shore up their software. Timing the transition to reliability well can launch a company ahead of the competition and win the market (e.g. Amazon, Home Depot). But missing it can spell doom for a company or even an industry (e.g. Barnes & Noble, Forever 21, and Gymboree in the retail apocalypse). Luckily, there are signs as you approach the tipping point. From examining over 300 companies, we’ve identified four.

Let’s break these signs down together.

1. Your Product is Becoming a Utility

When a product is a novelty, new adopters have a generous tolerance for errors because they are buying a vision of the future. Once the product becomes a utility, though, companies start to depend on it for critical functions, and people start to depend on it in daily life. In the case of Twilio, for example, suicide prevention hotlines depend on its reliability.

Consider Amazon. This company set the new consumer expectation for e-commerce and disrupted the sales of companies like Barnes & Noble and many more. How? First of all, the store is always open and accessible from your living room couch. Second, Prime delivers all packages with 2-day shipping. Do Amazon’s users care more about drone delivery (a feature), or 2-day shipping (reliability)?

While drone delivery is a neat novelty, the commodity of fast shipping is Amazon’s bread and butter. Users want their order on time, no matter what it takes to get there. How else can parents count on Santa coming on Christmas Eve?

While it’s tough to pinpoint the exact moment an industry converts from novelty to utility, we can look at three early indicators according to Harvard Business School.

Companies begin to compete on price. Instead of shrugging off a price difference in exchange for the pleasure of a novelty, customers are doing their research before buying.
Companies are restructuring their finances in order to keep the same profit margin even though sales are increasing. They need to innovate to make money with higher costs. They have more employees, more maintenance expenses, more everything.
Companies take a closer look at their customer base. What’s the target market? What customers don’t they want buying their product? They’ve got to make tough choices here to keep loyal customers who appreciate what the company brings to the table.

Thanks to companies like Amazon, Google, Facebook, Netflix, etc., software delivery is transitioning from a novelty to a utility, from something we like to something we need every day. People expect every service to be as responsive and available as these tech giants. As your service loses its novelty, your users will look for reliability over features.

2. Your Users are Demanding Reliability over New Features

You may think you can spot the tipping point by looking at other companies that have found theirs, but each company functions differently. A company needs to examine itself to find its unique path forward. One way to do this is by looking at customer requests. When customer requests become geared more towards reliability than new features, the tipping point is approaching.

Let’s look at Uber. When Uber rolled onto the scene, people began ditching taxis left and right. This shift is due to Uber’s higher level of reliability. Customers can always count on a correct ETA and the ability to access the application at all times, even in the middle of the night when finding a taxi can be a real nightmare.

On Uber’s feature request page, we can see that this reliability is the key to keeping customers happy. While some of the features are still geared towards novelty (such as the proposed “hungover” option which allows users to let the driver know to speak softly and drive carefully), most of them are reliability-based. At the time of writing, the top 5 are all related to rider safety. Another highly requested feature is the ability for the application to recognize gated entries, so users’ ETAs aren’t delayed by a lack of entry code. This need for users to be able to get to their destination on time every time has manifested in feature requests leaning towards speed, efficiency, and consistency. Users don’t want to worry that a gated community will cause a detour.

When feature requests for reliability exceed 50% of all feature requests, it’s time to focus on reliability first and foremost.
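The 50% rule of thumb above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming each feature request has already been tagged with a category; the category names and the function names `reliability_share` and `past_tipping_point` are hypothetical and not taken from any real tracker.

```python
from collections import Counter

# Hypothetical labels: the article does not define a taxonomy, so these
# categories are an assumption about what counts as "reliability".
RELIABILITY_CATEGORIES = {"reliability", "performance", "safety", "availability"}


def reliability_share(request_categories):
    """Return the fraction of requests tagged with a reliability category."""
    if not request_categories:
        return 0.0
    counts = Counter(request_categories)
    reliability = sum(
        n for category, n in counts.items() if category in RELIABILITY_CATEGORIES
    )
    return reliability / len(request_categories)


def past_tipping_point(request_categories, threshold=0.5):
    """True when reliability requests strictly exceed the threshold share."""
    return reliability_share(request_categories) > threshold


# Illustrative data only, not real feature requests.
requests = [
    "safety", "safety", "new_feature",
    "performance", "new_feature", "availability",
]
print(f"{reliability_share(requests):.0%} of requests are reliability-related")
print("Past tipping point:", past_tipping_point(requests))
```

In practice the hard part is the tagging, not the division. A request like gated-entry recognition could be filed as a feature or as a reliability fix, so the share is only as good as the labels behind it.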

Companies don’t need to keep adding features when the customer base is satisfied with the product’s core functionality. What companies do need is to reliably deliver those core functionalities, which will depend on infrastructure investments. Customers will wait for a new feature. But, they won’t wait for a site to come back up after extended periods of downtime caused by failed rollouts.

Slack was down for 2 hours last quarter and it unfortunately cost them $8 million in sales. This cost is too high. Beyond the bottom line, this hurts brand reputation, brand loyalty, and customers' confidence in the company. So once a company has experimented and found the core features that stick, it’s time to prioritize reliability of those core features to keep the loyal users happy.

3. New Contracts have Tighter SLAs (B2B) / Customers are Getting Less Patient (B2C)

Speaking of keeping users happy, when nearing the tipping point, companies begin to notice their SLAs change (at least for B2B companies). This means they need to tighten up their SLOs and decrease their error budgets to meet the new SLAs. Perhaps this realization will occur when users take to Twitter after an outage, when the customer service department goes bonkers, or when accounting needs to pay for a breach of SLA. You know the tipping point is right in front of your eyes when you worry about the financial penalties of a bad rollout.
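To make "decreasing the error budget" concrete, here is a minimal sketch of the usual arithmetic, assuming a purely time-based availability SLO measured over a 30-day window. The function names and the example targets are illustrative assumptions, not figures from any particular contract.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Allowed downtime in minutes for an availability SLO over the window.

    Assumes a time-based SLO: budget = (1 - target) * minutes in the window.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target, downtime_minutes, window_days=30):
    """Minutes of budget left; a negative value means the SLO was missed."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Tightening the target by one "nine" cuts the budget by a factor of ten.
for target in (0.999, 0.9999):
    budget = error_budget_minutes(target)
    print(f"SLO {target:.2%}: about {budget:.2f} minutes of downtime per 30 days")
```

Under these assumptions a 99.9% target allows roughly 43 minutes of downtime a month, while 99.99% allows roughly 4. A single two-hour outage overspends either budget, which is why a tighter SLA changes how much risk each rollout can carry.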

Customers will also become more vocal about complaints. Companies won’t be able to afford losing too many customers due to reliability issues. Users want software that works the way they expect it to whenever they need it. If customers are disappointed, they won’t hang around forever. They’ll begin to migrate towards whatever company can promise the highest level of reliability. Once a customer is lost, it’s very difficult to win them back. If you begin to notice customer churn, you’ve already missed the tipping point and are now forced to play a game of catch-up with your competitors. Many retail companies are in this position now.

4. Spaghetti Code is Now Easier to Refactor than to Fix

When a company is in its infancy, speed is key. And when developers realize there are errors in their code, sometimes going back to fix them properly isn’t an option. They need to move on. So they build on, or out, or all over the place. The code becomes hard to follow, and even harder to fix in the future when the company has enough features to satisfy its customer base. This is known as spaghetti code.

At this time, it might be easier to rebuild the infrastructure than fix the code. This is a key sign that, internally, your company has reached the tipping point. Customers expect a standard of reliability that a company must meet. Developers can’t afford to make risky fixes which might take the system down for a while, and they don’t have the time to rebuild totally.

If you miss this internal tipping point, you will run into a situation where no new features can be shipped, not for weeks, but for months. Multi-month code freezes to fix reliability issues are common, expensive, and preventable. So watching out for this sign and timing the refactoring well can save you millions of dollars of engineering spend.

In a software company’s lifetime, there comes a moment when it’s mission critical to shift priority from shipping new features to protecting the reliability of all features shipped so far. Spotting the tipping point and adapting can save you from million-dollar losses and endless newspaper headlines about your system failure, and ensure that customers continue to trust you.

Companies can take preventative measures to make this transition towards reliability more successful, such as adopting SRE, investing in incident resolution, doing diligent postmortems, and tracking SLOs and error budgets. Much like caring for a human body, this sort of proactive effort can lead to less reactive effort and better overall health. After all, the goal isn’t just to reach the tipping point; it’s to thrive long after.

Conceptualized by Lyon Wong

Written by Hannah Culver

Edited by Charlie Taylor

Blameless

Blameless is the industry's first end-to-end SRE platform, empowering teams to optimize the reliability of their systems without sacrificing innovation velocity.