5 Surefire Ways to Improve Your Product Reliability with Logging and Automation

Written by our friends at Rookout!

In the fast-moving world of software development, as your product and organization grow and evolve, there are almost always competing priorities. Zeroing in on what matters most to your business in order to take it to the next level can at times seem like a non-stop process of trial and error. Oftentimes the customer who screams the loudest becomes the priority and gets the most focus. Yet regardless of which direction you’re heading at the moment, it’s important to always keep the reliability of your product in mind. This is why it is critical to keep making progress in automating tasks that add to the reliability of your core business: your product.

Over many years of working with customers, we have come to the conclusion that there are several specific areas of focus where investment in automation can add tremendous value over the long run.

Remove manual work from your software delivery process

The most obvious area is, of course, the automation of your software delivery process. If you walk into any enterprise IT organization today, there is undoubtedly some form of DevOps team responsible for automating and building processes around end-to-end software delivery. Having manual steps or manual configuration within your software delivery process, from code check-in to deployment into test or production environments, leaves too much room for error. As such, automation in the software delivery process is at the very core of software reliability.

You should absolutely have a blueprint for how you build, test, and deliver your software to your customers. This includes things like code scans, automated builds, automated test execution (unit, integration, security, performance), configuration management that controls how you spin up your infrastructure (infrastructure as code), and release automation for your deployments into those environments. Not only should you automate this process through Continuous Integration and Continuous Delivery practices, but you should also build extensive monitoring and observability into this pipeline. In fact, these days, observability pipelines are becoming a well-known construct.
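As a rough illustration of such a blueprint, the sketch below models a delivery pipeline as an ordered list of stages that stops at the first failure. The stage names and the `Stage` and `run_pipeline` helpers are hypothetical, and each stage is a stand-in function rather than a real build or test command; in practice a CI/CD tool would play this role.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for stage in stages:
        if not stage.run():
            print(f"stage failed: {stage.name}")
            break
        passed.append(stage.name)
    return passed


# Hypothetical stage order; the lambdas simulate real commands.
stages = [
    Stage("code scan", lambda: True),
    Stage("build", lambda: True),
    Stage("unit tests", lambda: True),
    Stage("integration tests", lambda: False),  # simulated failure
    Stage("deploy", lambda: True),
]

passed = run_pipeline(stages)
print("passed:", passed)
```

Because the integration tests fail in this example, the deploy stage never runs, which is the point of removing manual judgment from the process.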

Observability and reliability go hand in hand

If you want your software to be reliable and as simple as possible to fix when you encounter issues, it has to be observable. This is something we hear quite often nowadays, but what exactly do we mean when we talk of observability? Wikipedia defines observability as the measure of how well internal states of a system can be inferred from knowledge of its external outputs. Can you understand the current state of your application at any point in time, including things like the health of your services, request/response latency, application errors, metrics, and tracing of messages throughout your system? As Charity Majors aptly tweeted:

“Observability, short and sweet:

- can you understand whatever internal state the system has gotten itself into?

...just by inspecting and interrogating its output?

...even if (especially if) you have never seen it happen before?”

APM tools on the market today can help with many of these issues and make life simpler. In addition, as Liran Haimovitch, CTO of Rookout, discusses in this article, business-level metrics need to be observable as well and will continue to play a more important role in measuring the success and reliability of products.

It is evident that the industry as a whole is putting more emphasis on observability and monitoring with the creation of open standards like the OpenTelemetry Project and OpenTracing, which aim to provide vendor-neutral APIs and services for robust observability processes. Many organizations are also starting to adopt the concept of Observability Pipelines. An observability pipeline is a data pipeline that sits as a buffer between the services developers are building and the downstream systems that process, display, and organize that data. In this approach, developers building individual services no longer need to write extra code to connect to multiple monitoring and observability systems. Instead, they send all of their data to the observability pipeline, and the pipeline filters that data and routes it to where it needs to go.
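The filter-and-route idea can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular product works: the `ObservabilityPipeline` class, the event fields (`kind`, `level`), and the two list-based "stores" are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]
Sink = Callable[[Event], None]


class ObservabilityPipeline:
    """Services emit events here; routes decide where each event goes."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Predicate, Sink]] = []

    def add_route(self, predicate: Predicate, sink: Sink) -> None:
        self._routes.append((predicate, sink))

    def emit(self, event: Event) -> int:
        """Send the event to every matching sink; return how many received it."""
        delivered = 0
        for predicate, sink in self._routes:
            if predicate(event):
                sink(event)
                delivered += 1
        return delivered


# In-memory lists stand in for downstream monitoring systems.
metrics_store: List[Event] = []
log_store: List[Event] = []

pipeline = ObservabilityPipeline()
pipeline.add_route(lambda e: e.get("kind") == "metric", metrics_store.append)
# Filter: debug-level logs are dropped instead of being forwarded.
pipeline.add_route(
    lambda e: e.get("kind") == "log" and e.get("level") != "debug",
    log_store.append,
)

pipeline.emit({"kind": "metric", "name": "latency_ms", "value": 120})
pipeline.emit({"kind": "log", "level": "debug", "msg": "cache miss"})
pipeline.emit({"kind": "log", "level": "error", "msg": "payment failed"})
```

The service code only ever calls `emit`; adding or swapping a downstream system means changing a route, not every service.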

Log only what you need

Traditionally, the primary way of building observability into applications has been through writing log lines. Logs help give developers insight into applications when they’re running in their native environments. The problem with this approach is that developers seem to develop a sort of logging FOMO (Fear of Missing Out): they easily get into a practice of logging anything and everything because they’re not sure what data they might need when something goes wrong. After all, this is software -- things always go wrong!

Now logging definitely has its place, but it can quickly get unwieldy in terms of maintenance and overhead within your code base. Coupling logging with observability and monitoring tools that enable tracing, metrics gathering, and even insight into the state of your application variables while your app is running provides a much better approach to debugging. For the logs you do need to write, make sure you are writing meaningful log messages with the proper logging levels for events that need to be audited down the line. The better you can follow these practices with logging and data collection, the easier it will be to build and maintain reliable software.
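To make "meaningful messages with the proper logging levels" concrete, here is a small sketch using Python's standard `logging` module. The `charge` function, the `orders` logger name, and the field names are hypothetical; the point is that each message names the event and carries the identifiers needed to audit it later, and that the level reflects how much attention the event deserves.

```python
import logging

# With the level set to INFO, DEBUG messages are suppressed by default.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("orders")  # hypothetical service name


def charge(order_id: str, amount_cents: int) -> bool:
    """Pretend to charge an order, logging only what is useful later."""
    if amount_cents <= 0:
        # WARNING: unexpected input that was handled, worth reviewing.
        logger.warning(
            "charge rejected order_id=%s amount_cents=%d reason=non_positive_amount",
            order_id,
            amount_cents,
        )
        return False
    # DEBUG: detail that is only useful while investigating a problem.
    logger.debug("charge validated order_id=%s", order_id)
    # INFO: a business event that may need to be audited down the line.
    logger.info("charge completed order_id=%s amount_cents=%d", order_id, amount_cents)
    return True


charge("A-1001", 2500)
charge("A-1002", 0)
```

Raising the verbosity to DEBUG is then a configuration change rather than a code change, which removes much of the temptation to log everything all the time.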

Analyze and automate your incident resolution process

Software incidents and defects happen, usually quite frequently. Every organization encounters them on a regular basis, so how you deal with them has a huge impact on how your customers view the reliability of your software or service.

Teams need to continuously improve the speed at which defects are handled, which, as previously mentioned, has a direct impact on the reliability of your software. As Lyon Wong, Co-founder and COO of Blameless, discusses in this blog post, organizations are now focusing on reliability, as their users prefer it over new features.

“People now expect every website or service to be as responsive and available as these tech giants. As your software or web service begins to lose its novelty, what your users will look for is reliability over features.” -- Lyon Wong

For this reason, your incident resolution process becomes a critical part of both your internal teams’ day-to-day process as well as your customer experience. Having a framework or tool in place which can help to improve your efficiency around the process of incident management and resolution of defects is a must.

Software solutions can assist in bringing visibility to all stages of incident resolution and can even help in ensuring that they don’t continue to happen repeatedly for the same reasons. Continuous improvement in this area is the key to success and reliability.
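As one small illustration of analyzing incident data, the sketch below computes the mean time to resolve and flags root causes that keep recurring. The `Incident` record, its fields, and the sample data are invented for the example; a real incident management tool would supply this information.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    opened: datetime
    resolved: datetime
    root_cause: str


def mean_time_to_resolve(incidents: List[Incident]) -> timedelta:
    """Average time between an incident opening and being resolved."""
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def repeat_causes(incidents: List[Incident], threshold: int = 2) -> List[str]:
    """Root causes seen at least `threshold` times, sorted by name."""
    counts = Counter(i.root_cause for i in incidents)
    return sorted(cause for cause, n in counts.items() if n >= threshold)


# Invented sample data for illustration only.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0), "expired certificate"),
    Incident(datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 30), "bad config push"),
    Incident(datetime(2024, 2, 1, 14, 0), datetime(2024, 2, 1, 15, 30), "expired certificate"),
]

print("mean time to resolve:", mean_time_to_resolve(incidents))
print("recurring causes:", repeat_causes(incidents))
```

Tracking these two numbers over time gives a simple signal of whether resolution is getting faster and whether the same problems keep coming back.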

Automate user onboarding and log behavioral trends

If at all possible, automate the user onboarding process for your software. Create options for self-service, so that users can access your product when they want it and choose how they want to consume it. This creates an excellent customer experience and frees up developer time to concentrate on the areas of your product that need the most focus. In today’s technology-driven age, customers have high expectations around user experience because of the large number of SaaS-based options available. Focusing extra time and effort on making this initial experience smooth and effortless will pay off handsomely in the long run.

And finally, the best way to make sure your software is reliable is to fully understand how your users actually use the software in the environment where they run it. There are many tools available today that can give you insights into how customers are using your software, allowing you to drive innovation in the right areas and focus on a positive customer experience. For example, companies like Pendo are creating applications that allow you to gain better insight into the entire product journey. The more you understand the customer journey, the more you can focus on making key areas of usage as reliable and useful to your users as possible.


Blameless

Blameless is the industry's first end-to-end SRE platform, empowering teams to optimize the reliability of their systems without sacrificing innovation velocity.