How to Scale End-to-End Observability in AWS Environments

Are you Ready for this Trend Alert? SRE is Shifting Left

    After talking with over 300 companies from industries like retail, finance, healthcare and SaaS, we see an emerging trend in the way product and engineering teams operate. Site reliability engineering (SRE) as a discipline is shifting left. Companies are focusing on reliability and problem prevention earlier in the software development life cycle (SDLC). Instead of reacting to incidents after the code ships to production, SREs are participating at the product planning phase to build in reliability on day zero.

    This shift could change the scope of work for SREs and software developers.

    SRE is not the only one; security and QA are shifting left at a rapid pace. An analogy to describe this phenomenon is ‘A stitch in time saves nine’. Imagine this in the context of clothing. Here, traditional SRE acts like the repair stitch that saves further tears. SRE shifting left strengthens the weaving process, reducing the need for future repairs.

    Why is SRE Shifting Left?

    Reliability is the most important feature  

    When Amazon's website experienced a partial outage during its annual Prime Day sale in 2018, frustrated customers across North America were unable to access deals on Amazon. According to an analyst report, this hour-long outage could have cost Amazon approximately $99 million in sales. Reliability is mission-critical not only for e-commerce, but also for B2B companies. Unreliability could restrict access to business data, which could in turn hurt customer satisfaction or even violate revenue-impacting SLAs. As companies realize that unreliability could negatively impact business growth, the need for SRE grows.

    This incident shows the importance of reliability and of SRE shifting left. A company whose SRE function has shifted left is more likely to anticipate spikes in traffic, and its teams will be better prepared to handle an incident like Amazon's.

    Moving to the cloud and microservices

    The adoption of microservices has resulted in complex interactions between legacy systems, next-gen systems built internally, and third-party cloud APIs.

    The exponential scale of issues means SREs are now solving more problems than ever. The supply of SREs for hire is limited relative to the rising demand, and SREs with the right experience are few and expensive to hire. The only way to overcome this supply-and-demand challenge is to have fewer incidents. SRE shifting left is a result of this phenomenon.

    Ending Devs vs SRE

    The team dynamics between development and SRE are due for an update. Shifting left puts developers and SREs in a collaborative relationship at the beginning of the SDLC. This incentivizes developers to optimize for reliability. Developers will leverage a reliability-driven engineering mindset to ship better code, which helps them avoid working on repeat issues. Moreover, SREs do not have to be the gatekeepers for new upgrades and innovation. This improvement in team dynamics means SRE shifting left is here to stay.

    Benefits of Shifting Left

    SRE shifting left has many advantages. The first is increased confidence in successful deployments. Failures in production are often a result of differences in deployment procedures. The development team may create deployment procedures different from those used by SRE in production. Sometimes production procedures are more manual and may even use different tooling.

    SRE shifting left allows both teams to create standard deployment procedures, which eliminates the need for different tooling. The deployment process is then practiced in test environments before reaching production. With guardrails in place, teams will feel more confident that deployments will succeed.

    SRE shifting left shortens test cycles by testing against SLOs (Service Level Objectives) before production. It takes less time to set SLOs at the beginning of the life cycle than at the end. SLOs are useful because they help teams make data-driven priority decisions about reliability versus new feature development. SLOs also provide more visibility into SRE's reliability improvements. Generally, companies take six months to set their SLOs, whereas shifting left reduces the timeline to weeks.
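
    As an illustration of testing against SLOs before production, here is a minimal sketch of a pre-production gate. Everything in it is an assumption made for the example rather than a prescribed method: the `Slo` class, the `check_slo` function, the "checkout-api" name, and the choice of availability and 95th-percentile latency as the two targets. The measurements are assumed to come from a load test run in a test environment.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO definition; names and thresholds are illustrative."""
    name: str
    availability_target: float  # fraction of requests that must succeed, e.g. 0.999
    p95_latency_ms: float       # 95th-percentile latency must not exceed this


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence, pct as an integer 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_slo(slo, total_requests, failed_requests, latencies_ms):
    """Compare test-environment measurements against the SLO targets."""
    if total_requests <= 0 or not latencies_ms:
        raise ValueError("need at least one request and one latency sample")
    availability = (total_requests - failed_requests) / total_requests
    p95 = percentile(latencies_ms, 95)
    return {
        "slo": slo.name,
        "availability": availability,
        "p95_latency_ms": p95,
        # The gate passes only when both targets are met.
        "passed": availability >= slo.availability_target
        and p95 <= slo.p95_latency_ms,
    }
```

    A pipeline step could call `check_slo(Slo("checkout-api", 0.999, 200.0), total, failed, latencies)` after a load test and block the release when `passed` is false. Keeping the targets in one small data object means Dev and SRE review the same numbers at the planning stage.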

    Shifting left improves the relationship between SRE and Dev teams. Here SRE and Dev work together right from the planning stage. Developers can now learn to focus on reliability and stability from the start. Additionally, SREs can now learn development skills. This helps them identify issues in the beginning and troubleshoot faster.

    Is Everyone Shifting Left?

    While shifting left has its benefits, few companies are ready to take the step. Companies that do not have an existing SRE practice, or are not yet mature in their SRE practice, have a long way to go before they shift left. These companies can start small by having the Dev team adopt SRE best practices. In some cases, however, this means added responsibility for the Dev team, which can cause resistance to shifting left.

    How Can You Shift Left?

    While talking with more than 300 companies, we saw two patterns emerging in the way companies are adopting the left shift:

    • The Dev team wears the SRE hat. The team reads books or attends webinars on SRE best practices. The team also creates standard tools and procedures for development and production. The Dev team is responsible for 100% of the support, maintenance and problem-solving.
    • An existing SRE team, previously involved much later in the SDLC, shifts left to work together with the Dev team at the planning stage. They collaborate to set the SLOs and the monitoring process from the start. They work together on problem-solving and maintenance throughout the SDLC.

    The Future of SRE

    SRE is shifting left and becoming a first-class citizen, much like security and QA. Yet this does not take away job opportunities for site reliability engineers. Shifting left allows SREs to become partners in the development process. If you spend 100% of your time on unplanned work today, that share can drop to less than 50%. As more companies focus on reliability, SREs will have the opportunity to champion shifting left.

    SRE shifting left creates opportunities for both teams. SREs are now free to build resilient, reliable systems, explore new technologies, and pursue new practices. Developers can enhance their troubleshooting skills. Adopting a blameless culture is a step in that direction. It is the beginning of making technology processes as robust as possible. The future might bring a disruption in the Dev-SRE space, but for now, shifting left is inevitable.

    Conceptualized by Ashar Rizqi

    Written by Varsha Hegde


    Blameless

    Blameless is the industry's first end-to-end SRE platform, empowering teams to optimize the reliability of their systems without sacrificing innovation velocity.
