Canary Deployments | The Benefits of an Iterative Approach


    Originally published on Failure is Inevitable.

    At Blameless, we want to embrace all the benefits of the SRE best practices we preach. We’re proud to announce that we’ve started using a new system of feature flagging with canaried and iterative rollouts. In this system, each new release is broken down and flagged according to the feature each part of the release implements. Then, a growing subset of users is given access to a growing number of features. By avoiding big changes for big groups, we reduce the chances of major outages and deliver a more reliable product faster.

    Of course, switching to this system comes with challenges and decisions to make. In this blog post, we’ll share what we’ve learned about canarying and flagging best practices. We’ll look at:

    • Why you should consider an iterative canarying approach to releases
    • Knowing when it’s safe to expand and iterate
    • Understanding how users rely on your services to find the ideal groups to canary

    Why do iterative canarying releases?

    Iterating and canarying are more involved than traditional big releases. You need to look at the code being deployed and flag everything that makes up each new feature. You’ll also need to tag groups of users. Finally, instead of one big release, you do several smaller releases in which more groups of users receive more features each time. Flagging features and defining user groups creates overhead, and each release will take a bit of additional time. However, the benefits of this system are worth it. Here are a few to consider:

    More reliable services. Perhaps the biggest benefit of this approach is improved reliability for your service. Changes in production are a common source of incidents and outages. With small, iterative deployments, you’ll know that things can only go wrong for a subset of your services. Likewise, canarying means that only a subset of your users will be affected. By preventing major outages, you’ll greatly improve the perceived reliability of your service.

    Continuous feedback. By iterating through new features, you have many more opportunities to hear feedback from users. You’ll be able to tell what should be improved before you commit to the entire feature set.

    More manageable operations. By reducing the scope and spacing out each release, you give operations teams the opportunity to ensure they’re ready to support each feature.

    This approach does come with challenges. Because it only deals with how code is deployed, you don’t have to retroactively change your existing code to be modular or feature-flagged. However, you do need to build new practices for code going forward. Developers need to invest time and energy in building new habits and adopt a mindset in which everything is modular and iterative. Switching gears like this can slow development at first, but the payoff of better releases is worth it.

    Balancing canarying and iteration safely

    Our release system essentially has two dimensions: the canary size, which is the number of users accessing new features, and the iteration, which is the set of new features they have access to. Our goal is to expand both until they encompass all users and all features, without ever making a leap large enough to jeopardize reliability. How do you find this balance and cadence?

    First, you need to build a release roadmap. This is a project that product, development, and operations teams should share. It outlines which features should be included in each iteration, and which canary groups should receive them. It should also contain an aspirational timeline for each stage. However, this timeline shouldn’t be written in stone. You’ll need to adjust your rollout speed based on how each iteration is performing.
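
As an illustration only, a roadmap like this could be kept as plain data that all three teams can read and check. The sketch below is not Blameless’s actual tooling: the `Stage` class, the `validate_roadmap` helper, and the feature names, group names, and dates are all hypothetical. It assumes each stage should only ever add features or groups relative to the previous one.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class Stage:
    """One step of the rollout: which features go to which canary groups."""
    name: str
    features: FrozenSet[str]
    canary_groups: FrozenSet[str]
    target_date: date  # aspirational; expected to move as stages perform


def validate_roadmap(stages: List[Stage]) -> List[str]:
    """Return readable problems; an empty list means every stage only expands."""
    problems = []
    for prev, cur in zip(stages, stages[1:]):
        if not prev.features <= cur.features:
            problems.append(f"{cur.name} drops features that {prev.name} had")
        if not prev.canary_groups <= cur.canary_groups:
            problems.append(f"{cur.name} drops groups that {prev.name} had")
        if prev.features == cur.features and prev.canary_groups == cur.canary_groups:
            problems.append(f"{cur.name} changes nothing compared with {prev.name}")
    return problems


# Hypothetical roadmap: widen the audience first, then add a feature.
roadmap = [
    Stage("stage-1", frozenset({"new-search"}), frozenset({"internal"}), date(2024, 1, 8)),
    Stage("stage-2", frozenset({"new-search"}), frozenset({"internal", "beta"}), date(2024, 1, 15)),
    Stage("stage-3", frozenset({"new-search", "bulk-export"}),
          frozenset({"internal", "beta"}), date(2024, 1, 22)),
]
print(validate_roadmap(roadmap))  # prints []
```

Keeping the target dates as ordinary fields makes it easy to push a stage back when the previous one is not yet performing well enough.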

    [Figure: Iterative Release Chart]

    The key is monitoring data with feature flagging. You need to be able to see how each feature is performing individually. Whether a given feature should be rolled out can depend more on the performance of specific other features than on the overall health of the system. Blameless uses monitoring tools such as Sumo Logic to parse the information our system outputs. This lets us break down which features are causing issues or unreliability, and which are stable enough to build upon.
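
To make the idea of per-feature health concrete, here is a minimal sketch that assumes your logs can be reduced to `(feature, succeeded)` pairs. It does not use any Sumo Logic API; the function names and the thresholds (`max_error_rate`, `min_requests`) are assumptions you would tune for your own service.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def error_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, float]]:
    """Map each feature flag to (request count, error rate)."""
    totals, errors = Counter(), Counter()
    for feature, succeeded in events:
        totals[feature] += 1
        if not succeeded:
            errors[feature] += 1
    return {f: (n, errors[f] / n) for f, n in totals.items()}


def stable_features(events: Iterable[Tuple[str, bool]],
                    max_error_rate: float = 0.01,
                    min_requests: int = 100) -> Set[str]:
    """Features with enough traffic to judge and an error rate under the limit."""
    rates = error_rates(events)
    return {f for f, (n, rate) in rates.items()
            if n >= min_requests and rate <= max_error_rate}


# Hypothetical traffic: one healthy feature, one failing, one with too little data.
events = (
    [("new-search", True)] * 200
    + [("bulk-export", True)] * 90 + [("bulk-export", False)] * 10
    + [("dark-mode", True)] * 5
)
print(stable_features(events))  # prints {'new-search'}
```

The `min_requests` guard matters: a feature that only five canary users have touched has not yet shown that it is stable, even if every request succeeded.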

    Once you know a given iteration is safe, you can roll it out to the designated groups. The modular setup gives an extra layer of protection, as individual features can be rolled back without impacting the entire system. Don’t depend on this going off without a hitch, though. As with any other backup system, you need to simulate rolling back an iteration to understand what your options actually are.
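
A rollback drill can be rehearsed against a stand-in before you need it in production. The sketch below uses a hypothetical in-memory `FlagStore`, not a real flag service or its API. It rolls one feature back, checks that every other feature is untouched, and then restores the original state.

```python
import copy
from typing import Dict, Set


class FlagStore:
    """In-memory stand-in for a feature flag service (hypothetical API)."""

    def __init__(self) -> None:
        self._groups_by_feature: Dict[str, Set[str]] = {}

    def enable(self, feature: str, group: str) -> None:
        self._groups_by_feature.setdefault(feature, set()).add(group)

    def roll_back(self, feature: str) -> None:
        """Turn one feature off for every group, leaving other features alone."""
        self._groups_by_feature.pop(feature, None)

    def is_enabled(self, feature: str, group: str) -> bool:
        return group in self._groups_by_feature.get(feature, set())

    def snapshot(self) -> Dict[str, Set[str]]:
        return copy.deepcopy(self._groups_by_feature)

    def restore(self, snapshot: Dict[str, Set[str]]) -> None:
        self._groups_by_feature = copy.deepcopy(snapshot)


def rollback_drill(store: FlagStore, feature: str) -> bool:
    """Roll a feature back, confirm nothing else changed, then restore the state."""
    before = store.snapshot()
    store.roll_back(feature)
    after = store.snapshot()
    expected = {f: groups for f, groups in before.items() if f != feature}
    store.restore(before)
    return after == expected


store = FlagStore()
store.enable("new-search", "internal")
store.enable("new-search", "beta")
store.enable("bulk-export", "internal")
print(rollback_drill(store, "new-search"))  # prints True
```

In a real system the drill would also exercise the parts that tend to fail in practice, such as cached flag values and data written by the feature while it was enabled.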

    Building the right canary groups

    Another best practice for canarying releases is to use customized canary groups. Intuitively, you might just break your users down into indiscriminate chunks, such as 10 groups of 10% each. This works fine for getting many of the benefits of canarying, but you can get even more insight with tailor-made canary groups.
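
For reference, the indiscriminate baseline is usually done by hashing a stable user ID, so that a user always lands in the same group. This is a generic sketch rather than a description of our system; the `salt` value and the function names are assumptions.

```python
import hashlib


def bucket(user_id: str, salt: str, buckets: int = 10) -> int:
    """Deterministically map a user to one of `buckets` roughly equal groups."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_canary(user_id: str, salt: str, groups_enabled: int, buckets: int = 10) -> bool:
    """True if the user's group is among the first `groups_enabled` groups."""
    return bucket(user_id, salt, buckets) < groups_enabled


# With 3 of 10 groups enabled, roughly 30% of users see the release.
users = [f"user-{i}" for i in range(1000)]
exposed = [u for u in users if in_canary(u, "release-a", groups_enabled=3)]
print(len(exposed))
```

Because membership depends only on the hash, raising `groups_enabled` never removes anyone who already had access. Changing the salt per release reshuffles the groups, so the same users are not always the first to be exposed.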

    To do this, you first need to understand how your users interact with your service. Blameless uses tools such as Pendo to see how much each user relies on each feature. This is supplemented by meeting with customer success teams, who can relay reports from customers on what matters most to them. Creating things like user journeys and SLIs can quantify this importance.

    Once you have profiles for your users, create groups for each iteration based on the features in that iteration. Some qualities that you’d like your ideal canary group to have include:

    • They use the feature. If you roll out an updated feature to a group of users that don’t even notice, you won’t get the feedback you need.
    • They don’t use the feature too much. On the other hand, outages are more likely during these canarying phases. To keep your most dependent users safe from this risk, avoid putting users who wholly rely on a feature in its canary group.
    • They provide feedback. Some users are more inclined to discuss what they think of new updates. These communicative users are ideal canaries.

    Of course, you won’t necessarily find a perfect set of users for every feature. The important thing is considering these things when building canary groups and choosing the best candidates you have.
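
The three qualities above can be turned into a simple filter and ranking. The sketch below assumes you have already exported a usage profile per user; the `UserProfile` fields, the usage bounds, and the sample data are hypothetical, and none of this relies on a Pendo API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserProfile:
    user_id: str
    feature_usage: Dict[str, float]  # feature -> share of sessions using it (0.0 to 1.0)
    feedback_count: int              # pieces of feedback received from this user


def pick_canaries(profiles: List[UserProfile], feature: str, size: int,
                  min_usage: float = 0.1, max_usage: float = 0.7) -> List[str]:
    """Pick users who use the feature, but not too much, preferring vocal ones."""
    eligible = [
        p for p in profiles
        if min_usage <= p.feature_usage.get(feature, 0.0) <= max_usage
    ]
    # Users who give more feedback come first.
    eligible.sort(key=lambda p: p.feedback_count, reverse=True)
    return [p.user_id for p in eligible[:size]]


profiles = [
    UserProfile("a", {"new-search": 0.40}, feedback_count=5),
    UserProfile("b", {}, feedback_count=9),                    # never uses the feature
    UserProfile("c", {"new-search": 0.95}, feedback_count=7),  # relies on it heavily
    UserProfile("d", {"new-search": 0.30}, feedback_count=8),
]
print(pick_canaries(profiles, "new-search", size=2))  # prints ['d', 'a']
```

If too few users pass the filter, loosening `min_usage` and `max_usage` is one way to choose the best candidates you have rather than waiting for perfect ones.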

    What you’ll need

    To make these frequent, iterative, and specifically targeted deployments, you’ll first need a strong deployment system. Practices like CI/CD are necessary for this speed and flexibility.

