Kubernetes Operators to realize the dream of Zero-Touch Ops

Kubernetes Operators have the power to realize the dream of zero-touch ops and bring AIOps to life, and this is how I believe they will.


    Operators

    As we step into microservices architectures, deploy them on the cloud with containers, and enjoy all the goodness of DevOps, application functionality grows, and so do the clusters and the number of resources in each cluster. If the application is not “built-for-manage”, it is going to be a nightmare to operate, and we might, ironically, end up spending more effort managing these applications than building them. All this while automation technology holds huge promise and we talk about zero-touch ops as the nirvana of managing cloud applications!

    In my view, Operators are the most important architectural component in the k8s world, with huge promise to carry us along our zero-touch (or low-touch) ops journey.

    Before I jump in, let me quickly walk you through my understanding of Operators (I am sure there are plenty of blogs, vlogs, and YouTube videos that do a better job :-)).

    k8s is all about Controllers & Resources.

    Resource: A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind; for example, the built-in pods resource contains a collection of Pod objects.
    Controllers: In Kubernetes, controllers are control loops that watch the state of your cluster, then make or request changes where needed.
    1_rVlbUlxIPAzfOMbudajeNA.png

    Controllers have the logic of managing the resources, and that's how the K8s cluster runs.
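
    To make the control-loop idea concrete, here is a minimal sketch in plain Python. It does not talk to a real cluster: the Cluster class, its start_pod/stop_pod methods, and the "web-N" pod names are all made up for illustration. It only shows the pattern of comparing desired state with observed state and correcting the drift.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for cluster state (hypothetical, not a Kubernetes client)."""
    desired_replicas: int
    running: list = field(default_factory=list)
    _next_id: int = 0

    def start_pod(self):
        self.running.append(f"web-{self._next_id}")
        self._next_id += 1

    def stop_pod(self):
        self.running.pop()


def reconcile(cluster):
    """One pass of the loop: compare observed state with desired state, fix the gap."""
    actions = []
    while len(cluster.running) < cluster.desired_replicas:
        cluster.start_pod()
        actions.append("start")
    while len(cluster.running) > cluster.desired_replicas:
        cluster.stop_pod()
        actions.append("stop")
    return actions


cluster = Cluster(desired_replicas=3)
reconcile(cluster)                 # initial pass brings up three pods
cluster.running.remove("web-1")    # simulate a crashed pod
actions = reconcile(cluster)       # next pass notices the gap and replaces it
print(cluster.running, actions)
```

    A real controller runs this kind of pass continuously, triggered by watch events rather than by hand as in the sketch.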

    The initial versions of k8s came with a defined set of resources, and we were restricted to using only those.

    Controllers are very good at managing stateless applications, as a controller is a constant control loop that tracks and fixes drift. Since the applications are stateless, there is no state to back up, recover, or restore. For example, if an instance of a web server crashes, the controller can easily replace it with another instance and bring the cluster back to the desired state.
    But for stateful applications like databases, it's not that straightforward, and restoring the state will require manual intervention! So we need something more than standard controllers.

    Since the introduction of the Custom Resources, we have the flexibility to declare and create our own k8s resources.

    Now imagine we can define our own resources and let k8s manage them! Even better, imagine we can build our own controllers with our own custom management logic, and let k8s run our resources! That is what Operators are.

    With Operators, we can write the logic for complete management of our custom resources and let k8s manage them, and that's how we can move to low-touch ops!
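
    Here is a rough sketch of that idea for a stateful application, again in plain Python rather than against a real cluster. Everything named here is an assumption for illustration: the "example.com/v1" group, the Database kind, the orders-db name, and the DatabaseOperator class. The point is only that a custom controller can carry application-specific knowledge, such as restoring state from a backup, that a generic controller does not have.

```python
# Hypothetical custom resource, written as a dict that mirrors the shape of a manifest.
database_cr = {
    "apiVersion": "example.com/v1",   # made-up API group
    "kind": "Database",               # made-up kind
    "metadata": {"name": "orders-db"},
    "spec": {"version": "14"},
}


class DatabaseOperator:
    """Toy operator: a custom controller with database-specific operational logic."""

    def __init__(self):
        self.instances = {}   # name -> {"version": str, "data": list}
        self.backups = {}     # name -> copy of the data at the last backup

    def backup(self, name):
        self.backups[name] = list(self.instances[name]["data"])

    def reconcile(self, cr):
        name, spec = cr["metadata"]["name"], cr["spec"]
        inst = self.instances.get(name)
        if inst is None:
            # Instance is missing: create it, restoring state if a backup exists.
            restored = list(self.backups.get(name, []))
            self.instances[name] = {"version": spec["version"], "data": restored}
            return "restored" if name in self.backups else "installed"
        if inst["version"] != spec["version"]:
            inst["version"] = spec["version"]
            return "upgraded"
        return "in-sync"


op = DatabaseOperator()
events = [op.reconcile(database_cr)]                 # first pass installs the database
op.instances["orders-db"]["data"].append("order-1")
op.backup("orders-db")
del op.instances["orders-db"]                        # simulate a crash that loses the instance
events.append(op.reconcile(database_cr))             # operator restores from the backup
database_cr["spec"]["version"] = "15"
events.append(op.reconcile(database_cr))             # spec change triggers an upgrade
print(events)
```

    In a real Operator the resource would be declared through a CustomResourceDefinition and the reconcile logic would call the Kubernetes API; the dict and in-memory stores above only stand in for those.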

    So what can we automate with Operators? The answer is “everything that can be automated”: installation, patching, updates, upgrades, backup, recovery, capturing telemetry, and acting on AI (artificial intelligence), all the way to the nirvana stage of zero-touch ops.

    There is a very well-defined Operator maturity model that clearly defines the 5 phases of maturity.

    1_ZLRvdqerOAloSVbFWWyEfw.png

    1_75UDw8T8l54FsAsezazQ8A.png

    There are 3 main components of the Operator Framework:

    Operator SDK: provides the tools to build, test, and package Operators. It offers 3 SDKs out of the box:

    • Helm SDK: provides a declarative way of building Operators; it is mainly suited to install-and-configure kinds of Operators
    • Ansible SDK, Go SDK: provide more advanced ways of building Operators, all the way up to “Auto-Pilot” maturity

    Apart from the Operator SDK, there are other tools available, such as KUDO, kubebuilder, and Metacontroller.

    Operator Lifecycle Manager (OLM): manages the complete lifecycle of an Operator, from installation onwards. OLM monitors the deployed CRD and, when something changes, ensures that the changes are applied across the cluster.

    Operator Metering: reports the usage of the Operator to support metering.

    1_Q0_PgdZLpRFFImPQCjxINQ.png

    Creating & Deploying an Operator

    For completeness, here is a very quick walk-through of building and deploying an Operator.

    1. Install Operator SDK
    2. Build, Test and Deploy
    3. Evolve & Mature

    AIOps for Zero-Touch Ops

    Applying artificial intelligence and machine learning to IT operations has become a reality, and is already a common practice for bringing down operational cost. So what capabilities are required for AIOps?

    1_VJrj3HE4H4_QqK_mvliS-w.png

    The picture above illustrates my understanding of the AIOps capability architecture (thanks to Naveen E P for brainstorming and contributing to this nice picture).

    AIOps goes beyond standard event detection to advanced prediction with actionable insights. The term “actionable” is important: it means recommending or executing the best action to fix current issues, or issues that are predicted to occur. This is what we really need for “Auto-Pilot” maturity, where it will replace or augment Site Reliability Engineers (SREs).
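
    As a small illustration of “prediction with actionable insights”, the sketch below fits a straight line to recent metric samples and turns the forecast into a recommendation. It is deliberately simple and entirely my own assumption: the least-squares trend stands in for a real machine learning model, and the function names, action labels, limit, and horizon are hypothetical.

```python
def fit_line(values):
    """Least-squares slope and intercept for samples taken at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def recommend(values, limit, horizon):
    """Turn a trend forecast into an action label (labels are placeholders)."""
    if values[-1] >= limit:
        return "fix-now"                  # the issue is already happening
    slope, intercept = fit_line(values)
    if slope <= 0:
        return "no-action"                # metric is flat or improving
    # Steps from the latest sample until the fitted line crosses the limit.
    steps_left = (limit - intercept) / slope - (len(values) - 1)
    return "act-proactively" if steps_left <= horizon else "no-action"


disk_usage_pct = [50, 60, 70, 80]         # made-up samples
print(recommend(disk_usage_pct, limit=100, horizon=3))
```

    The detection case ("fix-now") and the prediction case ("act-proactively") are kept separate on purpose, since the text distinguishes fixing current issues from acting on issues that might occur.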

    Now if you connect this generic picture of AIOps with what k8s Operators bring to the table, it is very clear that the operators have all that we need to be our AIOps engine.

    All the various capabilities can be built as CRs, managed by a bunch of operators that bring the pieces of AIOps to life. These operators are co-located inside the K8s cluster and run as Pods/sidecars. They can also integrate with a service mesh for additional metrics and telemetry, and act proactively to operate the cluster.
    1_QHEpWXqvYNRMEJ0JUREeRQ.png

    The above picture provides a high-level view of the idea. Let's see how it maps to the 3 layers we talked about in the AIOps capability architecture:

    • Visibility: the visibility layer can be built on Grafana, providing single-pane visibility of cluster health
    • Prediction: the prediction layer has all the modules (from Python modules to advanced Spark clusters, each as a specific operator) that build machine learning models from the data streaming from Prometheus and the service mesh (Istio)
    • Resolution: resolution can range from simple k8s commands to Ansible playbooks, or even invoking RPA digital workers, depending on the standard operating procedures, to recover from failures or take proactive measures
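
    The three layers can be wired together roughly as below. This is only a sketch under stated assumptions: the metric names, thresholds, and action names in the SOP table are hypothetical placeholders, fixed thresholds stand in for a trained model, and nothing here calls Grafana, Prometheus, Istio, or Ansible.

```python
from statistics import mean

# Hypothetical standard operating procedures: issue -> remediation step.
# The action names are placeholders, not real commands.
SOP = {
    "high-latency": "scale-out-deployment",
    "high-error-rate": "rollback-last-release",
}


def visibility(samples):
    """Visibility layer: condense raw telemetry into one health summary."""
    return {
        "latency_ms": mean(s["latency_ms"] for s in samples),
        "error_rate": mean(s["error_rate"] for s in samples),
    }


def prediction(summary):
    """Prediction layer: fixed thresholds stand in for a trained model."""
    issues = []
    if summary["latency_ms"] > 500:
        issues.append("high-latency")
    if summary["error_rate"] > 0.05:
        issues.append("high-error-rate")
    return issues


def resolution(issues):
    """Resolution layer: look up the procedure per issue, escalate unknown ones."""
    return [SOP.get(issue, "page-a-human") for issue in issues]


samples = [                                # made-up telemetry
    {"latency_ms": 400, "error_rate": 0.01},
    {"latency_ms": 800, "error_rate": 0.02},
]
summary = visibility(samples)
actions = resolution(prediction(summary))
print(summary, actions)
```

    Keeping each layer as its own function mirrors the idea of building each capability as a separate operator: any one of them can be swapped for something more advanced without touching the others.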

    The best part is all of this AIOps is happening native to Kubernetes (except maybe RPAs)

    There you go, Operators are the key to unlock the “Zero-Touch Ops” journey.

    In the meantime, I have been playing around with operators and will soon come back with a hands-on session…

    Have fun, take care..ttyl


    abvijaykumar
    Vijay Kumar A B, IBM Distinguished Engineer @ IBM

    AB Vijay is an IBM Distinguished Engineer & CTO for CAS Manage & Application Innovation Lab. He is an IBM Master Inventor with more than 58 patents filed in his name and more than 22 years of experience at IBM. He is recognized as a subject matter expert for his contribution to advanced mobility in automotive, and has led several implementations involving complex industry solutions. He specializes in mobile, cloud, containers, automotive, sensor-based machine-to-machine, Internet of Things, an
