
What is Graphite Monitoring?




    Introduction

    Service reliability is crucial to the success of a business in today's highly competitive climate, which is why Graphite monitoring is essential. Downtime or a degraded user experience is simply not an option, as dissatisfied customers will jump ship in an instant.

    Operations teams must be able to monitor their systems closely, paying particular attention to the Service Level Indicators (SLIs) that describe the availability of the system. As with an F1 pit team, the stakes are high and precise tooling is crucial.

    This article focuses on one such monitoring tool: Graphite.

    Graphite monitoring gives operations teams visibility, at varying levels of granularity, into the behavior of their systems and applications. This visibility supports error detection, resolution, and continuous improvement.

    MetricFire specializes in providing a Hosted Graphite service for monitoring. With minimal configuration, you can gain in-depth insight into your systems. If you would like to learn more about it, please book a demo with us, or sign up for the free trial today.

    What is time series data?

    Graphite stores numeric time-series data (metric, value, epoch timestamp) and renders graphs of this data on demand. A time series is a sequence of observations taken sequentially in time. Time series analysis reveals trends and patterns associated with external factors and anomalies. With adequate graphing tools and enough time series data, it is even possible to forecast future events intuitively.

    As a general rule of thumb, a time series database should meet the following requirements.

    • High availability, even amidst a high volume of concurrent reads and writes. The nature of time series data means that writes are far more frequent than reads (95-99% of operations).
    • The ability to maintain low-latency queries in the face of high throughput.
    • The capability to handle massive data volumes, with cold and hot data separation. Large ingestion services can process on the order of 5 billion metrics per minute and store up to 1.5 petabytes of time-series data.
    • Distributed architecture: given the write and storage requirements, the underlying layer should be capable of running as a distributed system.

    Protocols and Collectors

    As Graphite's design is oriented towards modularity and doing one thing very well, it has no direct data collection support. Carbon, one of the three Graphite components, listens passively for data. Solutions such as StatsD and CollectD collect the data and send it upstream to Graphite over one of the supported protocols.

    Protocols

    It is worth discussing the protocols Graphite accepts: plaintext, pickle, and AMQP. Plaintext messages take the following structure: <metric path> <metric value> <metric timestamp>. This protocol is best used for trivial scripts or test purposes, as it requires no additional formatting.
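    A minimal sketch of the plaintext protocol follows. The metric name, host, and helper names are hypothetical, and port 2003 is assumed to be Carbon's default plaintext listener; adjust both to match your own installation.

```python
import socket
import time


def format_plaintext(path, value, timestamp=None):
    """Build one plaintext line: '<metric path> <metric value> <metric timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())  # epoch seconds
    return f"{path} {value} {int(timestamp)}\n"


def send_plaintext(lines, host="localhost", port=2003):
    """Send already-formatted lines to a Carbon plaintext listener over TCP.

    host and port are assumptions; nothing is sent unless this is called.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical metric used purely for illustration.
line = format_plaintext("servers.web01.cpu.load", 0.42, 1700000000)
```

    Calling send_plaintext([line]) against a running Carbon instance would submit the data point; the formatting step is kept separate so it can be checked without a network connection.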

    If sizeable amounts of data are involved, pack the pickled data into a packet containing a simple header, and send it over a TCP socket to Carbon's pickle receiver (by default, port 2004). Graphite can also accept data using AMQP (the Advanced Message Queuing Protocol).

    AMQP ensures reliable data transfer by using a message broker, which acts as a middleman in a distributed system. Machines use the broker as a central point of contact; it orders the messages in a queue, and the client collects them when it has capacity available. Because sender and receiver do not rely on each other to continue working, blocking calls are avoided and communication is asynchronous.
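    The pickle packet described above can be sketched as follows. This assumes the commonly documented layout: a list of (path, (timestamp, value)) tuples, pickled and prefixed with a 4-byte big-endian length header. The metric names and the helper name are hypothetical.

```python
import pickle
import struct


def build_pickle_message(datapoints):
    """Pack [(path, (timestamp, value)), ...] into header + pickled payload."""
    payload = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte unsigned length, network byte order
    return header + payload


# Hypothetical data points used purely for illustration.
message = build_pickle_message([
    ("servers.web01.cpu.load", (1700000000, 0.42)),
    ("servers.web01.mem.used", (1700000000, 512)),
])
```

    The resulting bytes would then be written to a TCP socket connected to the pickle receiver. Batching many data points into one message is what makes this format better suited to large volumes than one plaintext line per metric.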

    If your team is already using a broker such as RabbitMQ to publish and consume data, it is possible to integrate Graphite by forking an incoming stream of messages into another queue.

    Collectors

    Applications use a collector client to feed device metrics upstream to a Graphite server. Common collectors are StatsD and CollectD.

    StatsD is an event counter and aggregation service. It listens on a UDP port for incoming metrics data and periodically sends aggregated events upstream to a back-end such as Graphite. Today, StatsD refers both to the original protocol written at Etsy and to the myriad of services that now implement this protocol.

    CollectD is a statistics collection daemon that regularly polls various sources, such as your OS, CPU, RAM, and network, before sending the results upstream.
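    As a rough illustration of what an application sends to StatsD, the sketch below formats counter and timer datagrams in the widely used "<name>:<value>|<type>" form. The metric names and helper names are hypothetical, and port 8125 is assumed to be the StatsD default.

```python
import socket


def format_counter(name, count=1):
    """Counter datagram, e.g. 'api.requests:1|c'."""
    return f"{name}:{count}|c"


def format_timer(name, millis):
    """Timer datagram in milliseconds, e.g. 'api.latency:320|ms'."""
    return f"{name}:{millis}|ms"


def send_statsd(payload, host="localhost", port=8125):
    """Fire-and-forget UDP send; host and port are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


# Hypothetical metrics used purely for illustration.
counter = format_counter("api.requests")
timer = format_timer("api.latency", 320)
```

    Because UDP is connectionless, the application does not wait for an acknowledgement, which keeps instrumentation cheap at the cost of possible packet loss.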

    These two services complement each other very well. A more in-depth discussion and comparison of StatsD and CollectD is available in a separate article.

    Carbon and Whisper

    Carbon

    Graphite's back end is a daemon process named Carbon (carbon-cache). It listens for inbound metric submissions and stores the metrics temporarily in an in-memory buffer cache before flushing them to disk in Whisper's database format. It is built on top of Twisted, which is a highly scalable event-driven I/O framework for Python.

    Twisted allows for efficient asynchronous communication with many clients and can handle an extensive amount of traffic with low overhead. The optional carbon-relay receives metrics from clients and applies a set of rules (regular expressions) to determine which carbon-cache server to relay the data to, which provides a type of replication.

    Still, there is no synchronization in place, so graphs will show corrupt data in the case of a node failure. It is possible to configure a re-synchronization process, but the scripts provided by Graphite require significant trial and error to get right before the setup is ready for a production environment.
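    The idea of rule-based relaying can be sketched in a few lines. This is not Carbon's implementation, only an illustration of matching a metric path against regular expressions and falling back to a default; the rule patterns and destination names are hypothetical.

```python
import re

# Hypothetical rules: (pattern, destinations). The first matching rule wins.
RULES = [
    (re.compile(r"^servers\.web\d+\."), ["cache-a:2004"]),
    (re.compile(r"^stats\."), ["cache-b:2004", "cache-c:2004"]),
]
# Hypothetical fallback for metrics that match no rule.
DEFAULT_DESTINATIONS = ["cache-a:2004"]


def route(metric_path):
    """Return the destinations a metric would be relayed to."""
    for pattern, destinations in RULES:
        if pattern.search(metric_path):
            return destinations
    return DEFAULT_DESTINATIONS


web_route = route("servers.web01.cpu.load")
stats_route = route("stats.api.requests")
```

    Listing more than one destination for a rule, as in the second rule here, is what yields the replication effect: the same data point is forwarded to several carbon-cache servers.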

    Whisper

    Whisper is a fixed-size, file-based time-series database. Applications retrieve and manipulate data in Whisper through its create, update, and fetch operations. The design of Whisper shares many attributes with an RRD (round-robin database), providing fast, reliable storage of numeric data.

    The design handles the files on disk and downsamples* for long-term retention, storing high precision raw data for a finite amount of time, and lower precision, summarised data, for more extended time frames.

    * The process of converting high-resolution time-series data into low-resolution time series data.
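    Downsampling can be illustrated with a small, self-contained sketch. It averages raw points into wider time buckets; averaging is only one possible aggregation method, and the function name and sample data are hypothetical rather than part of Whisper's API.

```python
def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into buckets of bucket_seconds."""
    buckets = {}
    for timestamp, value in points:
        # Align each point to the start of its bucket.
        start = timestamp - (timestamp % bucket_seconds)
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Five raw points at 10-second resolution (hypothetical values).
raw = [(0, 1.0), (10, 3.0), (20, 5.0), (60, 2.0), (70, 4.0)]

# Summarise them at 60-second resolution.
summary = downsample(raw, 60)
```

    Here the five raw points collapse into two summarised points, one per minute, which is the trade-off described above: less precision in exchange for a longer retention window at the same storage cost.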

    Alternatives

    However, problems have emerged for Graphite in the cloud era; despite running multiple carbon agents and running on SSD drives, performance does not improve without expert tuning. Storage is the primary deficiency and remedies such as sharding across multiple nodes introduce too much complexity with the current design.

    The community has responded with Ceres, a redesign of the round-robin database format that is intended to replace Whisper as the default back-end. In contrast to Whisper, Ceres is not a fixed-size database and is designed to better support sparse data of arbitrary fixed-size resolutions. This allows Graphite to distribute individual time-series across multiple servers or mounts.

    It is a good alternative for users who want to maintain their current system architecture rather than disrupting operations with a migration to an alternative storage back-end. Unfortunately, development is still ongoing, and only a small percentage of the community is running Ceres in production.

    MetricFire’s Hosted Graphite solution mitigates this problem by replacing Whisper storage for seamless scaling with multiple redundant copies of your data. To learn more from one of our experts, book a demo with us. Alternatively, you can try it out for yourself with a 14-day free trial.

    Visualisation

    Graphite-web is a Django-based web app that provides a simple user interface for visualizing the stored metrics in graph format. It exposes an intuitive URL-based API for immediate graphing. As it uses Cairo for rendering graphs, it depends on several graphics-related libraries that are typically absent from standard VMs. Make sure to run the dependencies script during configuration to avoid unnecessary complications during installation.

    The Graphite composer is the best way to learn Graphite's visualization capabilities. All of the metrics are presented in a hierarchical tree structure on the left-hand side; clicking a metric adds its data series to the composer canvas. From here, it is easy to apply transformative functions for novel on-the-fly interpretation of the data.

    All of Graphite's features are exposed via the API, because the UI consumes the same endpoints. As this design is so clean, many alternative visualization tools are compatible with Graphite. The most popular is Grafana, which provides a much sleeker aesthetic.

    There are many ways to create and display graphs, including a simple URL API for rendering that makes it easy to embed graphs in web pages. This allows for easy sharing between teammates who can make adjustments and pass back the new URLs which are immediately loaded into the composer, allowing for quick discussion and comparison.
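    A shareable render URL can be assembled with the standard library alone. The sketch below assumes the render endpoint's target, from, until, and format query parameters; the host name, metric name, and helper name are hypothetical.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, start="-1h", until="now", fmt="png"):
    """Build a render URL for one or more targets."""
    params = [("target", t) for t in targets]  # one target parameter per series
    params += [("from", start), ("until", until), ("format", fmt)]
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"


# Hypothetical host and metric used purely for illustration.
url = render_url(
    "http://graphite.example.com",
    ["servers.web01.cpu.load"],
    fmt="json",
)
```

    Since the whole graph definition lives in the query string, a teammate can edit a single parameter, such as the time range, and send the modified URL back.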

    MetricFire specializes in monitoring systems by using both Graphite and Grafana as a service.

    Functions

    Graphite provides a comprehensive library of statistical and transformative rendering functions capable of turning series data into critical gauges of system activity. One of the most prominent features of Graphite's render API is the ability to chain functions together, allowing engineers to compose deep levels of granularity.

    As each series on a chart is treated as a stream of data, it is possible to pipe the output of one processing function into the next, and piped and nested function calls can be combined. An example is shown below:

    sumSeries(stats_global.production.counters.api.requests.*.count)|scaleToSeconds(60)|movingAverage(30)|alias('api.avg')
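    To make the relationship between the piped and nested styles concrete, the sketch below builds the nested form of the same chain as a string. The nest helper is hypothetical and does nothing beyond string formatting; it is not part of Graphite.

```python
def nest(target, *calls):
    """Wrap target in each (function, *args) call, innermost call first."""
    expression = target
    for name, *args in calls:
        # String arguments are quoted; numeric arguments are written as-is.
        rendered_args = [repr(a) if isinstance(a, str) else str(a) for a in args]
        expression = f"{name}({', '.join([expression] + rendered_args)})"
    return expression


nested = nest(
    "stats_global.production.counters.api.requests.*.count",
    ("sumSeries",),
    ("scaleToSeconds", 60),
    ("movingAverage", 30),
    ("alias", "api.avg"),
)
```

    Each stage of the pipe becomes one more layer of wrapping, with the output of the previous stage passed as the first argument of the next.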

    There are always cases when a custom function is necessary; every production system has its quirks and anomalies. It is possible to add custom processing functions to the Graphite API. Custom functions are packaged as Python modules and are loaded by Graphite when placed in the /opt/graphite/webapp/graphite/functions/custom folder. More information on writing and using custom functions is available in the Graphite documentation.

    The following is a simple example that replaces underscores in metric names:

    from graphite.functions.params import Param, ParamTypes


    def formatHostLegend(requestContext, seriesList):
        """Custom function that prints a prettified legend name."""
        for series in seriesList:
            # Split the name at ".perfdata"; only the part before it is rewritten.
            pos = series.name.find(".perfdata")
            if pos == -1:
                # No ".perfdata" segment: leave the name untouched.
                continue
            first = series.name[:pos]
            second = series.name[pos:]
            series.name = first.replace('_', '.') + second
        return seriesList


    # Define the group the function is listed under
    formatHostLegend.group = 'Custom'

    # Define parameters for the callback
    formatHostLegend.params = [
        Param('seriesList', ParamTypes.seriesList, required=True)
    ]

    # Register the callback function
    SeriesFunctions = {
        'formathostlegend': formatHostLegend
    }
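    The renaming logic can be tried outside Graphite with a stand-in series object. In the sketch below, SimpleNamespace replaces Graphite's series type, the sample metric name is hypothetical, and names without a ".perfdata" segment are assumed to be left unchanged.

```python
from types import SimpleNamespace


def format_host_legend(series_list):
    """Replace underscores with dots in the part of each name before '.perfdata'."""
    for series in series_list:
        pos = series.name.find(".perfdata")
        if pos == -1:
            continue  # assumption: names without ".perfdata" are left as-is
        series.name = series.name[:pos].replace("_", ".") + series.name[pos:]
    return series_list


# Stand-in objects exposing only a name attribute (hypothetical metric names).
sample = [
    SimpleNamespace(name="web_01_example.perfdata.load_1"),
    SimpleNamespace(name="db_01.cpu"),
]
result = format_host_legend(sample)
```

    Only the prefix before ".perfdata" is rewritten, so underscores in the remainder of the name, such as load_1, are preserved.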

    Conclusion

    Engineers of both the old and new schools love Graphite, because few monitoring tools are as malleable. The evidence lies in the diverse range of companies using the tool in their production systems: Twitch, Etsy, GitHub, and SendGrid, to name a few.

    However, these teams have experts who know Graphite inside out, and they know how to tune this tool to merge and mutate with their current systems. Most organisations do not have the resources or expertise to do this.

    This is where MetricFire can help. We can provide this expertise for your team and deliver a fully hosted Graphite solution tailored to the needs and nuances of your system. Your team will not have to worry about scalability, releases, plugins, maintenance, tuning or backups. Everything will work out of the box tailored to your needs with 24/7, 365 continuous automated monitoring from around the world.

    We took the best parts of open-source Graphite, and supercharged them. We also added everything that is missing in vanilla Graphite: a built-in agent, team accounts, granular dashboard permissions, and integrations to other technologies and services like AWS, Heroku, logging tools and more.

    If you would like to learn more about Graphite monitoring, you can book a demo with us, or sign up for the free trial today.


    MetricFire

    MetricFire provides a complete infrastructure and application monitoring platform from a suite of open source monitoring tools. Depending on your setup, choose Hosted Prometheus or Graphite and view your metrics on beautiful Grafana dashboards in real-time.
