Kubernetes: 7 Open Source Logging and Tracing Tools You Should Try

According to management thinker Peter Drucker, "You can't manage what you can't measure." The idea applies to many disciplines and domains. Measurement is the key to understanding, especially in complex environments such as distributed systems in general and Kubernetes in particular.

Wikipedia defines observability as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. The term comes from control theory and has since been widely adopted by the DevOps and cloud-native communities.

Google defines observability as tooling or a technical solution that allows teams to actively debug their system. Observability is based on exploring properties and patterns not defined in advance, and it has three pillars: logging, metrics, and distributed tracing. It gives your team a much deeper view of what the system is doing, and in this post we focus on two of its pillars: logging and tracing. So, what are some open source tools we can use?

Jaeger - a Distributed Tracing System

Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing platform created by Uber Technologies and donated to the Cloud Native Computing Foundation. It can be used to monitor microservices-based distributed systems, covering:

Distributed context propagation
Distributed transaction monitoring
Root cause analysis
Service dependency analysis
Performance / latency optimization
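
To make "distributed context propagation" more concrete, here is a minimal sketch in Python of how a trace context can travel between two services in an HTTP header. It assumes Jaeger's native `uber-trace-id` header with the `trace-id:span-id:parent-span-id:flags` layout; the helper names (`new_trace_context`, `inject`, `extract`, `child_of`) are hypothetical and are not part of any Jaeger client library.

```python
import secrets

# Assumption: Jaeger's native propagation header and its colon-separated layout.
HEADER = "uber-trace-id"


def new_trace_context():
    """Start a new trace: random 128-bit trace ID, 64-bit span ID, sampled."""
    return {
        "trace_id": secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_id": "0",  # a root span has no parent
        "flags": 1,        # 1 = sampled
    }


def inject(ctx, headers):
    """Write the context into the outgoing request headers."""
    headers[HEADER] = "{trace_id}:{span_id}:{parent_id}:{flags:x}".format(**ctx)
    return headers


def extract(headers):
    """Read the caller's context from incoming headers; None if absent."""
    raw = headers.get(HEADER)
    if raw is None:
        return None
    trace_id, span_id, parent_id, flags = raw.split(":")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": parent_id,
        "flags": int(flags, 16),
    }


def child_of(parent):
    """Create a child span context that stays in the caller's trace."""
    return {
        "trace_id": parent["trace_id"],
        "span_id": secrets.token_hex(8),
        "parent_id": parent["span_id"],
        "flags": parent["flags"],
    }
```

The calling service would run `inject(ctx, headers)` before sending a request, and the receiving service would run `child_of(extract(headers))` so that both spans share one trace ID. In practice a tracing client library does this for you.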

Kiali

Kiali answers the questions: What microservices are part of my Istio service mesh, and how are they connected? Kiali works with Istio to visualize the service mesh topology, along with features such as circuit breakers and request rates.

Kiali also integrates with Jaeger to provide distributed tracing out of the box. You can embed Kiali in other applications using a simple feature called Kiosk mode, in which Kiali hides the main header and the main navigation bar.

Fluent Bit

Fluent Bit is a log processor and log forwarder that its maintainers describe as fast. It works on Linux, embedded Linux, macOS, and BSD-family operating systems, and runs on x86_64, x86, arm32v7, and arm64v8 architectures.

It is part of the Fluentd ecosystem and a CNCF sub-project. Fluent Bit lets you collect log events or metrics from different sources, process them, and deliver them to different backends such as Fluentd, Elasticsearch, NATS, InfluxDB, or any custom HTTP endpoint, among others.

In addition, Fluent Bit comes with full Stream Processing capabilities: data manipulation and analytics using SQL queries.
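
The collect, process, deliver flow can be pictured as a chain of small stages. The following Python sketch only illustrates that shape; it is not how Fluent Bit is implemented or configured, and the function names (`parse`, `keep_level`, `enrich`, `deliver`) and the `level` field are assumptions made for the example.

```python
import json


def parse(lines):
    """Input stage: turn raw JSON lines into records, skipping malformed ones."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def keep_level(records, levels):
    """Filter stage: keep only records whose 'level' is in the given set."""
    for record in records:
        if record.get("level") in levels:
            yield record


def enrich(records, **extra):
    """Filter stage: add fixed key/value pairs to every record."""
    for record in records:
        yield {**record, **extra}


def deliver(records, sink):
    """Output stage: append records to a sink (a list standing in for a backend)."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count
```

For example, `deliver(enrich(keep_level(parse(raw_lines), {"error"}), cluster="demo"), sink)` forwards only error records, each tagged with a cluster name.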

Prometheus

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The features that distinguish Prometheus from other metrics and monitoring systems are:

A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
PromQL, a powerful and flexible query language to leverage this dimensionality
No dependency on distributed storage; single server nodes are autonomous
An HTTP pull model for time series collection
Pushing time series is supported via an intermediary gateway for batch jobs
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support
Support for hierarchical and horizontal federation
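
The multi-dimensional data model is easier to see with a small example. This Python sketch treats a series as a metric name plus a set of labels, renders samples in the style of the Prometheus text exposition format, and filters them with an equality-only selector. The function names are hypothetical, the selector is far simpler than PromQL, and label-value escaping is deliberately left out.

```python
def series_key(name, labels):
    """A series is identified by its metric name plus its full label set."""
    return (name, tuple(sorted(labels.items())))


def render_sample(name, labels, value):
    """Render one sample as a text exposition line (no escaping of label values)."""
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"


def select(samples, name, **matchers):
    """Equality-only selector, similar in spirit to name{label="value"}."""
    return [
        (labels, value)
        for (metric, labels, value) in samples
        if metric == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}, 1027),
    ("http_requests_total", {"method": "POST", "status": "500"}, 3),
]

for sample in samples:
    print(render_sample(*sample))
```

Each distinct combination of label values is its own time series, which is why `select(samples, "http_requests_total", method="POST")` can slice the same metric along any dimension.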

Loki: like Prometheus, but for logs.

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is described as cost-effective and easy to operate.

Loki does not index the contents of the logs, but rather a set of labels for each log stream. Compared to other log aggregation systems, Loki:

does not do full-text indexing on logs. By storing compressed, unstructured logs and only indexing metadata, Loki is simpler to operate and cheaper to run.
indexes and groups log streams using the same labels you’re already using with Prometheus, enabling you to switch seamlessly between metrics and logs.
is an especially good fit for storing Kubernetes Pod logs. Metadata such as Pod labels is automatically scraped and indexed.
has native support in Grafana (requires Grafana v6.0 or later).
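
The trade-off of indexing labels but not log contents can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Loki's storage engine: the class name `LabelIndexedStore` and its methods are hypothetical, and real Loki also compresses the stored chunks.

```python
from collections import defaultdict


class LabelIndexedStore:
    """Toy log store: streams are looked up by labels, lines are never indexed."""

    def __init__(self):
        # Key: the stream's label set. Value: raw log lines in arrival order.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        """Append a line to the stream identified by its full label set."""
        self.streams[frozenset(labels.items())].append(line)

    def query(self, matchers, contains=""):
        """Pick streams via the label index, then scan their lines for a substring."""
        wanted = set(matchers.items())
        hits = []
        for key, lines in self.streams.items():
            if wanted <= key:  # every requested label matches this stream
                hits.extend(line for line in lines if contains in line)
        return hits


store = LabelIndexedStore()
store.push({"app": "api", "namespace": "prod"}, "GET /health 200")
store.push({"app": "api", "namespace": "prod"}, "GET /orders 500 timeout")
store.push({"app": "web", "namespace": "prod"}, "timeout talking to api")
print(store.query({"app": "api"}, contains="timeout"))
```

The index stays small because it grows with the number of label combinations, not with log volume; the cost is that text search has to scan every line in the selected streams.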

A Loki-based logging stack consists of 3 components:

promtail is the agent, responsible for gathering logs and sending them to Loki.
loki is the main server, responsible for storing logs and processing queries.
Grafana for querying and displaying the logs.

Loki is like Prometheus, but for logs: its authors prefer a multidimensional, label-based approach to indexing and want a single-binary, easy-to-operate system with no dependencies. Loki differs from Prometheus by focusing on logs instead of metrics, and by delivering logs via push instead of pull.
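
Delivery by push means the agent builds a request and sends it to Loki, rather than Loki scraping the agent. As a rough illustration, this Python sketch builds a JSON body in the shape accepted, to the best of my knowledge, by Loki's HTTP push endpoint (`/loki/api/v1/push`): a list of streams, each with a label set and `[timestamp in nanoseconds, line]` pairs. The function name `build_push_payload` is hypothetical, and nothing is sent over the network.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a push body: one stream, one [nanosecond timestamp, line] pair per line."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Timestamps are strings; consecutive lines get increasing values to keep order.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


payload = build_push_payload(
    {"app": "api", "namespace": "prod"},
    ["GET /health 200", "GET /orders 500 timeout"],
)
print(payload)
```

An agent such as promtail does this batching and sending for you; the sketch only shows what "push" looks like on the wire.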

ELK - Elasticsearch, Logstash, Kibana

Elasticsearch is an open source, distributed, RESTful search engine.

Logstash is a tool that transports and processes your logs, events, or other data.

Kibana is a browser-based analytics and search dashboard for Elasticsearch.

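These tools work together as a widely used solution for Kubernetes monitoring and log aggregation: Logstash ships and transforms the logs, Elasticsearch stores and indexes them, and Kibana lets you search and visualize them.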
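
Because Elasticsearch is RESTful, searching the aggregated logs comes down to an HTTP request with a JSON body. The Python sketch below builds, without sending, a `_search` request that uses a `match` query. The base URL, the index name `app-logs`, and the `message` field are assumptions for illustration; your index and field names depend on how your pipeline is configured.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, text, size=10):
    """Build (but do not send) a search request matching `text` in the message field."""
    body = {"query": {"match": {"message": text}}, "size": size}
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "app-logs", "timeout")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would execute the search against a running cluster; Kibana issues comparable queries on your behalf.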

EFK - Elasticsearch, Fluentd, Kibana

Same as ELK, but with Fluentd in place of Logstash.

Fluentd collects events from various data sources and writes them to files, RDBMS, NoSQL, IaaS, SaaS, Hadoop, and so on. Fluentd helps in unifying logging infrastructure.

Fluentd has a browser-based UI, Fluentd UI, used to start, stop, and configure Fluentd, and it can also integrate with other tools such as Kibana.

ELK vs EFK

ELK and EFK stacks are both based on Elasticsearch and Kibana, but Logstash and Fluentd differ in ways that make the two stacks suit different use cases. For instance:

Logstash routes events using conditional (if/else) statements, while Fluentd routes them using tags.
Logstash has a centralized plugin ecosystem and Fluentd has a decentralized one.
Logstash reliability may be enforced using Redis while Fluentd has a built-in reliability mechanism.
Usually, Logstash uses more memory than Fluentd.
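
The first difference in the list, how events get routed, is the one you notice most when writing configuration. The Python sketch below contrasts the two styles. It is an analogy only: neither tool is configured in Python, the field names, tags, and destinations are made up, and `fnmatchcase` is a stand-in that does not reproduce Fluentd's exact tag-matching rules.

```python
from fnmatch import fnmatchcase


def route_by_condition(event):
    """Logstash style: inspect the event's fields with if/else logic."""
    if event.get("type") == "nginx" and event.get("status", 0) >= 500:
        return "alerts"
    elif event.get("type") == "nginx":
        return "access-logs"
    else:
        return "default"


# Fluentd style: each event carries a tag, and the first matching pattern wins.
TAG_ROUTES = [
    ("kube.prod.*", "elasticsearch"),
    ("kube.*", "file"),
]


def route_by_tag(tag, routes=TAG_ROUTES):
    """Return the destination of the first pattern that matches the tag."""
    for pattern, destination in routes:
        if fnmatchcase(tag, pattern):
            return destination
    return None


print(route_by_condition({"type": "nginx", "status": 502}))
print(route_by_tag("kube.prod.api"))
```

With conditionals, the routing decision lives in the logic that inspects each event; with tags, the decision is made once when the event is labeled at the source, and the routing table stays declarative.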

The Chief I/O

The team behind this website. We help IT leaders, decision-makers and IT professionals understand topics like Distributed Computing, AIOps & Cloud Native