Logging is crucial for monitoring and for providing observability and insight into the activities of applications in distributed systems like Kubernetes. We’ve curated some of the best tools to help you achieve this, alongside a simple guide on how to get started with each of them.
Modern applications are complex. They use containers and microservices and are deployed on large-scale distributed systems such as Kubernetes. A microservices application, for example, will have several services running in the Kubernetes cluster, and observing and collecting log data from all of them is challenging. Aggregating that log data on a central platform eases the burden and makes it useful.
Why do you need log data?
There are several beneficial uses of Kubernetes log data, but the most common one is debugging. Logging helps you correlate information that will help you debug your application when there are any issues.
Log data is also helpful for business intelligence and detecting suspicious activities.
You also need log data if you're in an industry that requires you to comply with a particular policy related to the data from your application.
Log data from the different applications in your Kubernetes cluster is, however, scattered and, for this reason, not usable for the purposes mentioned above. Also, collecting logs from Kubernetes pods using the native kubectl tooling is ineffective, because the kubelet rotates container log files at a default size of 10MiB per container; as applications keep generating logs, older entries are discarded and cannot be persisted for long.
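To make the limitation concrete, here is what the native tooling gives you; the pod and container names below are placeholders, and only log lines still present in the kubelet's current files are returned:

```shell
# Read logs straight from one pod (names are placeholders)
kubectl logs my-app-7d4f8 --container my-app

# Narrow the window -- but anything already rotated past the
# ~10MiB default is gone and cannot be recovered this way
kubectl logs my-app-7d4f8 --since=1h --tail=100
```

This works for ad-hoc debugging of a single pod, but not for long-term retention or for correlating logs across many pods, which is where the tools below come in.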
To make the extensive log data from your Kubernetes applications useful, you need to collect and aggregate them in a well-structured format. This can be done using log management tools tailored to fetch, correlate and present Kubernetes log data in an interactive interface for further analytical and technical usage.
There are many logging tools in the DevOps space, but a few of them are suitable for Kubernetes. Some of these tools are highlighted in this blog.
Graylog is a robust log management platform that collects, queries, and visualizes valuable log data in an interactive web interface. It collects data from single or multiple servers as well as Kubernetes clusters. Graylog supports multiple logging inputs and protocols such as Kafka, NetFlow, Beats, and AWS Logs.
Pros and Cons of Graylog
Pros of Graylog
Graylog collects and presents Kubernetes log data in a friendly and interactive graphical user interface. This aids easy understanding and analysis of the collected log data. It also supports a wide range of data formats, and you can configure Graylog to send log alerts via emails.
Cons of Graylog
The major disadvantages of Graylog are its index-rotation management and its weak reporting functionality.
How to use Graylog in Kubernetes
Setting up Graylog requires four main components: Elasticsearch (which indexes and stores the log data), MongoDB (the database used to store configurations and metadata), the Graylog main server (which receives data from different sources), and the Graylog web interface.
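One common route is to deploy those components with a community Helm chart. The repository URL and chart name below are illustrative assumptions, not official values, so verify them against the chart's own documentation before running anything:

```shell
# Dedicated namespace for the stack
kubectl create namespace graylog

# Hypothetical community chart -- check the actual repo URL and
# chart name in the chart's docs. Such charts typically pull in
# MongoDB and Elasticsearch as dependencies.
helm repo add graylog-charts https://charts.example.org/graylog
helm install graylog graylog-charts/graylog --namespace graylog

# Port-forward the Graylog web interface locally to log in
# (service name follows typical chart naming; adjust if needed)
kubectl port-forward --namespace graylog svc/graylog 9000:9000
```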
Fluentd is an open source log collector created by Treasure Data and now hosted by the Cloud Native Computing Foundation (CNCF). It reliably collects log data from different sources, correlates it, and converts it into a uniform format for better insight.
Pros and Cons of Fluentd
Pros of Fluentd
Fluentd uses tags to route log events. It also gives you the flexibility to send collated logs to other storage or visualization platforms.
Fluentd buffers every piece of collected data on disk until it has been delivered to the target destination. This lets it preserve data through a pod failure or other disruption.
Fluentd does not require any additional storage configuration for this, because it persists the data automatically until it is sent out.
Cons of Fluentd
Fluentd's core is written in a combination of C and Ruby, and many of its plugins are written in Ruby. Even though this increases flexibility, it also limits the speed at which Fluentd can process events. Setting up Fluentd can also be complicated, considering that you need to configure both the host cluster's nodes and Fluentd itself to direct logs to your desired channels.
How to use Fluentd in Kubernetes
To use Fluentd in Kubernetes, you need to install it as a Kubernetes DaemonSet, making it run on each node in the cluster. You then configure it using the Fluentd configuration file. The configuration file is complex, but plugins make it quicker to get done, and Fluentd provides a vast number of plugins to support its configuration.
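As an illustration, a minimal Fluentd configuration for this setup might tail the container log files the kubelet writes on each node and forward them to Elasticsearch. The file paths and the Elasticsearch host below are placeholders for this sketch, and the output stage assumes the fluent-plugin-elasticsearch plugin is installed:

```
# Read container logs written on each node
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>

# Ship everything to an Elasticsearch backend
# (host is a placeholder for your own service)
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  logstash_format true
</match>
```

In practice the official Fluentd DaemonSet manifests ship with a ready-made configuration much like this that you then adapt.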
Pros and Cons of Loki
Pros of Loki
Loki is a log aggregation system that scales horizontally. It integrates seamlessly with multiple platforms, including Docker, Kubernetes, Grafana, Helm, and Percona.
It also simplifies storage by storing logs as plain text and indexing only the metadata, thereby providing significant savings on storage costs.
Cons of Loki
The Loki stack does not allow you to configure the log collection component from the UI, even though it can generate metrics from logs. It is also not effective for complex queries and can only process simple search expressions.
How to use Loki in Kubernetes
Loki uses a combination of three components to provide a complete log aggregation system.
- Promtail: the logging agent responsible for collecting and gathering logs, which it then ships to Loki.
- Loki: Loki is the central server of the logging system. It is responsible for storing logs and processing all queries.
- Grafana: This is a visualization system used with tools like Prometheus. Grafana is used in the Loki-based logging system to query and visualize logs easily.
To use Loki in Kubernetes, first create a dedicated namespace for the stack by running kubectl create namespace loki. Then add the Loki Helm chart repository:
$ helm repo add loki https://grafana.github.io/loki/charts
Finally, you deploy the Loki stack by running:
$ helm upgrade --install loki loki/loki-stack --namespace=loki --set grafana.enabled=true
This will install all three components (promtail, Loki, and Grafana) of the Loki stack in your Kubernetes cluster.
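To verify the deployment and reach the bundled Grafana, you can sketch the follow-up steps like this; the service name follows the loki-stack chart's usual naming convention, so adjust it if your release name differs:

```shell
# Check that Promtail, Loki, and Grafana pods came up
kubectl get pods --namespace loki

# Port-forward the bundled Grafana locally
# (service name assumes the release was named "loki")
kubectl port-forward --namespace loki service/loki-grafana 3000:80
```

Once Grafana is open, a simple LogQL label selector such as `{namespace="default"}` in the Explore view is enough to confirm that logs are flowing in.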
ELK (Elastic Stack)
ELK is a stack of three open source projects: Elasticsearch, Logstash, and Kibana.
These three projects work similarly to the Loki stack.
Elasticsearch acts as an analytics and search engine; Logstash collects and processes data from multiple sources and ships it to Elasticsearch; and Kibana visualizes the log data for easy usage.
Jointly, these three tools provide full-featured Kubernetes log collection, querying, and presentation.
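To make the Logstash stage concrete, a minimal pipeline along these lines reads events from Beats shippers and forwards them to Elasticsearch. The host address is a placeholder, and the JSON filter is just one example of the processing Logstash can apply:

```
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse JSON log lines emitted by the application, if any
  json {
    source => "message"
    skip_on_invalid_json => true
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch.logging.svc.cluster.local:9200"]
  }
}
```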
Pros and Cons of ELK
Pros of ELK
The ELK stack is an open source log management tool that you can run locally. It also handles complex queries and supports complex alerts configuration using third-party tools. Even with just IP geolocation, you can build a comprehensive dashboard using the Kibana component.
Cons of ELK
The Elasticsearch component is resource-intensive, and it is not easy to configure Logstash. The feature for authorizing users is also only available in the paid subscription plan.
How to use ELK in Kubernetes
To use the ELK stack in Kubernetes, you must deploy the individual components of the stack (Elasticsearch, Logstash, and Kibana) one after the other. You can deploy each component manually using kubectl or through Helm charts, and configure them to achieve your desired setup.
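The Helm route can be sketched as follows, using Elastic's official chart repository at helm.elastic.co; the namespace name is an arbitrary choice, and each chart usually needs values tuned (resources, storage) before production use:

```shell
# Elastic publishes official charts at helm.elastic.co
helm repo add elastic https://helm.elastic.co
kubectl create namespace elk

# Deploy the components one after the other, as described above
helm install elasticsearch elastic/elasticsearch --namespace elk
helm install logstash      elastic/logstash      --namespace elk
helm install kibana        elastic/kibana        --namespace elk
```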
Apart from the famous and full-featured Kubernetes logging tools mentioned above, many other open source tools also provide basic logging features for Kubernetes clusters. Some of them include:
An abbreviation of "Kubernetes tail", kail is an open source tool that streams and matches logs from all pods in a Kubernetes cluster. It also allows you to replay older records by specifying a duration. The project is written in Go and has 1.3k stars and 14 contributors on GitHub.
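A few typical invocations look like this; the flags are taken from kail's README as I recall them, so double-check them against `kail --help`, and the label value is a placeholder:

```shell
# Stream logs from every pod in the cluster
kail

# Restrict to one namespace, or to a label selector
kail --ns staging
kail -l app=my-app   # label is a placeholder

# Replay recent history by specifying a duration
kail --since 1h
```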
log-pilot is an easy-to-use logging tool, not for Kubernetes but for Docker containers. It allows you to collect logs from different Docker hosts and send them to a centralized logging system such as Graylog or Elasticsearch.
loghouse is an excellent log management solution for Kubernetes with an intuitive web UI. loghouse was created to collect Kubernetes logs, store them in the ClickHouse database and allow you to query and monitor your logs in a web interface. loghouse has 22 contributors and more than 800 stars on GitHub.