AIOps is the use of artificial intelligence to simplify IT operations management and to shorten the time it takes to solve IT operations problems by automating their resolution. This post lists the most popular open source AIOps tools.
The ranking is based on the number of stars each repository has received on GitHub.
We noticed that most open source AIOps projects, whether listed here or not, are written in Python. This is consistent with Python being the most widely used programming language for machine learning on GitHub.
Let's get to the point.
Seldon Core (1.8k stars)
Seldon Core converts machine learning models (e.g. PyTorch, TensorFlow, H2O) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices. It scales to thousands of production machine learning models and provides advanced machine learning capabilities out of the box, including:
- Advanced Metrics,
- Request Logging,
- Outlier Detectors,
- A/B Tests,
- Canaries and more.
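To make the "language wrapper" idea more concrete, here is a minimal sketch of the kind of plain Python class that Seldon Core's Python wrapper is built around. It assumes the convention of a class exposing a `predict(X, features_names)` method; check the project's current documentation before relying on it. The class name, the `threshold` parameter, and the scoring rule are hypothetical, and packaging and deployment are left out entirely.

```python
class MeanThresholdModel:
    """Hypothetical toy model; not part of Seldon Core itself."""

    def __init__(self, threshold=0.5):
        # Assumed tuning knob for this illustration only.
        self.threshold = threshold

    def predict(self, X, features_names=None):
        # X is treated as a sequence of rows, each row a sequence of numbers.
        # A row is labeled 1 when the mean of its features exceeds the threshold.
        labels = []
        for row in X:
            mean = sum(row) / len(row)
            labels.append(1 if mean > self.threshold else 0)
        return labels


if __name__ == "__main__":
    model = MeanThresholdModel(threshold=0.5)
    print(model.predict([[0.9, 0.8], [0.1, 0.2]]))
```

Once a class like this is wrapped, the serving layer is what would add the REST/gRPC endpoints, metrics, and request logging described above.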
Loglizer (700 stars)
Loglizer is a toolkit that implements a number of machine learning-based log analysis techniques for automated anomaly detection. A log analysis framework for anomaly detection usually comprises four components:
- Log collection: Logs are generated at runtime and aggregated into a centralized place with a data streaming pipeline, such as Flume and Kafka.
- Log parsing: Converts unstructured log messages into a map of structured events, based on which sophisticated machine learning models can be applied.
- Feature extraction: Structured logs can be sliced into short log sequences through an interval window, sliding window, or session window. Then, feature extraction is performed to vectorize each log sequence, for example, using an event counting vector.
- Anomaly detection: Anomaly detection models are trained to check whether a given feature vector is an anomaly or not.
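The last two components can be sketched in a few lines of plain Python. This is not Loglizer's API: the function names, the `CentroidDetector` class, and the distance-based rule are assumptions made for illustration, and the event IDs stand in for the output of a real log parser.

```python
import math
from collections import Counter


def sliding_windows(events, size, step):
    """Slice a list of event IDs into windows of `size`, advancing by `step`."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, step)]


def count_vector(window, vocabulary):
    """Turn one window into an event counting vector ordered by `vocabulary`."""
    counts = Counter(window)
    return [counts[event] for event in vocabulary]


class CentroidDetector:
    """Hypothetical detector: flags vectors far from the training centroid."""

    def fit(self, vectors):
        n = len(vectors)
        self.centroid = [sum(column) / n for column in zip(*vectors)]
        # The largest distance seen in training becomes the anomaly threshold.
        self.threshold = max(math.dist(v, self.centroid) for v in vectors)
        return self

    def is_anomaly(self, vector):
        return math.dist(vector, self.centroid) > self.threshold


if __name__ == "__main__":
    vocabulary = ["E1", "E2", "E3"]
    normal_events = ["E1", "E2", "E3", "E1", "E2", "E3"]
    windows = sliding_windows(normal_events, size=3, step=1)
    training = [count_vector(w, vocabulary) for w in windows]
    detector = CentroidDetector().fit(training)
    suspicious = count_vector(["E3", "E3", "E3"], vocabulary)
    print(detector.is_anomaly(suspicious))
```

A real pipeline would train on far more data and would typically use a proper model in place of the centroid rule.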
AIOpsTools (190 stars)
AIOpsTools is a toolkit for Python developers who want to build AIOps applications from existing features. It implements several operations scenarios using artificial intelligence, and its modules can be imported directly to provide the corresponding functions. This toolkit provides these 4 main capabilities:
This tool's English documentation is missing or incomplete; we hope to see an English version soon!
Log Anomaly Detector (130 stars)
Log Anomaly Detector (LAD) is an open source project code-named "Project Scorpio". It can connect to streaming sources and produce predictions of abnormal log lines. Internally it uses unsupervised machine learning, and its developers incorporated a number of machine learning models to achieve this result. It also includes a human-in-the-loop feedback system.
This project contains the following 3 components:
- LAD-Core: Contains custom code to train a model and predict whether a log line is an anomaly.
- Metrics: Grafana and Prometheus to visualize the health of this machine learning system.
- Fact-Store: A metadata registry that tracks feedback on false positives, giving the machine learning system a way to self-correct false predictions.
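The feedback loop can be pictured with a small sketch. This is not LAD's code or API: the `FactStore` class, its methods, and the normalization rule (lowercasing and masking numbers) are hypothetical choices made only to show how recorded false positives could suppress later predictions.

```python
import re


def normalize(line):
    """Mask numbers and case so similar log lines compare as equal."""
    return re.sub(r"\d+", "<num>", line.strip().lower())


class FactStore:
    """Hypothetical in-memory registry of lines a human marked as false positives."""

    def __init__(self):
        self._false_positives = set()

    def report_false_positive(self, line):
        self._false_positives.add(normalize(line))

    def is_known_false_positive(self, line):
        return normalize(line) in self._false_positives


def filter_predictions(anomalous_lines, store):
    """Drop predicted anomalies that match earlier human feedback."""
    return [line for line in anomalous_lines if not store.is_known_false_positive(line)]


if __name__ == "__main__":
    store = FactStore()
    store.report_false_positive("Retry 3 of 5 for job 17")
    predictions = ["retry 1 of 5 for job 99", "Disk failure on node 4"]
    print(filter_predictions(predictions, store))
```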
Log3C (120 stars)
Log3C is a general framework that identifies service system problems from system logs. It uses both system logs and system KPI metrics to promptly and precisely identify impactful system problems. Log3C involves four main steps:
- Log parsing,
- Sequence vectorization,
- Cascading clustering,
- Correlation analysis.
WhyLogs Library (120 stars)
WhyLogs Library is an implementation of the WhyLogs Java library. It is an open source statistical logging library that lets data science and machine learning teams profile ML/AI pipelines and applications with little effort, producing log files that can be used for monitoring, alerting, analytics, and error analysis.
These are the five main features WhyLogs offers:
- Data Insight: Complex statistics across different stages of ML/AI pipelines and applications.
- Scalability: Scales with the system, from local development mode to live production systems in multi-node clusters, and works with batch and streaming architectures.
- Lightweight: Produces small mergeable lightweight outputs in a variety of formats, using sketching algorithms and summarizing statistics.
- Unified data instrumentation: Enables data engineering pipelines and ML pipelines to share a common framework for tracking data quality and drift.
- Observability: WhyLogs data can support advanced ML-focused analytics, error analysis, and data quality and data drift detection.
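To illustrate what "small mergeable outputs" means in practice, the sketch below keeps a constant-size summary of a numeric column and merges two summaries without revisiting the raw data. It does not use the WhyLogs API; the `ColumnSummary` class is a hypothetical stand-in that relies on a standard technique for combining running means and variances.

```python
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Hypothetical constant-size profile of one numeric column."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def track(self, value):
        # Welford's online update: one pass, no stored values.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        # Combine two summaries as if all values had been tracked together.
        total = self.count + other.count
        if total == 0:
            return ColumnSummary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta ** 2 * self.count * other.count / total
        return ColumnSummary(total, mean, m2,
                             min(self.minimum, other.minimum),
                             max(self.maximum, other.maximum))

    @property
    def variance(self):
        # Population variance of everything tracked so far.
        return self.m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    node_a, node_b = ColumnSummary(), ColumnSummary()
    for v in (1.0, 2.0, 3.0):
        node_a.track(v)
    for v in (4.0, 5.0):
        node_b.track(v)
    print(node_a.merge(node_b))
```

Because merging needs only the summaries, each node in a cluster can profile its own batch or stream and ship a few numbers, not the data itself.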
You can find some WhyLogs examples here.
Jumbune (60 stars)
Jumbune, an open source Big Data APM, provides deep analytics on big data to improve the performance of data operations on public clouds (AWS, Azure, Google Cloud Platform) and in on-premises data centers.
It comprises several modules: Hadoop Job Flow Analyzer, HDFS Data Validator, Hadoop Job Profiler, and Cluster Monitoring.
Jumbune architecture can be classified into the following major blocks:
- Request Handler
- Jumbune Agent
AIOps: A Promising Technology and a Growing Market
A Gartner study found that only five percent of large companies combined big data and machine learning, and predicted that the share would rise to about two-fifths by 2020. Another study, by Research and Markets, found that AIOps is growing rapidly worldwide and projected that the market would reach $9.907 billion by 2023. OpsRamp, a service-centric AIOps platform provider, reported that AIOps tools generate value for 87% of organizations.
The AIOps market is projected to register an upward trend at a CAGR of 27% during the forecast period (2019-2024).
According to MarketsandMarkets, the global market for AIOps platforms will grow from $2.55 billion in 2018 to $11.02 billion by 2023 (average annual growth of 34%). Both industry giants and specialized niche companies compete in this market, and so far there has been enough room for everyone. However, mergers and acquisitions have already begun, and not all of these companies will survive until 2023, at least not as independent companies.
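As a quick sanity check on the MarketsandMarkets figures, the compound annual growth rate can be recomputed from the two endpoints. The helper name `cagr` is our own, and the calculation assumes five compounding years between 2018 and 2023.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # $2.55 billion in 2018 growing to $11.02 billion in 2023.
    rate = cagr(2.55, 11.02, 5)
    print(f"{rate:.1%}")
```

The result comes out at roughly 34%, in line with the quoted average annual growth.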
The greatest promise of AIOps lies in its ability to automatically detect, analyze, and even fix IT issues in real-time.