A complete guide to the ELK Stack for log analysis, covering its architecture, advantages, complete installation process, and more:
The last decade has witnessed a rapid surge in the adoption of a variety of monitoring tools by organizations. Every organization has a wide range of tools to choose from, depending on its specific needs. Some of these monitoring tools serve niche segments, like storing and analyzing logs.
In this article, we will discuss one such tool, which is called ELK Stack. This is a popular tool that has been adopted by many organizations for log analysis, which is an important aspect of the day-to-day troubleshooting of applications.
What Is ELK Stack
Every application generates logs and these logs help us determine if the performance of the application is as expected. ELK stack helps to keep centralized logs that can be analyzed to identify performance gaps of applications or servers and for general monitoring.
Let us begin by understanding more about the ELK stack.
ELK is an acronym that is used for three open source projects, namely Elasticsearch, Logstash, and Kibana.
While Elasticsearch is an engine used for full-text search and analytics, Logstash is a powerful log aggregator: it collects data from various sources and transforms it before sending it to Elasticsearch. Kibana lets users visualize the data in Elasticsearch in the form of charts and graphs.
To explain it in simple words, Elasticsearch was initially used for logs on its own. Later, there was a need to ingest data and to have a visual representation of logs. This need was met by Logstash, a powerful pipeline used for ingestion, and by Kibana, a visualization tool that provides a visual depiction of logs.
A fourth project, called Beats, was recently added to this mix, and the stack was subsequently renamed Elastic Stack. Beats collects data from different machines and ships it to the stack.
Each of these is an independent project run by Elastic; however, they were designed to work together as an integrated solution for log analysis. With the introduction of Beats, this stack is now a four-legged project.
ELK Stack Architecture
A simple ELK stack architecture includes the following:
- Logs: Server logs that have to be identified, collected, and analyzed.
- Logstash: Collects logs and event data, and also transforms the data (see the minimal pipeline sketch after this list).
- Elasticsearch: Stores, indexes, and makes searchable the data transformed by Logstash.
- Kibana: Explores, visualizes, and shares the logs stored in Elasticsearch.
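To make this flow concrete, below is a minimal, illustrative Logstash pipeline: it tails a log file (the Logs), ships each event to a local Elasticsearch instance, and Kibana then visualizes the resulting index. The file path and host are assumptions for illustration only, not values from this guide.

```
# Minimal illustrative pipeline (assumed paths and hosts; not a production config)
input {
  file {
    path => "/var/log/syslog"          # Logs: the raw input
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]        # Elasticsearch: stores and indexes events
  }
}
# Kibana then explores and visualizes the resulting index through its web UI
```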
As mentioned above, the introduction of Beats was for data collection, and subsequently, Elastic changed the name of ELK to Elastic stack.
The below image shows the basic architecture of Elastic Stack:
This is a simple ELK stack architecture of the kind used by smaller organizations. With large amounts of data, the architecture may involve logs from multiple application server machines and multiple Logstash instances.
This complex structure is depicted in the image below:
Why Is ELK Stack Popular
The reason for the popularity of the ELK stack is undoubtedly that it fulfills the need for a log management and analytics tool. It allows engineers to manage the challenging task of monitoring applications and IT environments with ease.
It also gives users a centralized platform for collecting and processing data from multiple sources. This data is stored in a single database that can scale to hold additional data, and analytical tools are provided for analyzing it.
Another important reason for the popularity of the ELK stack is that it is open source. It provides cost benefits to organizations by avoiding vendor lock-in, and being open source also makes users part of an innovative community constantly driving new features.
The ELK stack also competes with market leaders like Splunk and is extremely popular with smaller companies. Splunk is known for its advanced features, but it is expensive for smaller companies to afford. The ELK stack is a simpler tool with strong log management and analytics features offered at a much lower cost.
Before we look at a comparison between ELK and Splunk, let us get an idea about Splunk.
What Is Splunk
Splunk has three main components, namely the forwarder, the indexer, and the search head. The job of the forwarder is to gather data from multiple sources and pass this raw data on to the indexer. The indexer stores this data and turns it into events. It also generates metadata files that enable the search head to execute user queries.
The image below depicts the architecture of Splunk:
ELK Stack vs Splunk
Let’s look at a simple comparison between ELK stack and Splunk in the table below:
| Point of Comparison | ELK Stack | Splunk |
|---|---|---|
| Ease of setup | Less simple when compared to Splunk | Simple setup and user friendly |
| Post-setup process | Knowledge of a scripting language like Bash, Ruby, or Python is required for setup. | Easy to set up forwarders and data pipelines because of pre-configured templates |
| Visualization of data | Web-based user interface, but does not have individualized user management | Web-based interface where each user can have one or more visualizations |
| Search functionality | Apache Lucene query syntax is used as the search language. It is easier than SPL. | Uses the Splunk Search Processing Language (SPL). It is not very easy. |
| Additional features | The data aggregation feature is not very easy to use and is less advanced. Pre-configuration is important for aggregating machine logs. | Built-in data exploration function. Users are able to extract everything from raw data. |
| Cost and source | Open source. No user licensing cost | Commercial tool. Expensive and closed source |
| Learning curve | Flat learning curve as compared to Splunk. Being an open-source project, online resources are in abundance. | Moderate learning curve. The trial period offers extensive and useful documentation. |
| Loading data | Logstash does not support all forms of data; plugins are required to work with these data types. | Accepts data in any format, for example, CSV or any other log format. |
| Rate of new releases and updates | Quick and frequent releases | Quarterly release cycle |
| Latest version | Version 6.4 | Version 7.1 |
| Community support | Communities offer support, but there is a risk to data confidentiality | Great community support. A Splunk license gives access to developer communities and enterprise support. |
| Integrations and plugins | Supports many plugins, but fewer integrations compared to Splunk; Logstash has only 160 integrations. | Integrates well with other tools. It offers more than 1,000 add-ons and apps. |
| Vendor lock-in | The lack of many functionalities attracts the additional cost of developing and maintaining these advanced features. | The product is offered as a bundle of benefits. There may be vendor lock-in, but one vendor is enough to do everything. |
| Portability | Solaris portability is not offered because of Kibana | Solaris portability is offered |
| Speed of processing | Limited speed of processing | Processing is accurate and quick |
| Architecture and support | The ELK technology is a combination of Elasticsearch, Logstash, and Kibana | The tool is proprietary and offers on-premise and cloud solutions |
Just like every coin has two sides, ELK stack also has some advantages and disadvantages.
ELK Stack: Advantages And Disadvantages
Advantages of ELK Stack are:
- ELK yields the best results when logs from all applications are sent together to a single ELK instance. The insights derived from this instance reduce the dependency on multiple log data sources.
- It provides quick and rapid on-premise installation.
- There are several language clients offered by Elastic. Ruby, Python, PHP, Perl, and .NET are a few examples. This is beneficial to those users who have different languages in the codebase and want to use Elasticsearch from all those languages.
- Libraries pertaining to various programming and scripting languages are available.
- It is available as a free open source tool.
- It provides centralized logging. This enables users to collect logs even from the most complex cloud environment into one single index, which is searchable. This makes correlation and comparison of logs and data pertaining to events sourced from multiple sources possible.
- Data analysis and visualization is a real-time process. The benefit of agility and quick decision-making are reaped when data is visualized in real-time.
Disadvantages of ELK Stack are:
- In a complex setup or for large organizations, managing different components of the ELK stack may be difficult.
- Although the ELK stack is an open-source tool, the only simple part of the complete installation process is downloading the tool. The process of deployment and configuration is lengthy and tedious, and it becomes more complicated for organizations lacking the resources and skills for deployment. Such organizations have to incur the additional cost of a training program or of hiring an ELK stack professional who can manage the deployment.
- Some users of ELK Stack have reported issues pertaining to stability and uptime, which can get worse with increased volumes of data.
The above-mentioned disadvantages make it clear that while an organization may take on deployment and management of the ELK stack on its own, a preferable option would still be to use the services of expert developers or DevOps engineers.
Such a team of specialists not only develops innovative solutions and applications but also seamlessly manages tedious tasks from installation to monitoring.
In most situations, maintaining security and compliance while scaling up and down to meet the dynamic needs of the business can be a challenge if an organization manages the ELK stack on its own.
The solution to this problem is Amazon Elasticsearch Service, which is also referred to as ELK Stack AWS. Let us learn more about it.
Amazon Elasticsearch Service
Amazon Elasticsearch Service is a managed service aimed at simplifying the deployment and operation of Elasticsearch clusters in the AWS cloud. It provides cost-effective ways for users to search, analyze, and visualize log data.
The setup and management of Elasticsearch clusters can be a daunting challenge. Amazon Elasticsearch service allows users to devote more time to developing the application rather than managing it. The entire process of creating Elasticsearch clusters that are scalable and secure can be easily done with a few clicks on the AWS console.
Amazon Elasticsearch Service provides a bundle of benefits, including open-source Elasticsearch APIs, managed Kibana, and integration with Logstash and other AWS services. This gives users the flexibility to keep using their existing tools.
Amazon Elasticsearch Service is easy to get started with using the AWS Free Tier. To use the AWS Free Tier for creating and configuring an Amazon Elasticsearch domain, the user needs to sign up for and log into an AWS account. Amazon Elasticsearch Service also offers practice in a hands-on lab.
Amazon ES supports the following versions of Elasticsearch:
- 7.10, 7.9, 7.8, 7.7, 7.4, 7.1
- 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0
- 5.6, 5.5, 5.3, 5.1
- 2.3
- 1.5
Elasticsearch versions 7.x and 6.x have powerful features that make the service fast, highly secure, and simple to use.
The basic architecture of the Amazon Elasticsearch Service is shown in the image below:
Now, it is time for us to understand the process of installation of the ELK stack.
Installation Of ELK Stack
There are several methods, across a wide array of operating systems, for installing the ELK stack. These include local installation of ELK, installation on the cloud, installation using Docker, and configuration management systems like Ansible, Puppet, and Chef. It is possible to install the stack using a tarball, with .zip packages, or from repositories.
While some steps may differ depending on the environment, most steps remain the same. In this article, we will look at the process of installation of all the components of the ELK stack on Linux.
Some prerequisites for the environment include setting up a single AWS Ubuntu 18.04 machine on an m4.large instance using local storage. An EC2 instance is started in the public subnet of a VPC, and the firewall is then set up to allow access over SSH and TCP 5601 (Kibana).
A new IP address is added and associated with the running instance so that a connection with the Internet can be established. It is also important to note that this installation is for version 6.2. The latest versions have changes with respect to the licensing model, and the basic X-Pack features are included in the installation package.
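As a sketch, the firewall rules could be opened with the AWS CLI along these lines. The security group ID and source CIDR below are placeholders you would replace with your own values, and this assumes the AWS CLI is installed and configured:

```bash
# Allow SSH (22) and Kibana (5601) inbound on the instance's security group.
# sg-0123456789abcdef0 and 203.0.113.0/24 are placeholder values.
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 5601 --cidr 203.0.113.0/24
```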
Below is a step-wise process of installation of components of ELK Stack:
We first look at the installation of Elasticsearch.
Installation Of Elasticsearch
Step 1: To verify the downloaded package, add Elastic’s signing key. This step can be skipped in case the packages are already installed from Elastic.
Step 2: In the case of Debian, it is important to install the apt-transport-https package.
Step 3: Add repository definition to the system.
Step 4: Install the version of Elasticsearch that contains only the features licensed under Apache 2.0, also called OSS Elasticsearch.
Step 5: Update repositories and install Elasticsearch.
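Putting steps 1 through 5 together, a typical command sequence on Ubuntu/Debian looks like the sketch below. The repository URL targets the 6.x OSS packages to match this guide's version; verify the exact URL against Elastic's current documentation before use:

```bash
# Step 1: add Elastic's signing key so downloaded packages can be verified
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

# Step 2: Debian/Ubuntu only
sudo apt-get install apt-transport-https

# Step 3: add the repository definition (6.x OSS packages)
echo "deb https://artifacts.elastic.co/packages/oss-6.x/apt stable main" | \
    sudo tee /etc/apt/sources.list.d/elastic-6.x.list

# Steps 4-5: update repositories and install the Apache 2.0-licensed (OSS) build
sudo apt-get update && sudo apt-get install elasticsearch-oss
```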
To configure Elasticsearch, a configuration file is used that enables the configuration of general settings, such as the node name, and network settings, such as port, host, data storage location, memory, log files, etc.
In this example, we are installing Elasticsearch on AWS, and therefore the best thing to do is bind Elasticsearch to either a private IP or localhost.
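For example, the localhost binding could be set by appending the lines below to /etc/elasticsearch/elasticsearch.yml. The values shown are the conventional localhost settings, offered here as an assumption rather than mandatory values:

```bash
# Append conventional localhost settings to the Elasticsearch config
sudo tee -a /etc/elasticsearch/elasticsearch.yml <<'EOF'
network.host: "localhost"
http.port: 9200
EOF
```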
Step 6: Run Elasticsearch.
Step 7: For a final confirmation, point a browser or curl to http://localhost:9200.
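Steps 6 and 7 then boil down to the following; a successful install answers the curl request with a small JSON document describing the node and cluster:

```bash
# Step 6: start the Elasticsearch service
sudo service elasticsearch start

# Step 7: confirm it is answering on port 9200
curl http://localhost:9200
```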
This completes the process of installation of Elasticsearch.
Let us now look at the steps to install Logstash.
Installation Of Logstash
The prerequisite for the installation of Logstash is to run either Java 8 or Java 11.
Step 1: Confirm that Java is installed.
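The Java check, and an install if one is needed, can be done as follows. The OpenJDK 8 package name is an assumption for Ubuntu 18.04:

```bash
# Confirm Java is present (Logstash requires Java 8 or Java 11)
java -version

# If it is missing, one option on Ubuntu 18.04:
sudo apt-get install openjdk-8-jre-headless
```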
Step 2: The repository is already defined, so to install Logstash we only need to run the command:
sudo apt-get install logstash
In order to run Logstash, a data pipeline also needs to be configured. This is explained in a later section, after installing Kibana.
Let us now look at the installation of Kibana.
Installation Of Kibana
Step 1: Use the apt command to install Kibana.
The command is:
sudo apt-get install kibana
Step 2: Open the Kibana configuration file at /etc/kibana/kibana.yml and ensure the below-mentioned configuration is defined. This configuration tells Kibana which Elasticsearch instance to connect to and which port to use.
server.port: 5601
elasticsearch.url: http://localhost:9200
Step 3: Start Kibana by entering the command: sudo service kibana start
Step 4: Open the browser and type http://localhost:5601. This will open the homepage of Kibana.
Installation Of Beats
Step 1: Install Metricbeat by typing: sudo apt-get install metricbeat
Step 2: Start Metricbeat by typing: sudo service metricbeat start
This will start monitoring the server, and an Elasticsearch index will be created, which can be defined in Kibana.
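As an optional check, assuming the default settings from this guide, you can list the indices in Elasticsearch and look for a metricbeat-* entry:

```bash
# List all indices; a metricbeat-* index should appear after a short while
curl "http://localhost:9200/_cat/indices?v"
```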
Now we can continue with creating Logstash configuration with the following steps:
Step 1: A Logstash configuration file is to be created at /etc/logstash/conf.d/apache-01.conf
Step 2: Enter the Logstash configuration.
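A typical configuration for shipping Apache access logs is sketched below. The log path is an assumption for a default Apache install on Ubuntu, so adjust it to your environment:

```bash
# Write a minimal Apache pipeline to the file created in Step 1
sudo tee /etc/logstash/conf.d/apache-01.conf <<'EOF'
input {
  file {
    path => "/var/log/apache2/access.log"   # assumed Apache log location
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }  # parse the access-log format
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
EOF
```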
Step 3: Start Logstash by typing: sudo service logstash start. This will create a new Logstash index in Elasticsearch, and its pattern can be defined in Kibana.
Step 4: On the Kibana homepage, click on Management and then click on Kibana Index Patterns. Kibana will display Logstash index and Metricbeat index.
Step 5: Type “logstash-*” as the index pattern.
Step 6: Select @timestamp in the Time Filter Field.
Step 7: Click on Create index pattern.
The installation is now ready for data analysis. To look at the data, click on the Discover tab in Kibana.
This completes the setup of the ELK data pipeline using Elasticsearch, Logstash, and Kibana.
ELK Stack Use Cases
Let us now look at some examples where ELK stack has been successfully used.
#1) Accenture
Accenture is one of the leading IT companies in the world and has led projects on ELK implementation. As an organization, it has stated that the ELK stack, being open-source software, is preferred over Splunk. Other factors the organization cites for preferring the ELK stack are the simplicity of the interface and the use of plugins that help extend its functionality.
#2) Netflix
Netflix is yet another company that depends on the ELK stack for monitoring and analyzing customer operations and logs related to security. Elasticsearch is used for sharding and replication, which is automated. Apart from this, the company also benefits from features like flexible schema, extension models, and multiple plugins.
Netflix’s extensive use of Elasticsearch has expanded from a handful of isolated deployments to over 15 clusters with approximately 800 nodes. It is the cloud database engineering team that centrally manages the setup.
#3) LinkedIn
LinkedIn uses ELK for monitoring performance and security, as its business is based on a social network. An integration with Kafka helps the IT team manage load in real time. The company's ELK operation spans more than 100 clusters across 6 data centers and over 20 teams.
#4) Tripwire
Tripwire is a world leader in Security Information and Event Management (SIEM). The company uses ELK to support the analysis of information packet logs.
Let us now look at some frequently asked questions regarding ELK Stack.
Frequently Asked Questions
Q #1) Is ELK stack free?
Answer: The ELK stack is free software; however, setting it up and maintaining it requires resources and infrastructure. The cost of deployment, on-premise or in the cloud, depends on the volume of logs aggregated from applications.
Q #2) Is ELK a SIEM (Security Information and Event Management)?
Answer: When the ELK stack is in its raw form, comprising Logstash, Kibana, Elasticsearch, and Beats, it cannot be used as a SIEM (Security Information and Event Management) solution. The ELK stack is a powerful logging tool, yet it is not a SIEM solution.
Q #3) What are the different operations that can be performed on documents using Elasticsearch?
Answer: The different types of operations that can be performed on documents in Elasticsearch include:
- Indexing
- Updating
- Fetching
- Deleting
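For illustration, these operations map to Elasticsearch's REST API roughly as follows. The index name myindex and the document contents are made up for this example, and the URL forms shown match the 6.x API used earlier in this guide:

```bash
# Indexing: create or replace document 1 in the (hypothetical) index "myindex"
curl -X PUT "localhost:9200/myindex/_doc/1" -H 'Content-Type: application/json' \
     -d '{"user": "alice"}'

# Fetching: retrieve the document by ID
curl -X GET "localhost:9200/myindex/_doc/1"

# Updating: apply a partial update (6.x URL form)
curl -X POST "localhost:9200/myindex/_doc/1/_update" -H 'Content-Type: application/json' \
     -d '{"doc": {"user": "bob"}}'

# Deleting: remove the document
curl -X DELETE "localhost:9200/myindex/_doc/1"
```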
Q #4) Can Kibana be used without Elasticsearch?
Answer: No, Kibana cannot be used for the display of data without Elasticsearch.
Conclusion
In this article, we have explained the ELK stack, and how to set up the ELK stack. We have included detailed steps on how to install the ELK stack and its components.
To decide on the ELK stack as a log management solution, an organization needs to carefully analyze its resources, infrastructure, and skills; otherwise, specialized offerings like Amazon Elasticsearch Service can be used instead.
The cost also depends on the volume of log data that is aggregated. Therefore, it is important for an organization to weigh the costs and other factors before deciding.