This tutorial explains the features, benefits, installation process, and more of Elasticsearch, a search platform with fast searching capabilities.
In this ever-growing world of data, numerous tools have emerged to assist in the analysis of data. One name that is synonymous with reliability is Elasticsearch.
The usage of this popular tool ranges from a simple search on websites to the collection and analysis of complex data and visualization. In this article, we will dive deep into the world of Elasticsearch and understand more about it.
Let’s begin.
What is Elasticsearch
Let’s keep it simple. Elasticsearch is a search platform with fast search capabilities. It is a Lucene-based search engine that was developed in Java but supports clients in different languages like PHP, Python, C#, and Ruby. It is most useful for full-text search and analysis.
So, what does it do?
It takes unstructured data from multiple sources as input and stores it in a structured, indexed format that is optimized for language-based searches.
As discussed above, the focus of Elasticsearch is on search capabilities and features. It is useful for searching for multiple types of data. It has a distributed architecture that enables the search and analysis of large volumes of data in near real-time.
Its ability to scale from one machine to hundreds of machines sets it apart from many other tools. A full-featured search cluster is easy to get running, although operating it well at scale requires considerable expertise.
Apart from search-oriented uses, Elasticsearch is also useful for storing data that needs grouping by multiple dimensions. Metrics, logs, traces, and other time-series data are some examples of its analytical uses.
Fundamental Concepts – Elasticsearch
At this stage, it is also important to understand some key concepts. Below is a glossary of a few components of Elasticsearch that will be essential to understand.
#1) Documents: Before we understand “Documents”, let us first look at a commonly used term called JSON (JavaScript Object Notation). It is a widely used format for exchanging data on the internet. Documents can be compared to rows in a relational database: each one represents an entity that can be searched.
However, in Elasticsearch, documents are not limited to simple text but can also include structured data that is encoded in JSON. Each document has a unique ID, and each field in it has a data type (such as text, number, or date) that determines how the field is indexed and searched.
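To make this concrete, here is a minimal Python sketch of what such a document might look like. The ID and the field names (`title`, `author`, `published`, `tags`, `views`) are hypothetical and chosen only for illustration.

```python
import json

# A hypothetical document: structured data, not just plain text.
document_id = "1"
document = {
    "title": "Getting started with search",
    "author": "Jane Doe",
    "published": "2022-01-15",
    "tags": ["search", "tutorial"],
    "views": 1024,
}

# Elasticsearch receives and returns documents as JSON text.
as_json = json.dumps(document, indent=2)
print(as_json)
```

Note how a single document mixes text, a date, a list, and a number, which is what sets it apart from a plain string of text.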
#2) Indices: Multiple documents with similar characteristics form an index. Interestingly, it is also the highest level of the entity against which a query can be run in Elasticsearch. Documents in an index are logically related. The index is represented by a name with which it is identified during indexing and other operations.
#3) Inverted Index: This is the search mechanism on which the engine works. It stores the mapping from content to its location in the documents. It is important to note that the original strings are not stored as they are; instead, each document is split down to the level of individual search terms.
The process then maps each of these search terms to the documents in which it occurs. This allows quick full-text searches, even across a huge volume of data.
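The idea can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene implements it: the tokenizer below just lowercases the text and splits on non-alphanumeric characters, and the sample documents are made up.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene powers the search index",
    3: "Kibana visualizes data",
}
index = build_inverted_index(docs)
print(index["search"])                 # [1, 2]
print(search(index, "search engine"))  # [1]
```

Because a lookup goes straight from a term to the documents that contain it, the search never has to scan the full text of every document.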
Elasticsearch – Backend Concepts
Apart from this, there are a few components of Elasticsearch which are hidden or can be referred to as backend components.
They are listed below:
- Cluster: A cluster is a group of multiple nodes that are connected together. Elasticsearch distributes tasks such as searching and indexing across all the nodes in a cluster.
- Node: A node is a single server in a cluster. It is the node where data is stored and the process of indexing and searching for a cluster takes place. For Elasticsearch, nodes can be configured in many ways.
- Master Node: This node is called the control room for the Elasticsearch cluster as it controls all the operations, such as the creation or deletion of index or addition or deletion of nodes.
- Data Node: This node stores and executes operations related to data such as aggregation of data.
- Client Node: This node forwards requests to the appropriate nodes. For example, it sends cluster requests to the master node and any data request to data nodes. In recent versions, this role is called a coordinating node.
- Shards: An index is further divided into multiple pieces, which are called “shards”. Each shard is a fully functional, independent index in its own right and can be hosted on any node in a cluster. Documents in an index are distributed across different shards, and these shards are allocated to different nodes. Together with replicas, this creates redundancy, which is very helpful in protecting against hardware failure and data loss. It also increases query capacity.
- Replicas: Replicas are copies of a primary shard. Every document in an index belongs to one primary shard. Replicas keep copies of that data on other nodes to protect against hardware failure. They also boost the capacity for responding to requests.
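The way documents are spread over shards can be sketched as a hash of the document ID modulo the number of primary shards. This is a simplified illustration: `zlib.crc32` is a stand-in hash chosen for the example, and the document IDs are made up.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard by hashing the document ID.

    zlib.crc32 is a stand-in hash used for illustration only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


num_primary_shards = 3
shards = {n: [] for n in range(num_primary_shards)}
for doc_id in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
    shards[shard_for(doc_id, num_primary_shards)].append(doc_id)

for shard, ids in shards.items():
    print(f"shard {shard}: {ids}")
```

Since the result depends on the number of primary shards, changing that number would send the same ID to a different shard. This is one reason the number of primary shards is normally chosen when the index is created.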
Capabilities
Let’s understand the main capabilities of Elasticsearch:
- Search Engine: One of the unique selling points of Elasticsearch is that it enables powerful full-text search capabilities with ease. This feature was missing in the traditional SQL database management systems, as they lacked the capabilities of a full-text search engine for voluminous data.
- Analytical engine: Elasticsearch also owes a lot of its popularity to its analytical usage. It is popularly used for log analytics and for slicing numerical data like performance metrics. It also enables the aggregation of data (Elasticsearch’s aggregation queries), which supports data visualization.
- Architectural design enabling scaling: Elasticsearch has an inbuilt capacity to scale to multiple servers because of its distributed architecture. It is also capable of storing petabytes of data. Distributed systems are often complex to operate, but Elasticsearch hides much of that complexity: many decisions are made automatically and exposed through a straightforward management API. Scaling is much simpler than in most other systems. Elasticsearch also replicates data automatically, which helps prevent data loss if a node fails.
- The right choice of investment: The mechanics of Elasticsearch are easy to understand, especially when the data set involved is small. It has a simple API that integrates well with other tools like Logstash for sending data to Elasticsearch or Kibana for visualization of data. The combination of a short learning curve and these capabilities means that one can begin using Elasticsearch quickly, thereby enhancing productivity.
- Well-documented API: This is yet another feather in its cap that has led to its growing popularity. Developers can capitalize on the availability of APIs for integration. Apart from this, Elasticsearch provides compatible client libraries for many programming languages like Java, JavaScript, PHP, etc., making integration easy for developers.
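As an illustration of the aggregation queries mentioned above, the following Python sketch builds the request body for a `terms` aggregation and then computes the same grouping locally. The field name `status`, the aggregation name `by_status`, and the log records are hypothetical.

```python
import json
from collections import Counter

# Request body for a "terms" aggregation: count documents per value of a field.
# The field name "status" and the aggregation name "by_status" are hypothetical.
aggregation_query = {
    "size": 0,  # return only the aggregation, no individual hits
    "aggs": {
        "by_status": {"terms": {"field": "status"}}
    },
}
print(json.dumps(aggregation_query))

# The same grouping computed locally on a few made-up log records,
# to show what the aggregation conceptually returns.
logs = [
    {"status": 200}, {"status": 404}, {"status": 200}, {"status": 500},
]
counts = Counter(record["status"] for record in logs)
print(counts.most_common())
```

In a real deployment the request body would be sent to the search endpoint of an index, and the counting would happen inside the cluster rather than in the client.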
Elasticsearch Working
Elasticsearch’s primary job is to store, manage, and retrieve semi-structured data. Its primary data structure is the inverted index, which is managed through Apache Lucene’s APIs.
You must be wondering what an “inverted index” is. Read further to get your answers!
An inverted index maps every unique token to the list of documents that contain it. This makes it quick to identify the documents that match a given keyword. The index information is saved in multiple partitions called “shards”.
Elasticsearch is capable of not only distribution and allocation of shards to the nodes in the cluster dynamically but also replicating them. This lends flexibility to the process of data distribution.
Distributing copies of the primary shards to different cluster nodes provides redundancy. Index operations are applied to the primary shard first and then copied to its replicas, whereas search queries can be served by both primary and replica shards. Having multiple nodes and replicas therefore improves query performance.
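The following Python sketch illustrates how primaries and replicas might be spread over nodes so that a replica never shares a node with its own primary. The round-robin placement and the node names are assumptions made for the example; Elasticsearch's actual allocation logic is more sophisticated.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place primaries round-robin, then put each replica on a different node.

    Round-robin placement is a simplification for illustration; it is not
    Elasticsearch's actual allocation algorithm.
    """
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas per shard")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        # Primary copy of this shard.
        layout[nodes[shard % len(nodes)]].append(f"P{shard}")
        # Replica copies go to the following nodes, never the primary's node.
        for r in range(1, num_replicas + 1):
            layout[nodes[(shard + r) % len(nodes)]].append(f"R{shard}")
    return layout


layout = allocate(num_primaries=3, num_replicas=1,
                  nodes=["node-a", "node-b", "node-c"])
for node, shards in layout.items():
    print(node, shards)
```

With this layout, losing any single node still leaves one copy of every shard available on the remaining nodes.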
Uses of Elasticsearch
Below are some basic use cases for Elasticsearch:
- Search for applications: This is particularly important for websites that depend on a search platform for accessing, retrieving, and reporting data.
- Search of websites: Elasticsearch plays a very important role in returning accurate and quick search results for websites that store huge amounts of data. It has now created a stronghold in the domain of site search.
- Enterprise Search: Elasticsearch also enables search across the enterprise, like document search, e-commerce search for products, etc. It has also become a trusted search solution for a lot of websites.
- Log Analytics: As discussed earlier, Elasticsearch is a common tool for the analysis of log data in near real-time. Its scalability and the operational insights it provides make it a popular choice.
- Security Analytics: Security analysis is yet another important domain in which Elasticsearch plays a very important role. With the help of the ELK stack, it analyzes access logs and similar logs related to security systems and presents a complete analysis.
- Business Analytics: There are many in-built features within the ELK stack which also make it a popular tool for business analytics. However, gaining in-depth know-how about the implementation of these tools may take longer.
Benefits
Here are some of the benefits:
- High-Performance standards: Elasticsearch can process huge volumes of data simultaneously, thereby giving quick results for search queries.
- Development of applications: It supports multiple programming languages such as Java, Python, PHP, etc. making it a popular choice of developers for the development of applications.
- Quick speed of operation: Elasticsearch operations like reading and writing are fast enough to support near real-time use cases such as application monitoring.
- Quick time to value: Elasticsearch provides simple APIs based on REST and utilizes schema-free JSON documents. This makes it easy to use for the quick creation of applications for many use-cases.
- Complementary tools: Kibana is a visualization and reporting tool that is integrated with Elasticsearch. Elasticsearch also integrates with Beats and Logstash, which transform source data and load it into clusters. There are plenty of plugins available which can enhance the functionality of applications.
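As a small illustration of the REST-and-JSON style mentioned above, the Python sketch below builds an HTTP request that would store one JSON document. It only constructs the request and does not send it. The index name `articles`, the document, and the local address are assumptions, and the `_doc` endpoint is assumed to be available in the installed version.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) an HTTP request that stores one JSON document."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200", "articles", "1",
    {"title": "Hello search", "views": 10},
)
print(request.get_method(), request.full_url)
# To actually send it to a running node: urllib.request.urlopen(request)
```

No schema had to be declared before building the request, which is what "schema-free JSON documents" refers to.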
Installation of Elasticsearch
It’s time for us to understand the process of installation of Elasticsearch.
#1) On AWS
Here are the steps to install:
Step 1: To verify the downloaded package, add Elastic’s signing key. If you have already installed packages from Elastic, you can skip this step and move to the next step.
Official website: Elastic
Step 2: On Debian, it is important to install the apt-transport-https package.
Step 3: Add the repository definition to the system.
Step 4: Optionally, choose the version of Elasticsearch that contains only the features licensed under Apache 2.0, which is also called OSS Elasticsearch.
Step 5: Update repositories and install Elasticsearch.
Elasticsearch is configured through a configuration file (elasticsearch.yml), which holds general settings such as node settings, network settings such as the host and port, the storage location of data, memory, log files, etc.
Because this example installs Elasticsearch on AWS, it is recommended to bind Elasticsearch to a private IP or to localhost.
Step 6: Run Elasticsearch.
Step 7: For a final confirmation, the browser or curl needs to be pointed to http://localhost:9200.
This completes the process of installation of Elasticsearch.
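The final confirmation in Step 7 can also be scripted. The Python sketch below reads the JSON that the root endpoint returns and summarizes it. The sample response is made up so that the sketch runs without a live node; `check_elasticsearch` is a hypothetical helper that would need a node listening on the given address.

```python
import json
import urllib.request


def summarize(info):
    """Extract the node name, cluster name, and version from the root response."""
    return {
        "node": info.get("name"),
        "cluster": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


def check_elasticsearch(url="http://localhost:9200", timeout=5):
    """Fetch the root endpoint of a running node and summarize it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return summarize(json.load(response))


# A made-up response in the shape the root endpoint returns, so the sketch
# runs without a live node.
sample = {
    "name": "node-1",
    "cluster_name": "elasticsearch",
    "version": {"number": "6.8.23"},
    "tagline": "You Know, for Search",
}
print(summarize(sample))
```

On a machine where Elasticsearch is running, calling `check_elasticsearch()` should return the same kind of summary for the real node.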
#2) On Windows
To install Elasticsearch on Windows, the .msi package can be used, as it installs Elasticsearch as a Windows service and also allows it to be run manually with the help of the elasticsearch.exe file.
It is important to note that Elasticsearch can also be installed on Windows by using the .zip archive. It also requires Java 8 or later versions.
Below is the step-by-step process for the installation on Windows:
Step 1: Go to the Elastic downloads page and download the .msi package for Elasticsearch v6.8.23.
Step 2: After downloading the .msi package, double-click on it to launch the GUI wizard. This works as a quick guide to the entire installation process. The ‘?’ button can be used at any given time to seek help.
Step 3: On the first screen, choose the directory for the installation. In this step, also choose directories for storing data, logs, and configuration.
Step 4: In the next step, choose whether to install Elasticsearch as a service or to start it manually as and when required. If it is installed as a service, a Windows account can be configured for the service to run under.
At this step, it is very important to make sure that the Windows account that runs the service has enough privileges to access not only the installation directory but also the other directories chosen for the deployment.
Step 5: Use the “Configuration” section to set common configuration options such as the name of the cluster, the name of the node, and the node roles, along with memory and network settings.
It is also important to make sure that the internet connection on the installation machine is steady and secure and that corporate firewalls allow downloads from artifacts.elastic.co.
From version 6.3.0 onwards, X-Pack is bundled by default.
Step 6: In this last configuration step, choose the type of license to install, along with the security configuration and the built-in user configuration.
X-Pack offers a choice between a trial license and a basic license. A trial license is valid for 30 days. At the end of the trial period, users will need to get a subscription. The basic license is free.
Step 7: Click on ‘Install’ to start the installation of Elasticsearch.
This completes the installation of Elasticsearch on Windows.
AWS Elasticsearch
Amazon Elasticsearch Service, or AWS Elasticsearch, is now called Amazon OpenSearch Service. It is a managed service that simplifies the deployment, operation, and scaling of OpenSearch clusters in the AWS cloud. Amazon OpenSearch Service supports both OpenSearch and legacy Elasticsearch OSS.
While creating clusters, users have the option to choose the search engine. In fact, there is broad compatibility between OpenSearch Service and Elasticsearch OSS version 7.10, which is also the final open-source version of this software.
OpenSearch is an open-source search engine that also offers the functionality of an analytical engine for log analytics and real-time monitoring of applications.
OpenSearch Service provisions all the resources for the cluster and launches it. Failed nodes are not only detected but also replaced automatically, which reduces the overhead of managing the infrastructure. The cluster can be scaled with a single API call.
Official Website: Amazon OpenSearch Service
Supported Versions of OpenSearch and Elasticsearch
OpenSearch versions 1.1 and 1.0 are currently supported by OpenSearch Service.
Here are the legacy Elasticsearch OSS versions supported by OpenSearch Service:
- 7.10, 7.9, 7.8, 7.7, 7.4, 7.1
- 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0
- 5.6, 5.5, 5.3, 5.1
- 2.3
- 1.5
For a new OpenSearch Service project, it is recommended to select the latest supported version of OpenSearch. For an existing domain that uses an older version of Elasticsearch, one can choose between keeping the domain as it is and migrating the data.
The process of using Amazon OpenSearch is quite simple. Just follow the below-mentioned steps:
Step 1: Sign up for an AWS account.
Step 2: Set up a domain.
Step 3: Use a domain access policy or fine-grained access control to control access to the domain.
Step 4: Index data, either from other AWS services or manually.
Step 5: Search for data using OpenSearch Dashboards. This also allows the creation of visualizations.
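For the manual route in Step 4, data is commonly sent in batches through the `_bulk` API, which expects newline-delimited JSON. The Python sketch below only builds such a request body, together with a simple search body for Step 5. The index name `movies`, the documents, and the queried field are hypothetical; sending the requests would additionally require the domain endpoint and credentials that match the access policy from Step 3, which are not shown here.

```python
import json


def build_bulk_body(index, documents):
    """Build a newline-delimited JSON body for the _bulk API.

    Each document contributes an action line followed by its source line.
    """
    lines = []
    for doc_id, source in documents.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


movies = {
    "1": {"title": "Heat", "year": 1995},
    "2": {"title": "Ronin", "year": 1998},
}
bulk_body = build_bulk_body("movies", movies)
print(bulk_body, end="")

# A hypothetical search body for Step 5: full-text match on the title field.
search_body = {"query": {"match": {"title": "heat"}}}
print(json.dumps(search_body))
```

Each document takes two lines in the bulk body, so the two sample documents above produce four lines in total.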
OpenSearch Service – Pricing
With the countless benefits of Amazon OpenSearch Service, surely you must be thinking about its pricing.
Amazon OpenSearch Service is charged based on the number of EC2 instance hours used. One also needs to pay for the total size of the EBS storage volumes that are attached to the instances.
Apart from this, there are some specifics regarding data transfer. When a domain uses multiple Availability Zones, OpenSearch Service does not bill for traffic between these Availability Zones. Similarly, OpenSearch Service does not bill for any transfer of data between UltraWarm or cold nodes and Amazon S3.
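The two main cost components described above can be combined into a rough estimate. The Python sketch below is only an illustration of the arithmetic: every rate in it is a hypothetical placeholder and not an actual AWS price, and other charges are left out.

```python
def estimate_monthly_cost(instance_count, hourly_rate, hours,
                          ebs_gb_per_instance, ebs_rate_per_gb_month):
    """Estimate cost as instance hours plus provisioned EBS storage."""
    instance_cost = instance_count * hourly_rate * hours
    storage_cost = instance_count * ebs_gb_per_instance * ebs_rate_per_gb_month
    return instance_cost + storage_cost


# All rates below are hypothetical placeholders, not actual AWS prices.
cost = estimate_monthly_cost(
    instance_count=3,
    hourly_rate=0.10,            # hypothetical price per instance hour
    hours=730,                   # roughly one month
    ebs_gb_per_instance=100,
    ebs_rate_per_gb_month=0.10,  # hypothetical price per GB-month
)
print(f"Estimated monthly cost: ${cost:.2f}")
```

Replacing the placeholders with the current rates for the chosen instance type and region gives a first approximation of the monthly bill.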
It’s time to look at some success stories.
Success Stories
Here are some leading examples of organizations that have successfully used Elasticsearch.
- eBay
eBay has successfully created an ‘Elasticsearch as a Service’ platform, which allows simple provisioning of Elasticsearch clusters on the company’s internal OpenStack-based cloud platform.
- Netflix
Netflix largely depends on the ELK stack for monitoring and analyzing customer operations and logs related to security. The organization also uses Elasticsearch to create shards, replicas, and an entire ecosystem with multiple plugins.
The use of Elasticsearch by Netflix has seen a rapid surge in recent times, growing from a few small deployments to more than a dozen clusters that carry hundreds of nodes.
FAQs About Elasticsearch
Is Elasticsearch a database?
Yes, it is a document-based database. In document-based databases such as Elasticsearch, mappings and document storage are designed so that documents can be searched and retrieved easily and efficiently.
What are some popular uses of Elasticsearch?
Elasticsearch is popularly used for storing, searching, and analyzing voluminous data quickly and in near real-time. It serves as a technological powerhouse for applications with complex needs.
What port does Elasticsearch use?
Elasticsearch uses port 9200 for HTTP (REST API) requests and port 9300 for communication between the nodes in a cluster. These are the default settings.
Is Elasticsearch free for commercial use?
Yes, Elasticsearch is free to download and use, including in production. Its licensing has changed over time, so it is worth checking the license terms of the version you plan to use. The company also offers paid products that add support and extra features.
Conclusion
In this article, we have discussed what the Elasticsearch engine is, its capabilities, and some of its use cases at leading organizations.
Elasticsearch can solve numerous problems for enterprises and has become a popular choice for many companies. The article also covers Elasticsearch on AWS, which is now offered as Amazon OpenSearch Service.
We hope that this article will be a good read for beginners to give them a quick start.