Monday, March 2, 2015

Brief Introduction to Elasticsearch


You've probably heard of Elasticsearch. Maybe you are even using it right now. Or maybe you haven't. For those who have never had any contact with this "relatively new" technology (even though the first version of Elasticsearch dates back to 2010 and its precursor Compass dates back to 2004!), here is a brief introduction.

Brief introduction

Elasticsearch is a search server based on Lucene, originally created by Shay Banon. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. It allows you to store, search, and analyze large volumes of data quickly and in near real time [1]. According to the official website, Elasticsearch is mostly used as the underlying engine that powers applications with complex search requirements [1]. This Java-based search server is open source under the Apache License 2.0, and it ships with sensible defaults, so it needs practically no configuration to get started.
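Full-text search in Lucene, the library underneath Elasticsearch, is built on an inverted index: a map from each term to the documents that contain it. The toy sketch below illustrates only that idea in plain Python. The tokenizer and the AND-only search are simplifying assumptions and have little in common with Lucene's real analyzers, relevance scoring, or on-disk format.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-word characters: a crude stand-in for an
    # analyzer (real analyzers also handle stemming, stop words, etc.).
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs):
    # Map every term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return the ids of documents that contain every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Elasticsearch is a search server based on Lucene",
    2: "Lucene is a Java library for full-text search",
    3: "MySQL is a relational database",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene search")))  # [1, 2]
```

Answering a query is just a lookup and a set intersection, with no scan over the documents themselves. That is why indexing everything up front pays off at search time.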

At the moment, Elasticsearch is one of the most popular enterprise search engines available. Its popularity has been steadily increasing over time and many notable companies and organisations are using Elasticsearch.

Terminology

Elasticsearch uses its own terminology, which may take some getting used to if you have never used "NoSQL" databases or come from a heavy relational database background. The following table maps the most commonly used Elasticsearch terms to their rough MySQL equivalents:

MySQL       Elasticsearch
--------    -------------
Database    Index
Table       Type
Row         Document
Column      Field
Schema      Mapping
Index       In Elasticsearch everything is indexed (unless you choose not to)
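To make the table concrete, here is a minimal sketch of how a MySQL row could map onto an Elasticsearch document. The columns become fields, and the document is addressed by index, type and id in the /index/type/id path used by the Elasticsearch 1.x REST API. The function row_to_document, the blog index, the post type and the column names are all hypothetical.

```python
import json


def row_to_document(index, doc_type, columns, row):
    # A row (tuple of column values) becomes a JSON document whose fields are
    # named after the columns. The hypothetical "id" column becomes the
    # document id in the REST path /index/type/id.
    doc = dict(zip(columns, row))
    doc_id = doc.pop("id")
    path = "/{}/{}/{}".format(index, doc_type, doc_id)
    return path, json.dumps(doc, sort_keys=True)


columns = ("id", "title", "published")
row = (1, "Brief Introduction to Elasticsearch", "2015-03-02")
path, body = row_to_document("blog", "post", columns, row)
print(path)  # /blog/post/1
print(body)  # {"published": "2015-03-02", "title": "Brief Introduction to Elasticsearch"}
```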

Having said that, let's take a look at how Elasticsearch works.

Distributed and highly available

In Elasticsearch, multiple servers (nodes) run together in a cluster and act as a single service. Broadly speaking, there are two types of nodes:

  • Nodes that store data (data nodes)
  • Nodes that hold no data but route requests and merge results, which helps speed up queries (client nodes)
To spread data across the cluster, Elasticsearch uses sharding, and to keep it available it replicates those shards. Every index is split into one or more primary shards, and each shard can have zero or more replicas. The number of primary shards is configurable and is set when the index is created, so decide up front how many you want. Shards and their replicas are distributed across different servers for failover reasons, and a replica is never placed on the same node as its primary. So if one node goes down, its data has already been replicated to other nodes, and a replica can take over as the new primary.
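The sketch below illustrates the two ideas in this section. First, Elasticsearch routes a document to a primary shard with shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. Here zlib.crc32 is only a stand-in for the hash Elasticsearch actually uses. Second, the round-robin placement is a deliberately naive assumption, not Elasticsearch's real shard allocator, which also balances load and weighs other factors. It keeps the one rule mentioned above: no two copies of a shard share a node. The node names are hypothetical.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    # hash(routing) % number_of_primary_shards, with the document id as the
    # routing value; crc32 stands in for Elasticsearch's real hash function.
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards, num_replicas, nodes):
    # Naive round-robin placement (NOT Elasticsearch's allocator): the primary
    # of shard s goes to node s % N and its replicas to the following nodes,
    # so no two copies of the same shard end up on the same node.
    if num_replicas >= len(nodes):
        raise ValueError("not enough nodes to keep every copy on its own node")
    return {
        s: [nodes[(s + r) % len(nodes)] for r in range(num_replicas + 1)]
        for s in range(num_primary_shards)
    }


def surviving_copies(placement, failed_node):
    # For every shard, the nodes that still hold a copy after one node fails.
    return {s: [n for n in holders if n != failed_node]
            for s, holders in placement.items()}


nodes = ["node-1", "node-2", "node-3"]
placement = allocate(num_primary_shards=5, num_replicas=1, nodes=nodes)
after = surviving_copies(placement, "node-2")
print(all(after.values()))  # True: every shard still has at least one copy

# Changing the shard count would send many existing ids to a different shard,
# which is why the number of primary shards is fixed when the index is created.
moved = sum(shard_for(i, 5) != shard_for(i, 6) for i in range(100))
print(moved, "of 100 ids would land on a different shard")
```

With one replica per shard, losing any single node still leaves a copy of every shard, which is the failover behavior described above.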

Running Elasticsearch

Running Elasticsearch is very easy. Since Elasticsearch is a standalone Java application, installing and running it is as easy as downloading it from the official site, unpacking it and running:
$ bin/elasticsearch
from the unpacked directory. After that you should see Elasticsearch's startup log messages in the terminal.

Once the server is up and running (which takes about 30 seconds, depending on your machine), its REST API listens on the default HTTP port 9200, while port 9300 is used for communication between nodes. You can then start creating indexes and filling Elasticsearch with types, documents, and fields (see what I did there? I used the terminology defined above).
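As a starting point, the sketch below uses Python's standard library to build three typical REST calls: creating an index with an explicit shard and replica count, indexing a document, and running a full-text match query. It only constructs the requests and assumes a node listening on localhost:9200. The index name blog, the type post and the field names are hypothetical. The paths follow the 1.x API of the time; newer Elasticsearch versions have dropped document types (e.g. /blog/_doc/1).

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node on the default HTTP port


def json_request(method, path, body=None):
    # Build (but do not send) a JSON request against the REST API.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


# 1. Create the hypothetical "blog" index with 3 primary shards, 1 replica each.
create_index = json_request("PUT", "/blog", {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1}
})

# 2. Index a document of type "post" with id 1.
index_doc = json_request("PUT", "/blog/post/1", {
    "title": "Brief Introduction to Elasticsearch",
    "published": "2015-03-02",
})

# 3. Full-text search on the "title" field.
search = json_request("POST", "/blog/_search", {
    "query": {"match": {"title": "elasticsearch"}}
})

# Against a running node, each request could be sent like this:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
for req in (create_index, index_doc, search):
    print(req.get_method(), req.full_url)
```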
Since there are plenty of great tutorials out there, I will not go into detail on how to use Elasticsearch. This post was merely intended as a brief introduction.

Sources:
[1] http://www.elasticsearch.org/
