Elasticsearch is a distributed search and analytics engine built on top of Apache Lucene. You interact with it over a REST API using JSON, and it stores data as JSON documents inside named indexes. Unlike MySQL, which finds rows by scanning a table or traversing B+ tree indexes, Elasticsearch builds an inverted index at write time so that every term in every text field points directly to the documents that contain it. This keeps full-text search fast as data volume grows, but it also means Elasticsearch is optimized for search-heavy read workloads rather than transactional writes or strict relational joins.
Core concepts
Index
An index is a named collection of documents that share a similar structure — analogous to a table in MySQL. Index names must be lowercase. You can have one index per entity type (e.g., products, orders) or combine related entities into a single index with distinct field sets.
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
Index health states:
- green — all primary and replica shards are allocated and active.
- yellow — all primary shards are active, but at least one replica shard is unallocated. Reads and writes work; the cluster has no redundancy for affected shards.
- red — at least one primary shard is unallocated. Some data is unavailable.
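The three states can be read as a simple rule over shard copies. The following is a minimal sketch of that rule, not how Elasticsearch computes health internally; the `ShardCopy` type and `index_health` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    """Hypothetical record describing one copy of a shard."""
    shard_id: int
    primary: bool
    assigned: bool  # True when the copy is allocated to a node and active


def index_health(copies: list[ShardCopy]) -> str:
    """Return 'green', 'yellow', or 'red' following the rules listed above."""
    # Any missing primary means some data is unavailable.
    if any(c.primary and not c.assigned for c in copies):
        return "red"
    # All primaries are fine, but at least one replica is missing.
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"
    return "green"


copies = [
    ShardCopy(0, primary=True, assigned=True),
    ShardCopy(0, primary=False, assigned=False),  # replica has no node to live on
]
print(index_health(copies))  # yellow
```

A single-node cluster with number_of_replicas set to 1 typically lands in the yellow case, because a replica cannot share a node with its primary.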
Document
A document is a single JSON object stored inside an index. It is the smallest unit Elasticsearch indexes and returns. Every document has an _id metadata field (auto-generated or user-specified) and an _index metadata field naming the index it belongs to. The JSON body you indexed is returned under _source.
{
  "_index": "products",
  "_id": "1",
  "_source": {
    "title": "Wireless Headphones",
    "price": 89.99,
    "created_at": "2024-03-01",
    "description": "Over-ear noise-cancelling headphones with 30-hour battery."
  }
}
Shard
Elasticsearch horizontally partitions each index into shards. Each shard is an independent Lucene index that can be hosted on any node in the cluster. Sharding lets you store more data than fits on a single machine and parallelize search queries across multiple nodes.
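You set number_of_shards at index creation time. You cannot change it in place afterward: changing the shard count means reindexing into a new index, or using the split or shrink APIs, which also produce a new index. Choose a shard count that fits your expected data volume and leaves room to grow.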
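The reason the shard count is fixed becomes clearer when you look at how a document is assigned to a shard: the shard is derived from a hash of the routing value (the _id by default) modulo the number of primary shards. The sketch below illustrates that idea only. It uses `zlib.crc32` as a stand-in hash (Elasticsearch uses a Murmur3-based function, and its actual formula has extra terms), and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing: str, number_of_shards: int) -> int:
    """Map a routing value to a shard number (simplified illustration)."""
    # crc32 is an assumption standing in for Elasticsearch's real hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


ids = [str(i) for i in range(1000)]

# Count how many documents would land on a different shard if the
# index went from 3 to 4 primary shards.
moved = sum(1 for doc_id in ids if pick_shard(doc_id, 3) != pick_shard(doc_id, 4))
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because most documents would map to a different shard under a new count, lookups by _id would go to the wrong place unless the data were rewritten.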
Replica
A replica is an exact copy of a primary shard hosted on a different node. Replicas serve two purposes:
- Fault tolerance — if the node holding a primary shard fails, a replica is promoted to primary automatically.
- Read throughput — search queries can be routed to any replica, distributing read load.
You can change number_of_replicas on a live index without reindexing.
Mapping and field types
Mapping defines how Elasticsearch stores and indexes each field, much like a table schema. Elasticsearch can infer a field's type the first time it sees that field in a document (dynamic mapping), but for production use you should define explicit mappings to control field types and prevent unintended behavior.
PUT /products
{
  "settings": { "number_of_replicas": 0, "number_of_shards": 1 },
  "mappings": {
    "properties": {
      "id": { "type": "integer" },
      "title": { "type": "keyword" },
      "price": { "type": "double" },
      "created_at": { "type": "date" },
      "description": { "type": "text" }
    }
  }
}
Once a field is mapped, you cannot change its type in place, although you can still add new fields to an existing mapping. To change a field's type, create a new index with the corrected mapping, reindex your data into it, and then delete the old index.
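One way to move data to a new mapping is the _reindex API, which copies documents from one index to another. The sketch below only builds the two requests as plain data so you can see their shape; it does not send anything. The target name products_v2 and the `reindex_plan` helper are hypothetical.

```python
import json


def reindex_plan(source: str, dest: str, new_mappings: dict) -> list[tuple[str, str, dict]]:
    """Return (method, path, body) tuples for a mapping migration."""
    return [
        # 1. Create the destination index with the corrected mapping.
        ("PUT", f"/{dest}", {"mappings": new_mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": source}, "dest": {"index": dest}}),
    ]


plan = reindex_plan(
    "products",
    "products_v2",  # hypothetical name for the replacement index
    {"properties": {"title": {"type": "text"}}},
)
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body, indent=2))
```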
Key field types
| Type | Behavior | Use when |
|---|---|---|
| keyword | Not analyzed; the whole value is indexed as a single term | IDs, status codes, tags, enum values |
| text | Analyzed by the configured analyzer; supports full-text search | Product names, descriptions, body text |
| integer / long | Whole numbers | Counts, IDs, ages |
| float / double | Floating-point numbers | Prices, scores, coordinates |
| date | ISO 8601 string or epoch milliseconds | Timestamps |
| boolean | true / false | Flags |
The critical distinction is between keyword and text:
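- keyword fields index the raw string as a single term. They support exact-match, prefix, wildcard, and range queries, and they can be sorted and aggregated.
- text fields are tokenized (split into individual terms by an analyzer) and support full-text queries. The trade-off is that sorting and aggregating on a text field is disabled by default, because it requires loading fielddata into memory.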
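To make the difference concrete, the sketch below shows which terms each field type would put in the index for the same value. The `text_terms` function is a rough approximation of a standard-style analyzer (split on non-alphanumerics, lowercase); real analyzers are configurable and do more. All function names here are hypothetical.

```python
import re


def keyword_terms(value: str) -> list[str]:
    """keyword: the whole string becomes one term, unchanged."""
    return [value]


def text_terms(value: str) -> list[str]:
    """text: approximate analysis by splitting into words and lowercasing."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def term_matches(query_term: str, indexed_terms: list[str]) -> bool:
    """A term query compares the query value to indexed terms without analysis."""
    return query_term in indexed_terms


title = "Wireless Headphones"
print(keyword_terms(title))  # ['Wireless Headphones']
print(text_terms(title))     # ['wireless', 'headphones']

print(term_matches("wireless", keyword_terms(title)))             # False
print(term_matches("wireless", text_terms(title)))                # True
print(term_matches("Wireless Headphones", keyword_terms(title)))  # True
```

This is why a term query for a single word finds nothing on a keyword field that holds a multi-word value, while the same word matches when the field is mapped as text.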
Query DSL
Elasticsearch's Query DSL lets you express searches as JSON objects sent in the body of a _search request (GET or POST). The response contains a hits array of matching documents, each with a _score representing its relevance to the query.
match_all
Returns every document in the index.
GET /products/_search
{
  "query": { "match_all": {} }
}
term
Exact-match query for keyword, numeric, date, or boolean fields. Does not analyze the query value.
GET /products/_search
{
  "query": {
    "term": { "id": { "value": 1 } }
  }
}
match
Full-text query for text fields. Analyzes the query string using the same analyzer as the field.
GET /products/_search
{
  "query": {
    "match": { "description": "noise cancelling headphones" }
  }
}
range
Returns documents where a field value falls within a specified range.
GET /products/_search
{
  "query": {
    "range": {
      "price": { "gte": 20, "lte": 100 }
    }
  }
}
bool
Combines multiple queries with boolean logic. Use must (AND), should (OR), and must_not (NOT). A fourth clause, filter, behaves like must but does not contribute to the relevance score.
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "headphones" } },
        { "range": { "price": { "lte": 150 } } }
      ],
      "must_not": [
        { "term": { "title": { "value": "out of stock" } } }
      ]
    }
  }
}
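The matching logic of a bool query can be pictured as set operations over document IDs. The sketch below is a simplified model that ignores scoring; `evaluate_bool` is a hypothetical helper, and each clause is represented by the set of document IDs it matches.

```python
def evaluate_bool(all_docs: set[int], must=(), should=(), must_not=()) -> set[int]:
    """Combine per-clause match sets the way a bool query combines clauses."""
    result = set(all_docs)
    # must: every clause has to match (intersection).
    for matches in must:
        result &= matches
    # should: with no must clause, at least one should clause has to match.
    # When must is present, should only influences scoring, which this
    # sketch does not model.
    if should and not must:
        result &= set().union(*should)
    # must_not: remove anything these clauses match (difference).
    for matches in must_not:
        result -= matches
    return result


all_docs = {1, 2, 3, 4}
mentions_headphones = {1, 2, 3}  # match on description
price_at_most_150 = {1, 2, 4}    # range on price
out_of_stock = {2}               # term on title

hits = evaluate_bool(
    all_docs,
    must=[mentions_headphones, price_at_most_150],
    must_not=[out_of_stock],
)
print(hits)  # {1}
```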
multi_match
Runs the same query string against multiple fields simultaneously.
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wireless headphones",
      "fields": ["title", "description"]
    }
  }
}
Highlighting
You can ask Elasticsearch to return the matched fragment with the matching terms wrapped in HTML tags.
GET /products/_search
{
  "query": { "match": { "description": "noise cancelling" } },
  "highlight": {
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"],
    "fields": { "description": {} }
  }
}
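To give a feel for what the highlighted output looks like, here is a rough client-side imitation. It is only a sketch: Elasticsearch's highlighters work on analyzed tokens and select fragments, whereas this hypothetical `highlight` function simply wraps whole-word matches using a regular expression.

```python
import re


def highlight(text: str, query: str, pre_tag: str = "<mark>", post_tag: str = "</mark>") -> str:
    """Wrap each query word found in the text with the given tags."""
    terms = [re.escape(term) for term in re.findall(r"\w+", query.lower())]
    if not terms:
        return text
    # Match any query word as a whole word, ignoring case.
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


description = "Over-ear noise-cancelling headphones with 30-hour battery."
print(highlight(description, "noise cancelling"))
# Over-ear <mark>noise</mark>-<mark>cancelling</mark> headphones with 30-hour battery.
```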
How the inverted index works
Elasticsearch builds an inverted index for every text field. A normal (forward) index maps documents to words; an inverted index maps words to documents. This is what makes full-text search fast.
Build time (indexing):
- The analyzer splits the field value into terms (tokenization, lowercasing, stop-word removal, stemming depending on configuration).
- For each term, Elasticsearch records the document ID, the position of the term within the document, and frequency of occurrence.
- The resulting mapping from term → document list is stored in the Lucene segment files on disk.
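The build steps above can be sketched in a few lines. This is a toy model, not Lucene's on-disk format: the `analyze` function only splits on non-alphanumerics and lowercases (no stop-word removal or stemming), and both function names are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Toy analyzer: split into alphanumeric tokens and lowercase them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each term to {doc_id: [positions]}; frequency is len(positions)."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {
    1: "Wireless headphones",
    2: "Wireless speaker, portable speaker",
}
index = build_inverted_index(docs)
print(index["wireless"])  # {1: [0], 2: [0]}
print(index["speaker"])   # {2: [1, 3]}
```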
Query time:
- The query string is analyzed using the same analyzer.
- Elasticsearch looks up each resulting term in the inverted index to get a list of document IDs.
- For multi-term queries, Elasticsearch intersects (AND) or unions (OR) the document lists.
- Matching documents are scored with BM25 (the default similarity since Elasticsearch 5.0; earlier versions used TF-IDF) and sorted by score.
Because the term-to-document mapping is precomputed at index time, search does not scan documents — it performs a direct lookup.
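The query-time steps can be sketched as follows. This is a minimal illustration that uses the textbook BM25 formula with k1 = 1.2 and b = 0.75 (Elasticsearch's default parameter values); Lucene's implementation differs in details, so absolute scores will not match a real cluster. The toy `analyze` function and `bm25_search` are hypothetical, terms are combined with OR semantics, and the document set is assumed to be non-empty.

```python
import math
import re
from collections import Counter


def analyze(text: str) -> list[str]:
    """Toy analyzer: split into alphanumeric tokens and lowercase them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def bm25_search(docs: dict[int, str], query: str, k1: float = 1.2, b: float = 0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    tokens = {doc_id: analyze(text) for doc_id, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)

    # Index time: term -> {doc_id: term frequency}.
    postings: dict[str, dict[int, int]] = {}
    for doc_id, terms in tokens.items():
        for term, tf in Counter(terms).items():
            postings.setdefault(term, {})[doc_id] = tf

    # Query time: look up each analyzed query term and accumulate scores.
    scores: dict[int, float] = {}
    for term in analyze(query):
        matches = postings.get(term, {})
        idf = math.log(1 + (len(docs) - len(matches) + 0.5) / (len(matches) + 0.5))
        for doc_id, tf in matches.items():
            length_norm = k1 * (1 - b + b * len(tokens[doc_id]) / avg_len)
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf * (k1 + 1) / (tf + length_norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    1: "Over-ear noise-cancelling headphones",
    2: "Portable waterproof speaker",
    3: "Wireless headphones with noise isolation",
}
results = bm25_search(docs, "noise cancelling headphones")
print([doc_id for doc_id, _ in results])  # [1, 3]
```

Document 2 never appears in the results because none of the query terms point to it; no document text is scanned at query time.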
Index management
Create an index with settings
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": { "type": "keyword" },
      "description": { "type": "text" },
      "price": { "type": "double" },
      "created_at": { "type": "date" }
    }
  }
}
Index a document
POST /products/_doc/1
{
  "title": "Wireless Headphones",
  "description": "Over-ear noise-cancelling headphones with 30-hour battery.",
  "price": 89.99,
  "created_at": "2024-03-01"
}
Update a document (partial)
POST /products/_update/1
{
  "doc": { "price": 79.99 }
}
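A partial update merges the doc object into the stored _source: fields you send are replaced, nested objects are merged recursively, and fields you omit are left alone. The sketch below models that merge in memory; `apply_partial_update` is a hypothetical helper, not part of any client library.

```python
def apply_partial_update(source: dict, doc: dict) -> dict:
    """Return a new dict with `doc` merged into `source`."""
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays replace the stored value.
            merged[key] = value
    return merged


stored = {
    "title": "Wireless Headphones",
    "price": 89.99,
    "created_at": "2024-03-01",
}
updated = apply_partial_update(stored, {"price": 79.99})
print(updated["price"])  # 79.99
print(updated["title"])  # Wireless Headphones
```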
Delete a document
DELETE /products/_doc/1
Bulk operations
The _bulk API processes multiple index, create, update, and delete operations in a single request. The request body is newline-delimited JSON and must end with a newline. Operations in a bulk request are not atomic: individual operations can fail without rolling back the others.
POST /_bulk
{"index": {"_index": "products", "_id": 2}}
{"title": "Bluetooth Speaker", "price": 49.99, "created_at": "2024-04-01", "description": "Portable waterproof speaker"}
{"update": {"_index": "products", "_id": 1}}
{"doc": {"price": 75.00}}
{"delete": {"_index": "products", "_id": 3}}
Check index health
GET /_cluster/health/products
When to use Elasticsearch vs. MySQL vs. Redis
| Requirement | Best fit | Reason |
|---|---|---|
| Full-text search with relevance scoring | Elasticsearch | Inverted index with BM25 scoring |
| Transactional writes with ACID guarantees | MySQL | MVCC, two-phase commit, foreign keys |
| Simple key lookups at sub-millisecond latency | Redis | In-memory, O(1) hash lookup |
| Range queries on numeric or date fields | MySQL or Elasticsearch | B+ tree (MySQL) or range filter (ES) |
| Aggregations over large document sets | Elasticsearch | Distributed aggregation framework |
| Relational joins across normalized tables | MySQL | JOIN optimizer, foreign key constraints |
| Session storage, counters, leaderboards | Redis | Purpose-built data structures |
| Log analytics and time-series search | Elasticsearch | Scalable inverted index + date histograms |
Elasticsearch search is near real-time rather than immediately consistent. After you index a document, it becomes visible to search only after the next refresh (by default, every 1 second), although fetching a document by its _id returns the latest version right away. Do not use Elasticsearch as your primary database for transactional data that must be immediately consistent.
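The refresh behavior can be pictured with a toy model in which newly indexed documents sit in a buffer until a refresh makes them searchable. This is only an illustration of the visibility rule, not of how Elasticsearch manages segments; the `NearRealTimeIndex` class is hypothetical.

```python
class NearRealTimeIndex:
    """Toy index where writes become searchable only after refresh()."""

    def __init__(self) -> None:
        self._searchable: dict[str, dict] = {}
        self._buffer: dict[str, dict] = {}

    def index(self, doc_id: str, source: dict) -> None:
        # New documents are buffered and not yet visible to search.
        self._buffer[doc_id] = source

    def refresh(self) -> None:
        # A refresh makes everything buffered so far searchable.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, predicate) -> list[str]:
        return [doc_id for doc_id, src in self._searchable.items() if predicate(src)]


idx = NearRealTimeIndex()
idx.index("1", {"title": "Wireless Headphones", "price": 89.99})

before = idx.search(lambda src: src["price"] < 100)
idx.refresh()
after = idx.search(lambda src: src["price"] < 100)

print(before)  # []
print(after)   # ['1']
```

When a caller needs to search for a document right after writing it, the write APIs accept a refresh=wait_for parameter that holds the response until a refresh has made the change visible.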