Elasticsearch: Full-Text Search and Index Management

Elasticsearch is a distributed search and analytics engine built on top of Apache Lucene. You interact with it over a REST API using JSON, and it stores data as JSON documents inside named indexes. Unlike MySQL, which finds rows by scanning tables or following B+ tree indexes, Elasticsearch builds an inverted index at write time so that every term in every text field points directly to the documents that contain it. This keeps full-text search fast even over very large document collections, but it also means Elasticsearch is optimized for search- and read-heavy workloads rather than transactional writes or relational joins.

Core concepts

Index

An index is a named collection of documents that share a similar structure — analogous to a table in MySQL. Index names must be lowercase. You can have one index per entity type (e.g., products, orders) or combine related entities into a single index with distinct field sets.

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
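The same request can be issued from application code. The following is a minimal sketch using only the Python standard library. It assumes a local development cluster at http://localhost:9200 with security disabled; Elasticsearch 8.x enables HTTPS and authentication by default, so a real cluster needs both. `es_request` is a hypothetical helper name, and it builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

# Assumption: local dev cluster over plain HTTP, no authentication.
ES_URL = "http://localhost:9200"


def es_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


create_products = es_request(
    "PUT",
    "/products",
    {"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
)

# Sending it requires a running cluster:
# with urllib.request.urlopen(create_products) as resp:
#     print(json.load(resp))
```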

Index health states:

green — all primary and replica shards are allocated and active.
yellow — all primary shards are active, but at least one replica shard is unallocated. Reads and writes work; the cluster has no redundancy for affected shards.
red — at least one primary shard is unallocated. Some data is unavailable.
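The three rules translate directly into code. The following simplified sketch works on a static snapshot of shard copies. `ShardCopy` and `index_health` are hypothetical names. The sketch ignores transitional states such as initializing or relocating shards, and it does not model the automatic promotion of replicas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShardCopy:
    shard: int       # shard number within the index
    primary: bool    # True for the primary copy, False for a replica
    assigned: bool   # True if the copy is allocated and active on a node


def index_health(copies: List[ShardCopy]) -> str:
    """Classify an index as green, yellow, or red from its shard copies."""
    if any(c.primary and not c.assigned for c in copies):
        return "red"     # a primary is missing: some data is unavailable
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"  # all primaries active, at least one replica unallocated
    return "green"       # every primary and replica is active


print(index_health([ShardCopy(0, True, True), ShardCopy(0, False, False)]))  # yellow
```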

Document

A document is a single JSON object stored inside an index. It is the smallest unit Elasticsearch indexes and returns. Every document carries metadata fields such as _index (the index it belongs to) and _id (auto-generated or user-specified). The original JSON body is kept in _source. A retrieved document, with some metadata omitted, looks like this:

{
  "_index": "products",
  "_id": "1",
  "_source": {
    "title": "Wireless Headphones",
    "price": 89.99,
    "created_at": "2024-03-01",
    "description": "Over-ear noise-cancelling headphones with 30-hour battery."
  }
}

Shard

Elasticsearch horizontally partitions each index into shards. Each shard is an independent Lucene index that can be hosted on any node in the cluster. Sharding lets you store more data than fits on a single machine and parallelize search queries across multiple nodes. You set number_of_shards when you create the index, and you cannot change it in place afterward. To get a different primary shard count, you must either reindex into a new index or use the _split or _shrink API, which also creates a new index. Choose a shard count that fits your expected data volume and leaves room to grow.
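The primary shard count is fixed because of how documents are routed. Elasticsearch picks a shard for each document roughly as hash(_routing) % number_of_primary_shards. Here _routing defaults to the document's _id, and the hash is Murmur3. The sketch below substitutes MD5 from Python's standard library for Murmur3, and it omits Elasticsearch's extra routing-factor step. It therefore illustrates only the idea: changing the shard count would send most existing documents to a shard other than the one they are stored on. `shard_for` is a hypothetical name.

```python
import hashlib


def shard_for(routing: str, number_of_shards: int) -> int:
    """Simplified routing: hash the routing value, then take it modulo the shard count.

    MD5 stands in for Elasticsearch's Murmur3 hash in this sketch.
    """
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_of_shards


doc_ids = [str(i) for i in range(1000)]
moved = sum(shard_for(doc_id, 3) != shard_for(doc_id, 4) for doc_id in doc_ids)
print(f"{moved / len(doc_ids):.0%} of documents would land on a different shard")
```

With a uniform hash, going from 3 to 4 shards leaves a document in place only when hash % 3 equals hash % 4. That happens for 3 of every 12 hash values, so about three quarters of the documents would move.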

Replica

A replica is an exact copy of a primary shard hosted on a different node. Replicas serve two purposes:

Fault tolerance — if the node holding a primary shard fails, a replica is promoted to primary automatically.
Read throughput — search queries can be routed to any replica, distributing read load.

You can change number_of_replicas on a live index without reindexing.
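The replica count interacts with cluster size, because Elasticsearch never allocates two copies of the same shard to one node. The following rough sketch shows the consequences. It assumes every data node can accept shards and ignores allocation filtering, disk watermarks, and similar settings. The function names are hypothetical.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard plus each of its replicas is a separate shard copy."""
    return number_of_shards * (1 + number_of_replicas)


def best_case_health(data_nodes: int, number_of_replicas: int) -> str:
    """Health once allocation settles, if node count is the only constraint.

    Every shard needs (1 + number_of_replicas) distinct nodes to place all its copies.
    """
    if data_nodes < 1:
        return "red"     # nowhere to put the primaries
    if data_nodes >= 1 + number_of_replicas:
        return "green"
    return "yellow"      # some replicas have no eligible node


print(total_shard_copies(3, 1))  # 6
print(best_case_health(1, 1))    # yellow: a single node cannot hold a replica
```

This is why single-node development setups commonly set number_of_replicas to 0, as the explicit-mapping example in the next section does.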

Mapping and field types

Mapping defines how Elasticsearch stores and indexes each field; it is the equivalent of a table schema. Elasticsearch can infer a field's mapping from the first document that contains it (dynamic mapping). For production use, however, you should define explicit mappings so that each field gets the type you intend rather than one guessed from a sample value.
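To see why explicit mappings matter, the sketch below approximates the types that dynamic mapping would pick for the sample product under default settings. The rules are:

- JSON integers become long.
- Floating-point numbers become float.
- Booleans become boolean.
- Objects become object fields.
- Strings that pass date detection become date.
- Other strings become text with a keyword sub-field.

`infer_field_type` is a hypothetical helper. Its date check (a plain YYYY-MM-DD pattern) is a deliberately narrow stand-in for Elasticsearch's real date detection.

```python
import re

# Narrow stand-in for date detection; the real defaults accept more formats.
DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def infer_field_type(value):
    """Approximate the dynamic-mapping type for a single JSON value."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": {k: infer_field_type(v) for k, v in value.items()}}
    if isinstance(value, str):
        if DATE_ONLY.fullmatch(value):
            return {"type": "date"}
        return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"not covered by this sketch: {type(value).__name__}")


product = {"title": "Wireless Headphones", "price": 89.99, "created_at": "2024-03-01"}
dynamic_mapping = {field: infer_field_type(v) for field, v in product.items()}
```

Under these rules, price would become float rather than double, and title would become analyzed text rather than keyword. This is why the explicit mapping below pins both types.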

PUT /products
{
  "settings": { "number_of_replicas": 0, "number_of_shards": 1 },
  "mappings": {
    "properties": {
      "id":          { "type": "integer" },
      "title":       { "type": "keyword" },
      "price":       { "type": "double" },
      "created_at":  { "type": "date" },
      "description": { "type": "text" }
    }
  }
}

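Once a field is mapped, you cannot change its type or remove it; you can only add new fields to an existing mapping. To change a field's type, follow these steps:

1. Create a new index with the corrected mapping.
2. Copy the documents into it with the _reindex API.
3. Switch your application to the new index. An index alias makes this switch atomic.
4. Only then delete the old index.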
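The migration can be planned as a short sequence of REST calls. The following sketch only builds the calls as (method, path, body) tuples, and it assumes that applications query through an alias. The names products_v1, products_v2, and the alias products are hypothetical. `reindex_plan` is not an Elasticsearch API.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[dict]]


def reindex_plan(old_index: str, new_index: str, alias: str, new_mappings: dict) -> List[Step]:
    """Ordered REST calls for changing field types without losing data."""
    return [
        # 1. Create the new index with the corrected mapping.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy all documents from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # 3. Repoint the alias; actions in one _aliases call are applied atomically.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
        # 4. Delete the old index only after verifying the new one.
        ("DELETE", f"/{old_index}", None),
    ]


steps = reindex_plan(
    "products_v1",
    "products_v2",
    "products",
    {"properties": {
        "title": {"type": "text"},  # e.g. switching title from keyword to text
        "description": {"type": "text"},
        "price": {"type": "double"},
        "created_at": {"type": "date"},
    }},
)
for method, path, _ in steps:
    print(method, path)
```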

Key field types

Type	Behavior	Use when
`keyword`	Not analyzed; exact-match only	IDs, status codes, tags, enum values
`text`	Analyzed into terms by the field's analyzer; supports full-text search	Product names, descriptions, body text
`integer` / `long`	Numeric integer	Counts, IDs, ages
`float` / `double`	Floating-point	Prices, scores, coordinates
`date`	ISO 8601 string or epoch milliseconds	Timestamps
`boolean`	`true` / `false`	Flags

The critical distinction is between keyword and text:

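keyword fields index the entire string as a single, unanalyzed term. They support exact-match queries (term, terms, prefix, wildcard). Because they have doc values by default, they also support efficient sorting and aggregations.
text fields are tokenized (split into individual terms by an analyzer) and support full-text queries such as match. The trade-off is that text fields cannot be sorted or aggregated by default. Enabling fielddata makes this possible but is memory-intensive. A common pattern is to map a string as text with a keyword sub-field, which is what dynamic mapping does for JSON strings.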
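The difference shows up as soon as a term query meets each kind of field. The following is a simplified sketch. `standard_like_analyze` only lowercases and splits on non-word characters, which approximates Elasticsearch's standard analyzer but is not identical to it; the real analyzer follows Unicode word-segmentation rules. All function names are hypothetical.

```python
import re
from typing import List


def standard_like_analyze(value: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_terms(value: str) -> List[str]:
    """A keyword field indexes the whole value, unchanged, as one term."""
    return [value]


def term_query_matches(indexed_terms: List[str], query_value: str) -> bool:
    """A term query looks up the query value as-is, without analyzing it."""
    return query_value in indexed_terms


title = "Wireless Headphones"
as_text = standard_like_analyze(title)   # ['wireless', 'headphones']
as_keyword = keyword_terms(title)        # ['Wireless Headphones']

print(term_query_matches(as_keyword, "Wireless Headphones"))  # True: exact value
print(term_query_matches(as_text, "Wireless"))                # False: text terms are lowercased
```

A match query would first analyze "Wireless" into "wireless" and then find the document. This is why term pairs with keyword fields and match pairs with text fields.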

Query DSL

Elasticsearch’s Query DSL lets you express searches as JSON objects sent in the body of a request to the _search endpoint (GET and POST are both accepted). The response contains a hits.hits array of matching documents, each with a _score representing its relevance to the query.

match_all

Returns every document in the index.

GET /products/_search
{
  "query": { "match_all": {} }
}

term

Exact-match query for keyword, numeric, date, or boolean fields. Does not analyze the query value.

GET /products/_search
{
  "query": {
    "term": { "id": { "value": 1 } }
  }
}

match

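Full-text query for text fields. The query string is analyzed with the field's search analyzer, which by default is the same analyzer used at index time. By default, a document matches if it contains any of the resulting terms; set "operator": "and" to require all of them.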

GET /products/_search
{
  "query": {
    "match": { "description": "noise cancelling headphones" }
  }
}

range

Returns documents where a field value falls within a specified range.

GET /products/_search
{
  "query": {
    "range": {
      "price": { "gte": 20, "lte": 100 }
    }
  }
}

bool

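Combines multiple queries with boolean logic. Use the following clauses:

- must (AND): contributes to the score.
- filter (AND): does not score and can be cached.
- should (OR)
- must_not (NOT)

Conditions that do not affect relevance, such as price ranges, belong in filter.

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "headphones" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 150 } } }
      ],
      "must_not": [
        { "term": { "title": { "value": "Bluetooth Speaker" } } }
      ]
    }
  }
}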
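The matching rules of a bool query can be modeled with plain predicates. The following simplified sketch decides only whether a document matches, and it ignores scoring, so must and filter behave identically here. `bool_matches` and `description_has` are hypothetical names; the parameter is called `filter_` to avoid shadowing Python's built-in filter. The sketch applies Elasticsearch's default for minimum_should_match: 1 when the query has should clauses but no must or filter clauses, and 0 otherwise.

```python
from typing import Callable, Optional, Sequence

Predicate = Callable[[dict], bool]


def bool_matches(
    doc: dict,
    must: Sequence[Predicate] = (),
    filter_: Sequence[Predicate] = (),
    should: Sequence[Predicate] = (),
    must_not: Sequence[Predicate] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Return True if `doc` satisfies the bool clauses (matching only, no scores)."""
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must and not filter_ else 0
    return (
        all(p(doc) for p in must)
        and all(p(doc) for p in filter_)
        and not any(p(doc) for p in must_not)
        and sum(1 for p in should if p(doc)) >= minimum_should_match
    )


def description_has(word: str) -> Predicate:
    """Crude stand-in for a match query on description."""
    return lambda d: word in d.get("description", "").lower().split()


products = [
    {"title": "Wireless Headphones", "price": 89.99,
     "description": "Over-ear noise-cancelling headphones with 30-hour battery."},
    {"title": "Bluetooth Speaker", "price": 49.99,
     "description": "Portable waterproof speaker"},
]

hits = [
    p["title"]
    for p in products
    if bool_matches(
        p,
        must=[description_has("headphones")],
        filter_=[lambda d: d["price"] <= 150],
        must_not=[lambda d: d["title"] == "Bluetooth Speaker"],
    )
]
print(hits)  # ['Wireless Headphones']
```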

multi_match

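Runs the same query string against multiple fields at once. Each field analyzes the query with its own analyzer. For a keyword field such as title in this mapping, the whole query string must therefore equal the stored value exactly (including case) for that field to match.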

GET /products/_search
{
  "query": {
    "multi_match": {
      "query":  "wireless headphones",
      "fields": ["title", "description"]
    }
  }
}
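By default, multi_match uses the best_fields type. Each field is queried separately, and the document receives the highest per-field score. To that score it adds tie_breaker times the scores of the other matching fields (tie_breaker defaults to 0). The following minimal sketch covers only that combination step. The per-field scores are made-up inputs, not values Elasticsearch would produce, and `best_fields_score` is a hypothetical name.

```python
from typing import Dict


def best_fields_score(field_scores: Dict[str, float], tie_breaker: float = 0.0) -> float:
    """Combine per-field scores like multi_match's default best_fields type."""
    matching = [score for score in field_scores.values() if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    # Every other matching field contributes tie_breaker * its score.
    return best + tie_breaker * (sum(matching) - best)


scores = {"title": 1.5, "description": 2.0}        # hypothetical per-field scores
print(best_fields_score(scores))                   # 2.0
print(best_fields_score(scores, tie_breaker=0.3))  # approximately 2.45
```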

Highlighting

You can ask Elasticsearch to return fragments of the matched field with the matching terms wrapped in tags. The default tags are <em> and </em>; the request below sets <mark> instead.

GET /products/_search
{
  "query": { "match": { "description": "noise cancelling" } },
  "highlight": {
    "pre_tags":  ["<mark>"],
    "post_tags": ["</mark>"],
    "fields":    { "description": {} }
  }
}
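Conceptually, highlighting finds the query terms again in the stored field text and wraps them in the tags. The following toy sketch implements that idea with simple case-insensitive word matching. Unlike Elasticsearch's highlighters, it does no stemming and returns the whole text rather than fragments. `highlight` is a hypothetical name.

```python
import re


def highlight(text: str, query: str, pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap each word of `text` that matches a query term (case-insensitively) in tags."""
    terms = {token.lower() for token in re.findall(r"\w+", query)}

    def wrap(match):
        word = match.group(0)
        return f"{pre_tag}{word}{post_tag}" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)


print(highlight("Over-ear noise-cancelling headphones", "noise cancelling", "<mark>", "</mark>"))
# Over-ear <mark>noise</mark>-<mark>cancelling</mark> headphones
```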

How the inverted index works

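Elasticsearch builds an inverted index for every indexed field. Each analyzed term of a text field becomes an entry, and a keyword field contributes its whole value as a single term. A normal (forward) index maps documents to words; an inverted index maps words to documents, which is what makes full-text search fast.

Build time (indexing):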

The analyzer splits the field value into terms (tokenization, lowercasing, stop-word removal, stemming depending on configuration).
For each term, Elasticsearch records the document ID, the position of the term within the document, and frequency of occurrence.
The resulting mapping from term → document list is stored in the Lucene segment files on disk.

Query time:

The query string is analyzed using the same analyzer.
Elasticsearch looks up each resulting term in the inverted index to get a list of document IDs.
For multi-term queries, Elasticsearch intersects (AND) or unions (OR) the document lists.
Documents are scored for relevance and sorted by score. Elasticsearch has used BM25 by default since version 5.0, when it replaced the earlier TF-IDF default.
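The BM25 formula itself is short. The following sketch computes the textbook per-term score, using Lucene's form of the idf and Elasticsearch's default parameters k1 = 1.2 and b = 0.75. For a multi-term query, a document's score is the sum over the query terms. Lucene's actual implementation differs in some details; for example, it stores document lengths in a compact, lossy encoding, so real scores will not match this exactly. `bm25_term_score` is a hypothetical name.

```python
import math


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    n_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of one query term to one document."""
    # Inverse document frequency: rare terms (small doc_freq) weigh more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized (controlled by b).
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: extra occurrences add less and less (controlled by k1).
    return idf * tf * (k1 + 1) / (tf + length_norm)


rare = bm25_term_score(tf=1, doc_len=10, avg_doc_len=10, n_docs=1000, doc_freq=5)
common = bm25_term_score(tf=1, doc_len=10, avg_doc_len=10, n_docs=1000, doc_freq=500)
print(rare > common)  # True: the rarer term contributes more
```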

Because the term-to-document mapping is precomputed at index time, search does not scan documents — it performs a direct lookup.
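The build and query steps above fit in a few lines of Python. The following toy sketch assumes a simplified analyzer that lowercases and splits on non-word characters, with no stop words or stemming. It keeps everything in memory rather than in Lucene segments. `InvertedIndex` is a hypothetical class. Postings record the positions of each term per document, so term frequency is the number of positions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Simplified analyzer: lowercase and split on non-word characters."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: [positions]}; the term frequency is len(positions)
        self.postings: Dict[str, Dict[str, List[int]]] = defaultdict(dict)

    def add(self, doc_id: str, text: str) -> None:
        """Index time: analyze the text and record each term's positions."""
        for position, term in enumerate(analyze(text)):
            self.postings[term].setdefault(doc_id, []).append(position)

    def search(self, query: str, operator: str = "or") -> Set[str]:
        """Query time: analyze the query, look up each term, then combine doc sets."""
        doc_sets = [set(self.postings.get(term, {})) for term in analyze(query)]
        if not doc_sets:
            return set()
        if operator == "and":
            return set.intersection(*doc_sets)
        return set.union(*doc_sets)


index = InvertedIndex()
index.add("1", "Over-ear noise-cancelling headphones with 30-hour battery.")
index.add("2", "Portable waterproof speaker")

print(sorted(index.search("noise speaker")))         # ['1', '2']
print(sorted(index.search("noise speaker", "and")))  # []
```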

Index management

Create an index with settings

PUT /products
{
  "settings": {
    "number_of_shards":   3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title":       { "type": "keyword" },
      "description": { "type": "text" },
      "price":       { "type": "double" },
      "created_at":  { "type": "date" }
    }
  }
}

Index a document

POST /products/_doc/1
{
  "title":       "Wireless Headphones",
  "description": "Over-ear noise-cancelling headphones with 30-hour battery.",
  "price":       89.99,
  "created_at":  "2024-03-01"
}

Update a document (partial)

POST /products/_update/1
{
  "doc": { "price": 79.99 }
}
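A partial update merges the doc object into the stored _source. Nested objects are merged field by field, while scalar values and arrays are replaced wholesale. Internally, Elasticsearch still reads the current document, applies the merge, and reindexes the whole result as a new version. The following sketch shows the merge rule. `merge_partial` is a hypothetical helper, and the tags and stock fields are illustrative; they are not part of the products mapping above.

```python
import copy


def merge_partial(source: dict, partial: dict) -> dict:
    """Return a new _source with `partial` merged in, as a partial update would."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # objects merge recursively
        else:
            merged[key] = value  # scalars and arrays replace the old value
    return merged


stored = {"title": "Wireless Headphones", "price": 89.99,
          "tags": ["audio"], "stock": {"warehouse": 5, "store": 2}}
updated = merge_partial(stored, {"price": 79.99, "tags": ["sale"], "stock": {"store": 0}})
print(updated)
# {'title': 'Wireless Headphones', 'price': 79.99, 'tags': ['sale'], 'stock': {'warehouse': 5, 'store': 0}}
```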

Delete a document

DELETE /products/_doc/1

Bulk operations

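The _bulk API processes multiple index, create, update, and delete operations in a single request. The body is newline-delimited JSON: each action line is followed by a source line (except for delete), and the body must end with a newline. Operations in a bulk request are not atomic. Individual operations can fail without rolling back the others, so check the top-level errors flag and the per-item results in the response.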

POST _bulk
{"index": {"_index": "products", "_id": 2}}
{"title": "Bluetooth Speaker", "price": 49.99, "created_at": "2024-04-01", "description": "Portable waterproof speaker"}
{"update": {"_index": "products", "_id": 1}}
{"doc": {"price": 75.00}}
{"delete": {"_index": "products", "_id": 3}}
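Because each line must be a complete JSON object, it is safer to generate bulk bodies programmatically than to write them by hand. The following sketch uses a hypothetical `bulk_body` helper that serializes (action, source) pairs into NDJSON. It reproduces the request above, using string _id values.

```python
import json
from typing import List, Optional, Tuple


def bulk_body(operations: List[Tuple[dict, Optional[dict]]]) -> str:
    """Serialize (action, source) pairs into an NDJSON body for the _bulk API.

    delete actions have no source line, so pass None as their second element.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))      # action/metadata line
        if source is not None:
            lines.append(json.dumps(source))  # document or partial-update line
    return "\n".join(lines) + "\n"            # the body must end with a newline


body = bulk_body([
    ({"index": {"_index": "products", "_id": "2"}},
     {"title": "Bluetooth Speaker", "price": 49.99, "created_at": "2024-04-01",
      "description": "Portable waterproof speaker"}),
    ({"update": {"_index": "products", "_id": "1"}}, {"doc": {"price": 75.00}}),
    ({"delete": {"_index": "products", "_id": "3"}}, None),
])
print(body, end="")
```

When sending the body, use the Content-Type application/x-ndjson.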

Check index health

GET /_cat/indices?v

When to use Elasticsearch vs. MySQL vs. Redis

Requirement	Best fit	Reason
Full-text search with relevance scoring	Elasticsearch	Inverted index with BM25 scoring
Transactional writes with ACID guarantees	MySQL	MVCC, two-phase commit, foreign keys
Simple key lookups at sub-millisecond latency	Redis	In-memory, O(1) hash lookup
Range queries on numeric or date fields	MySQL or Elasticsearch	B+ tree (MySQL) or BKD tree (ES numeric and date fields)
Aggregations over large document sets	Elasticsearch	Distributed aggregation framework
Relational joins across normalized tables	MySQL	JOIN optimizer, foreign key constraints
Session storage, counters, leaderboards	Redis	Purpose-built data structures
Log analytics and time-series search	Elasticsearch	Scalable inverted index + date histograms

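Elasticsearch search is near real-time rather than immediately consistent. After you index a document, it becomes visible to searches only after the next refresh, which by default happens every second (index.refresh_interval). A GET by _id, however, returns the document immediately. You can pass refresh=wait_for on a write to block until the change is searchable. Do not use Elasticsearch as the primary database for transactional data that must be immediately consistent.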
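The refresh behavior can be pictured as two stages: a write buffer and a searchable view. The following toy model illustrates them. `NearRealTimeIndex` is a hypothetical class. Real Elasticsearch uses an in-memory buffer, a transaction log, and Lucene segments, none of which are modeled here.

```python
from typing import Dict, Optional, Set


class NearRealTimeIndex:
    """Toy model: writes are buffered and become searchable only after refresh()."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}     # indexed, not yet visible to search
        self._searchable: Dict[str, dict] = {}  # visible to search

    def index(self, doc_id: str, doc: dict) -> None:
        self._pending[doc_id] = doc

    def refresh(self) -> None:
        # Elasticsearch does this every index.refresh_interval (default 1s)
        # or on an explicit POST /<index>/_refresh.
        self._searchable.update(self._pending)
        self._pending.clear()

    def search_ids(self) -> Set[str]:
        return set(self._searchable)

    def get(self, doc_id: str) -> Optional[dict]:
        # GET by _id is real-time: it also sees writes that are not refreshed yet.
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._searchable.get(doc_id)


idx = NearRealTimeIndex()
idx.index("1", {"title": "Wireless Headphones"})
print(idx.search_ids())  # set(): not searchable yet
print(idx.get("1"))      # {'title': 'Wireless Headphones'}
idx.refresh()
print(idx.search_ids())  # {'1'}
```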
