How to Search Data in Elasticsearch
Elasticsearch is a powerful, distributed search and analytics engine built on Apache Lucene. It enables near real-time searching across vast datasets with high scalability and performance. Whether you're analyzing log files, powering e-commerce product discovery, or enabling full-text search in enterprise applications, Elasticsearch's flexible query DSL and rich filtering capabilities make it indispensable. Learning how to search data in Elasticsearch is not just a technical skill; it's a strategic advantage for developers, data engineers, and analysts working with large-scale, unstructured, or semi-structured data.
Unlike traditional relational databases that rely on structured SQL queries, Elasticsearch uses a JSON-based query language that supports complex searches including fuzzy matching, aggregations, geospatial queries, and term boosting. This tutorial provides a comprehensive, step-by-step guide to mastering data search in Elasticsearch, from basic term queries to advanced multi-field searches and performance optimization. By the end, you'll understand how to construct efficient queries, interpret results, and apply best practices that ensure speed, accuracy, and scalability in production environments.
Step-by-Step Guide
Setting Up Elasticsearch
Before you can search data, you need a running Elasticsearch instance. The easiest way to get started is by using Docker. Run the following command to start Elasticsearch 8.x:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.12.0
Once running, verify the cluster status by accessing http://localhost:9200 in your browser or via curl:
curl -X GET "localhost:9200"
You should receive a JSON response containing the cluster name, version, and node information. If you're using a cloud-hosted instance like Elastic Cloud, use the provided endpoint and authentication credentials instead.
Creating an Index with Mapping
Elasticsearch stores data in indices, which are similar to tables in relational databases. However, unlike SQL tables, Elasticsearch indices are schema-flexible by default. For production use, it's recommended to define a mapping explicitly to control data types and optimize search behavior.
Let's create an index named products with a structured mapping:
PUT /products
{
"mappings": {
"properties": {
"name": { "type": "text", "analyzer": "standard", "fields": { "keyword": { "type": "keyword" } } },
"description": { "type": "text", "analyzer": "english" },
"price": { "type": "float" },
"category": { "type": "keyword" },
"in_stock": { "type": "boolean" },
"created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
"tags": { "type": "keyword" }
}
}
}
Here, text fields are analyzed (tokenized and lowercased) for full-text search, while keyword fields are not analyzed, which makes them ideal for exact matches, filters, and aggregations. The name field additionally exposes an unanalyzed name.keyword subfield, which later examples use for exact and pattern matching. The date type ensures proper time-based queries, and boolean enables efficient true/false filtering.
Indexing Sample Data
Now, insert sample documents into the products index:
POST /products/_bulk
{"index":{"_id":"1"}}
{"name":"Wireless Headphones","description":"Noise-cancelling over-ear headphones with 30-hour battery life","price":199.99,"category":"Electronics","in_stock":true,"created_at":"2024-01-15 10:30:00","tags":["audio","wireless","premium"]}
{"index":{"_id":"2"}}
{"name":"Coffee Mug","description":"Ceramic mug with hand-painted design, microwave safe","price":12.50,"category":"Home & Kitchen","in_stock":true,"created_at":"2024-01-14 09:15:00","tags":["ceramic","gift","kitchen"]}
{"index":{"_id":"3"}}
{"name":"Running Shoes","description":"Lightweight breathable shoes for marathon training","price":119.99,"category":"Sports","in_stock":false,"created_at":"2024-01-12 14:22:00","tags":["athletic","running","comfort"]}
{"index":{"_id":"4"}}
{"name":"Smart Thermostat","description":"Wi-Fi enabled thermostat with learning algorithms","price":249.99,"category":"Electronics","in_stock":true,"created_at":"2024-01-10 16:45:00","tags":["smart","home","energy"]}
{"index":{"_id":"5"}}
{"name":"Yoga Mat","description":"Non-slip eco-friendly mat, 6mm thickness","price":34.99,"category":"Sports","in_stock":true,"created_at":"2024-01-16 08:10:00","tags":["yoga","fitness","eco"]}
Using _bulk is more efficient than individual POST requests when inserting multiple documents. Each document is now indexed and ready for search.
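Because the bulk body is newline-delimited JSON (an action line followed by a source line per document), it is easy to assemble programmatically. A minimal Python sketch, assuming your documents are dicts carrying an _id key (build_bulk_body is an illustrative helper, not a client API):

```python
import json

def build_bulk_body(docs):
    """Assemble an NDJSON _bulk payload: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line: the operation to perform and the target _id.
        lines.append(json.dumps({"index": {"_id": doc["_id"]}}))
        # Source line: the document itself, minus the _id metadata.
        lines.append(json.dumps({k: v for k, v in doc.items() if k != "_id"}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"_id": "1", "name": "Wireless Headphones", "price": 199.99},
    {"_id": "2", "name": "Coffee Mug", "price": 12.50},
]
payload = build_bulk_body(docs)
```

The payload can then be sent with curl, e.g. curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/products/_bulk --data-binary @payload.ndjson.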
Basic Term Search
The simplest search in Elasticsearch is a term-level query. Use the match query for full-text searches on analyzed fields like name or description:
GET /products/_search
{
"query": {
"match": {
"name": "headphones"
}
}
}
This returns all documents where the name field contains the word headphones. Elasticsearch tokenizes the search term and matches against indexed tokens. The result includes relevance scores (_score) computed with the BM25 similarity algorithm, a refinement of the classic TF-IDF model.
For exact matches on keyword fields, use the term query:
GET /products/_search
{
"query": {
"term": {
"category": "Electronics"
}
}
}
Because category is mapped as keyword, the term query compares against the raw, unanalyzed value, so no subfield is needed. Had category been mapped as text (as dynamic mapping does by default, adding a .keyword subfield alongside it), you would query category.keyword instead; a term query against the analyzed text field itself rarely matches, because the indexed tokens are lowercased.
Multi-Field Search with Boolean Queries
Real-world searches often require combining multiple conditions. Use the bool query to combine must, should, and must_not clauses.
Example: Find all in-stock electronics with wireless in the description:
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "term": { "category": "Electronics" } },
{ "match": { "description": "wireless" } }
],
"filter": [
{ "term": { "in_stock": true } }
]
}
}
}
Here, must clauses affect scoring, while filter clauses are used for exact matches and are cached for performance. Filters are faster because they don't compute relevance scores.
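In application code, it can help to assemble bool queries from a small helper rather than hand-writing nested dicts. A sketch in Python, assuming the resulting body is sent with an HTTP client or an official library (bool_query is an illustrative helper, not an Elasticsearch API):

```python
def bool_query(must=None, filter_=None, should=None, must_not=None):
    """Assemble a bool query body, omitting clause lists that are empty."""
    clauses = {"must": must, "filter": filter_, "should": should, "must_not": must_not}
    # Keep only the clause lists that were actually provided.
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}

# Rebuild the in-stock electronics search: scoring clauses in must,
# the exact-match condition in filter so it can be cached.
body = bool_query(
    must=[
        {"term": {"category": "Electronics"}},
        {"match": {"description": "wireless"}},
    ],
    filter_=[{"term": {"in_stock": True}}],
)
```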
Phrase and Proximity Searches
To search for exact phrases, use match_phrase:
GET /products/_search
{
"query": {
"match_phrase": {
"description": "noise-cancelling headphones"
}
}
}
This returns only documents where noise-cancelling headphones appears as a contiguous phrase, not as separate terms.
For proximity searches (terms within a few positions of each other), add a slop parameter to match_phrase; the span_near query offers finer-grained control:
GET /products/_search
{
"query": {
"match_phrase": {
"description": {
"query": "wireless headphones",
"slop": 2
}
}
}
}
The slop parameter allows the two terms to appear up to two positions apart, increasing flexibility while preserving phrase intent.
Range Queries
Elasticsearch supports numeric and date ranges. Search for products priced between $50 and $200:
GET /products/_search
{
"query": {
"range": {
"price": {
"gte": 50,
"lte": 200
}
}
}
}
For date ranges, supply values in the format declared in the mapping (yyyy-MM-dd HH:mm:ss for this index):
GET /products/_search
{
"query": {
"range": {
"created_at": {
"gte": "2024-01-12 00:00:00",
"lte": "2024-01-16 23:59:59"
}
}
}
}
Fuzzy Searches for Typo Tolerance
Users often make typos. Elasticsearch's fuzzy queries handle this gracefully:
GET /products/_search
{
"query": {
"fuzzy": {
"name": {
"value": "headphon",
"fuzziness": "AUTO"
}
}
}
}
This matches headphones even though the final two characters are missing (an edit distance of 2). The fuzziness parameter can be set to 0, 1, 2, or AUTO (recommended). Use this sparingly in high-volume environments, as fuzzy queries are computationally expensive.
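Fuzziness is measured in Levenshtein edit distance: single-character insertions, deletions, or substitutions. With AUTO, terms of one or two characters must match exactly, lengths three to five allow one edit, and longer terms allow two. The rule is easy to check offline; a small pure-Python sketch (no cluster needed):

```python
def levenshtein(a, b):
    """Levenshtein edit distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Mirror the AUTO rule: <=2 chars exact, 3-5 chars one edit, longer terms two."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2

typo, target = "headphon", "headphones"
# Two insertions ("e" and "s") separate the typo from the target, and AUTO
# permits two edits for a term of this length, so the fuzzy query matches.
assert levenshtein(typo, target) <= auto_fuzziness(typo)
```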
Wildcard and Regex Searches
For pattern matching, use wildcard or regex queries:
GET /products/_search
{
"query": {
"wildcard": {
"name.keyword": { "value": "*head*", "case_insensitive": true }
}
}
}
Or use regex for more complex patterns:
GET /products/_search
{
"query": {
"regexp": {
"name.keyword": ".*Shoes.*"
}
}
}
Warning: Wildcard and regex queries can be slow on large datasets. Always use them on keyword fields and avoid leading wildcards (e.g., *head) when possible.
Sorting and Pagination
Sort results by any field:
GET /products/_search
{
"query": {
"match_all": {}
},
"sort": [
{ "price": { "order": "asc" } },
{ "_score": { "order": "desc" } }
]
}
For pagination, use from and size:
GET /products/_search
{
"query": {
"match_all": {}
},
"from": 10,
"size": 10
}
This returns results 11 through 20. For deep pagination (beyond 10,000 results), use search_after with a sort value for better performance:
GET /products/_search
{
"query": {
"match_all": {}
},
"sort": [
{ "price": "asc" },
{ "_id": "asc" }
],
"search_after": [119.99, "3"],
"size": 10
}
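In client code, search_after pagination is a loop: read the sort array of the last hit on each page and feed it back into the next request. A sketch of the cursor-extraction step, using a trimmed example of a response body (not a full Elasticsearch payload):

```python
def next_search_after(response):
    """Return the last hit's sort values (the cursor for the next page),
    or None when the page is empty and pagination is done."""
    hits = response.get("hits", {}).get("hits", [])
    return hits[-1]["sort"] if hits else None

# Trimmed example of one page of results, sorted by price then _id.
page = {"hits": {"hits": [
    {"_id": "5", "sort": [34.99, "5"]},
    {"_id": "3", "sort": [119.99, "3"]},
]}}
cursor = next_search_after(page)  # pass as "search_after" in the next request
```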
Highlighting Search Terms
Highlighting helps users see why a document matched. Enable it in your query:
GET /products/_search
{
"query": {
"match": {
"description": "wireless"
}
},
"highlight": {
"fields": {
"description": {}
}
}
}
The response includes a highlight section with matched snippets wrapped in <em> tags by default. Customize the tags with pre_tags and post_tags if needed.
Best Practices
Use Keyword Fields for Filtering and Aggregations
Always use keyword fields for exact matches, filters, sorting, and aggregations. Text fields are tokenized and lowercased at index time, so using them for these purposes produces surprising results; with a stemming analyzer, filtering on category: "Electronics" could even match electronic. Keyword fields store the value verbatim.
Prefer Filters Over Queries for Static Conditions
Use the filter context inside a bool query for conditions that don't require scoring (e.g., status, category, date ranges). Filters are cached and execute faster than queries. Only use must or should when relevance scoring matters.
Limit Result Size and Use Scroll or Search After for Large Datasets
Avoid using from and size beyond 10,000 results. For exporting or processing large volumes of data, use the scroll API for batch exports or search_after for real-time pagination. Scroll is ideal for one-time exports; search_after is better for UI pagination.
Optimize Index Settings for Search Performance
Adjust index settings like number of shards and replicas based on your data size and query load. For read-heavy applications, increase replicas (e.g., number_of_replicas: 2). Avoid creating too many shards, since each shard consumes memory and CPU. A good rule of thumb: keep shard size between 10 and 50 GB.
Use Index Templates for Consistent Mappings
Define index templates to enforce consistent mappings across time-based or patterned indices (e.g., logs, metrics). This prevents mapping conflicts and ensures search behavior remains predictable.
Monitor Query Performance with the Profile API
To debug slow queries, use the profile parameter:
GET /products/_search
{
"profile": true,
"query": {
"match": {
"name": "headphones"
}
}
}
The response includes detailed timing for each phase of the query execution, helping you identify bottlenecks like expensive filters or poorly designed analyzers.
Avoid Wildcards and Regex on Large Text Fields
Wildcard and regex queries can cause high CPU usage and slow response times. If you need pattern matching, consider using n-grams during indexing instead. For example, index headphones as he, ea, ad, dp, etc., then search on those tokens.
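The bigram idea above is just a sliding window over characters, which Elasticsearch's ngram tokenizer applies at index time. A quick sketch of what such a tokenizer emits:

```python
def char_ngrams(text, n=2):
    """Character n-grams over one token, as an ngram tokenizer would emit them."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("headphones"))
# → ['he', 'ea', 'ad', 'dp', 'ph', 'ho', 'on', 'ne', 'es']
```

Matching a substring like "head" then reduces to matching its bigrams against the indexed ones, which is far cheaper than scanning every value with a wildcard.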
Use Caching Strategically
Elasticsearch caches filters, queries, and field data. Enable fielddata only on keyword fields used for aggregations. Avoid enabling it on large text fields, as it can consume significant heap memory. Use doc values (enabled by default for keyword and numeric fields) for sorting and aggregations.
Regularly Optimize Indices with Force Merge
After bulk indexing or deletions, run a force merge to reduce segment count:
POST /products/_forcemerge?max_num_segments=1
This improves search speed by reducing the number of segments Elasticsearch must scan. Do this during off-peak hours, as it's I/O intensive.
Test Queries with the Explain API
To understand why a document scored a certain way, use the explain parameter:
GET /products/_search
{
"explain": true,
"query": {
"match": {
"name": "headphones"
}
}
}
This returns a breakdown of how the score was calculated, which is useful for tuning relevance and debugging unexpected results.
Tools and Resources
Kibana
Kibana is the official visualization and management interface for Elasticsearch. It provides a Dev Tools Console for testing queries, Dashboards for visualizing results, and a Machine Learning UI for anomaly detection. Use the Console to write, save, and share queries with your team.
Elasticsearch SQL Interface
For teams more comfortable with SQL, Elasticsearch ships with a SQL interface (included in the default distribution under the free Basic license). Query it with:
POST /_sql?format=csv
{
"query": "SELECT name, price FROM products WHERE category = 'Electronics' AND in_stock = true"
}
While convenient, SQL queries are less powerful than the native query DSL and may not support all advanced features like nested objects or complex scripting.
Postman and curl
For API testing and automation, use Postman or command-line curl. Save common queries as Postman collections for reuse. Use shell scripts to automate index creation, data loading, and health checks.
Elasticsearch Client Libraries
Use official client libraries for integration with your application:
- Python: elasticsearch-py
- Java: Java API Client (the older Java High Level REST Client is deprecated)
- Node.js: @elastic/elasticsearch
- .NET: Elastic.Clients.Elasticsearch
These libraries handle connection pooling, retries, and serialization automatically.
OpenSearch and Alternative Tools
OpenSearch is a fork of Elasticsearch 7.10.2, maintained by AWS. It remains largely compatible with the Elasticsearch 7.x query DSL and is a viable open-source alternative. Other tools such as Apache Solr also provide strong search capabilities, though their APIs and operational models differ from Elasticsearch's.
Documentation and Community
Always refer to the official Elasticsearch documentation at elastic.co/guide. The community forums and GitHub issues are valuable resources for troubleshooting edge cases. Stack Overflow tags like elasticsearch and elasticsearch-query contain thousands of solved problems.
Monitoring and Alerting
Use Elasticsearchs built-in monitoring features or integrate with Prometheus and Grafana to track cluster health, query latency, and memory usage. Set alerts for high CPU, low disk space, or slow search times to proactively maintain performance.
Real Examples
Example 1: E-Commerce Product Search
Scenario: A user searches for red running shoes under $100 on an e-commerce site.
Query:
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "match": { "name": "running shoes" } },
{ "match": { "description": "red" } }
],
"filter": [
{ "range": { "price": { "lt": 100 } } },
{ "term": { "in_stock": true } }
]
}
},
"sort": [
{ "price": "asc" }
],
"highlight": {
"fields": {
"name": {},
"description": {}
}
},
"size": 10
}
Results are sorted by price, matched terms are highlighted, and only in-stock items under $100 are returned. This query balances relevance and filtering for an optimal user experience.
Example 2: Log Analysis with Time-Based Filtering
Scenario: A DevOps team needs to find all ERROR logs from the API service in the last 24 hours.
Assume logs are indexed daily: logs-2024-01-15
GET /logs-*/_search
{
"query": {
"bool": {
"must": [
{ "match": { "level": "ERROR" } },
{ "match": { "service": "api" } }
],
"filter": [
{
"range": {
"@timestamp": {
"gte": "now-24h"
}
}
}
]
}
},
"sort": [
{ "@timestamp": "desc" }
],
"size": 50
}
The index pattern logs-* searches across all daily indices. Date math such as now-24h resolves relative to query time (a trailing /d rounds the value down to the start of its day), so the query stays reusable across days.
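Elasticsearch resolves date-math expressions at query time. A toy Python sketch of the two forms used here, subtraction and day rounding (resolve_date_math is illustrative only; real date math supports many more units and operations):

```python
from datetime import datetime, timedelta

def resolve_date_math(now, expr):
    """Toy resolver for two date-math forms: 'now-24h' and 'now-24h/d'."""
    value = now
    if expr.startswith("now-24h"):
        value = now - timedelta(hours=24)
    if expr.endswith("/d"):
        # A trailing /d rounds down to the start of the day (midnight).
        value = value.replace(hour=0, minute=0, second=0, microsecond=0)
    return value

now = datetime(2024, 1, 15, 16, 30)
print(resolve_date_math(now, "now-24h"))    # exactly 24 hours earlier
print(resolve_date_math(now, "now-24h/d"))  # ...then rounded to midnight
```

This makes the difference concrete: now-24h is a sliding 24-hour window, while now-24h/d snaps to a day boundary.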
Example 3: User Behavior Analytics with Aggregations
Scenario: A marketing team wants to see how many users clicked each product category in the last week.
GET /user_clicks/_search
{
"size": 0,
"query": {
"range": {
"click_time": {
"gte": "now-7d/d"
}
}
},
"aggs": {
"category_clicks": {
"terms": {
"field": "product_category.keyword",
"size": 10
}
}
}
}
This returns a top-10 list of categories by click count. Setting size: 0 suppresses hits since only aggregations are needed. This is a common pattern for dashboards and analytics reports.
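Application code usually flattens the buckets array in the aggregation response into simple pairs. A sketch against a trimmed example response (the counts shown are made up for illustration):

```python
def top_buckets(response, agg_name):
    """Flatten a terms aggregation into (key, doc_count) pairs."""
    return [(b["key"], b["doc_count"])
            for b in response["aggregations"][agg_name]["buckets"]]

# Trimmed example of what the click-count search might return.
response = {"aggregations": {"category_clicks": {"buckets": [
    {"key": "Electronics", "doc_count": 1250},
    {"key": "Sports", "doc_count": 890},
]}}}
print(top_buckets(response, "category_clicks"))
# → [('Electronics', 1250), ('Sports', 890)]
```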
Example 4: Fuzzy Search for Misspelled Product Names
Scenario: A user types sneakers but the product is indexed as sneaker.
GET /products/_search
{
"query": {
"fuzzy": {
"name": {
"value": "sneakers",
"fuzziness": "AUTO",
"prefix_length": 2
}
}
}
}
By setting prefix_length: 2, Elasticsearch requires the first two characters to match exactly, reducing false positives. This balances recall and precision for user-facing search boxes.
FAQs
What is the difference between match and term queries?
match queries analyze the search term and match against analyzed text fields (e.g., Running Shoes becomes running and shoes). term queries match exact values and are used on keyword fields (e.g., Electronics must match exactly).
Why is my search returning no results?
Common causes: (1) Using term on a text field instead of .keyword, (2) Mismatched field names, (3) Index not refreshed after indexing (wait 1s or call _refresh), (4) Wrong index name, (5) Document not indexed due to mapping conflicts.
How do I search across multiple indices?
Use index patterns like /logs-2024*,logs-2023* or /logs-* in your search URL. You can also use aliases to group indices logically.
Can I search nested objects?
Yes. Fields mapped with the nested type index each object as its own hidden document, and the nested query searches those objects independently. For example, if a product has a reviews array (mapped as nested) with rating and comment fields, use:
{
"query": {
"nested": {
"path": "reviews",
"query": {
"bool": {
"must": [
{ "match": { "reviews.comment": "excellent" } },
{ "range": { "reviews.rating": { "gte": 4 } } }
]
}
}
}
}
}
How do I improve search speed?
Use filters instead of queries, reduce the number of shards, use doc values, avoid wildcards, increase replicas for read-heavy workloads, and use the profile API to identify bottlenecks.
Does Elasticsearch support autocomplete?
Yes. Use completion suggesters for fast prefix matching. Index suggestions during document creation and query them with the suggest API. Alternatively, use n-gram analyzers on text fields for more flexible autocomplete.
What's the maximum number of results Elasticsearch can return?
By default, Elasticsearch caps from + size pagination at 10,000 results for performance reasons. To retrieve more, use search_after or scroll rather than raising index.max_result_window, whose memory and performance cost grows with its value.
How do I handle case-insensitive searches?
Elasticsearch's default analyzers (like standard) lowercase text automatically, so full-text matching is case-insensitive out of the box. Keyword fields, by contrast, preserve case exactly; for case-insensitive exact matching on a keyword field, attach a lowercase normalizer in the mapping.
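As a sketch, such a normalizer could be declared when creating an index (the index, field, and normalizer names here are illustrative):

```
PUT /products_ci
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": { "type": "custom", "filter": ["lowercase"] }
      }
    }
  },
  "mappings": {
    "properties": {
      "category": { "type": "keyword", "normalizer": "lowercase_normalizer" }
    }
  }
}
```

With the normalizer attached, values are lowercased at both index and query time, so a term query matches regardless of the case the user typed.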
Can I search in real time?
Yes. Elasticsearch offers near real-time search: documents become searchable within about one second of being indexed. This is controlled by the refresh_interval setting, which defaults to 1s.
How do I delete documents by search criteria?
Use the Delete By Query API:
POST /products/_delete_by_query
{
"query": {
"term": {
"in_stock": false
}
}
}
Caution: delete-by-query can run for a long time on large indices and competes with search traffic for resources (it can be launched asynchronously with wait_for_completion=false). Prefer running it during maintenance windows.
Conclusion
Mastering how to search data in Elasticsearch is essential for building modern, data-driven applications. From basic term queries to advanced aggregations and fuzzy matching, Elasticsearch offers a rich set of tools to handle virtually any search requirement. The key to success lies not just in knowing the syntax, but in understanding how indexing, mapping, and query execution interact under the hood.
By following the step-by-step guide in this tutorial and applying best practices like preferring filters over queries, optimizing mappings, and monitoring performance, you'll ensure your Elasticsearch deployments are fast, scalable, and reliable. Real-world examples, from e-commerce to log analysis, demonstrate the versatility of Elasticsearch across industries.
Remember: search is not just about returning results; it's about delivering the right results, at the right time, with minimal latency. As data volumes grow and user expectations rise, Elasticsearch remains one of the most powerful tools to meet those demands. Keep experimenting, test with real data, and leverage the extensive documentation and community to deepen your expertise.
Whether you're a developer building product search, an analyst uncovering trends in logs, or an architect designing a scalable data platform, the ability to search effectively in Elasticsearch is a foundational skill that will serve you well for years to come.