How to Index Logs Into Elasticsearch


Indexing logs into Elasticsearch is a foundational practice in modern observability, DevOps, and security operations. As applications and infrastructure grow in complexity, the volume and velocity of log data increase exponentially. Without a centralized, searchable, and scalable system to manage this data, troubleshooting, monitoring, and compliance become overwhelming. Elasticsearch — part of the Elastic Stack (ELK Stack) — is one of the most powerful open-source search and analytics engines designed specifically for handling large volumes of structured and unstructured data, including logs. Indexing logs into Elasticsearch enables real-time analysis, pattern detection, alerting, and historical trend visualization. This tutorial provides a comprehensive, step-by-step guide to indexing logs into Elasticsearch, covering everything from setup to optimization, best practices, tools, real-world examples, and frequently asked questions. Whether you're managing logs from web servers, containers, cloud services, or custom applications, this guide equips you with the knowledge to implement a robust, production-grade log ingestion pipeline.

Step-by-Step Guide

1. Understand the Log Ingestion Pipeline

Before diving into configuration, it’s essential to understand the typical log ingestion pipeline when using Elasticsearch. The standard flow involves three components:

  • Log Source: The application, server, or service generating logs (e.g., Nginx, Apache, Docker, Kubernetes, Windows Event Log).
  • Log Shipper: A lightweight agent that collects, filters, and forwards logs to Elasticsearch (e.g., Filebeat, Fluentd, Logstash).
  • Elasticsearch: The search and analytics engine that stores, indexes, and makes logs searchable.

While Logstash can handle both ingestion and transformation, Filebeat is often preferred for its low resource footprint and direct integration with Elasticsearch. For this guide, we’ll use Filebeat as the primary log shipper due to its simplicity, reliability, and official support from Elastic.

2. Install and Configure Elasticsearch

Before shipping logs, ensure Elasticsearch is installed and running. Elasticsearch can be deployed on-premises, in the cloud (Elastic Cloud), or via Docker.

Option A: Install via Package Manager (Linux)

For Ubuntu/Debian systems:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch

After installation, edit the configuration file:

sudo nano /etc/elasticsearch/elasticsearch.yml

Ensure the following settings are configured:

cluster.name: my-logs-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.type: single-node

Start and enable Elasticsearch:

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

Verify it’s running:

curl -X GET "localhost:9200"

You should receive a JSON response with cluster details.
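
The exact values depend on your installation, but the response is roughly of this shape (abbreviated):

{
  "name" : "node-1",
  "cluster_name" : "my-logs-cluster",
  "cluster_uuid" : "…",
  "version" : {
    "number" : "8.12.0",
    "lucene_version" : "…"
  },
  "tagline" : "You Know, for Search"
}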

Option B: Run via Docker

If you prefer containerization:

docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.12.0

Note: For production, always enable security (TLS, authentication) and avoid disabling xpack.security.
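
If you leave security enabled on the containerized node, one way to obtain credentials is to reset the built-in elastic user's password inside the running container. A sketch, assuming the official image layout and the container name used above:

docker exec -it elasticsearch \
  /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic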

3. Install and Configure Filebeat

Filebeat is a lightweight log shipper that tails log files and forwards them to Elasticsearch or Logstash. Install Filebeat on the same host as your log sources.

Install Filebeat (Ubuntu/Debian):

sudo apt install filebeat

Configure Filebeat:

Edit the main configuration file:

sudo nano /etc/filebeat/filebeat.yml

Start with a minimal configuration:

filebeat.inputs:
  - type: filestream
    enabled: true
    paths:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  index: "nginx-logs-%{+yyyy.MM.dd}"

Key configuration notes:

  • filestream: The newer input type (replaces “log” in Filebeat 7.14+), optimized for performance and reliability.
  • paths: Specify the exact file paths of your log files. Use wildcards if needed (e.g., /var/log/app/*.log).
  • index: Defines the Elasticsearch index name. Date-based naming (e.g., nginx-logs-2024.06.15) makes rotation and lifecycle management easier. Note that when you override the default index, Filebeat also expects setup.template.name and setup.template.pattern to be set so its index template matches the new name.

4. Enable and Load Filebeat Modules (Optional but Recommended)

Elastic provides pre-built modules for common log formats (Nginx, Apache, Syslog, Docker, etc.). These modules include predefined parsers, field mappings, and Kibana dashboards.

To enable the Nginx module:

sudo filebeat modules enable nginx

This automatically configures Filebeat to parse Nginx logs using the correct grok patterns and field names. To see all available modules:

sudo filebeat modules list

After enabling modules, load the associated assets:

sudo filebeat setup

This command does three things:

  • Loads index templates into Elasticsearch (ensuring correct field types).
  • Creates Kibana dashboards (if Kibana is available).
  • Initializes ILM policies.
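
To confirm that the index template was actually loaded, you can query Elasticsearch directly (the template name typically starts with filebeat but varies by version, so a wildcard is used here):

curl -X GET "localhost:9200/_index_template/filebeat*?pretty"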

5. Start and Test Filebeat

Start the Filebeat service:

sudo systemctl start filebeat
sudo systemctl enable filebeat

Check the service status:

sudo systemctl status filebeat

Verify logs are being sent by checking Filebeat’s own logs (on systemd-based installs these go to the journal; older package installs also write under /var/log/filebeat/):

sudo journalctl -u filebeat -f

Look for lines like: INFO [publisher] pipeline/module.go:113 Start next batch — this indicates active log shipping.

6. Verify Logs in Elasticsearch

Once Filebeat is running, check if logs are indexed in Elasticsearch:

curl -X GET "localhost:9200/_cat/indices?v"

You should see indices like nginx-logs-2024.06.15 with a health of “green” (yellow is normal on a single-node cluster, since replica shards cannot be allocated) and a document count greater than 0.

To view the actual indexed documents:

curl -X GET "localhost:9200/nginx-logs-*/_search?pretty"

This returns the first 10 log entries in JSON format. Look for fields like message, source.ip, http.request.method, and http.response.status_code — these are populated automatically by the Filebeat modules.
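
Beyond this match-all search, you can filter on the parsed fields with the query DSL. A minimal sketch that looks for server errors in the last hour, assuming the ECS field names produced by the Nginx module:

curl -X GET "localhost:9200/nginx-logs-*/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "filter": [
          { "term":  { "http.response.status_code": 500 } },
          { "range": { "@timestamp": { "gte": "now-1h" } } }
        ]
      }
    },
    "size": 5
  }'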

7. Connect to Kibana for Visualization (Optional but Highly Recommended)

Kibana is the visualization layer of the Elastic Stack. Install it alongside Elasticsearch:

sudo apt install kibana

Edit the configuration:

sudo nano /etc/kibana/kibana.yml

Set:

server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]

Start Kibana:

sudo systemctl start kibana
sudo systemctl enable kibana

Access Kibana at http://your-server-ip:5601.

Go to Stack Management > Data Views (called Index Patterns in older Kibana versions) and create a data view matching your log indices (e.g., nginx-logs-*). Select @timestamp as the time field.

Then navigate to Discover to explore your logs in real time. Use filters, search queries, and time ranges to drill down into specific events.

8. Set Up Index Lifecycle Management (ILM)

As log data grows, managing storage becomes critical. Elasticsearch’s Index Lifecycle Management automates rollover, deletion, and optimization of indices.

By default, filebeat setup configures ILM for module indices using a policy named filebeat. To verify:

curl -X GET "localhost:9200/_ilm/policy/filebeat?pretty"

ILM policies typically follow this lifecycle:

  1. Hot: Indexes are actively written to and queried.
  2. Warm: Indexes are no longer written to but still searchable (moved to cheaper storage).
  3. Cold: Rarely queried; stored on low-cost nodes.
  4. Delete: Automatically removed after retention period (e.g., 30 days).

To customize ILM, define a custom policy in Kibana under Stack Management > Index Lifecycle Policies, or via API:

PUT _ilm/policy/my-log-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Then apply this policy to your index template:

PUT _index_template/nginx-logs-template
{
  "index_patterns": ["nginx-logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-log-policy",
      "index.lifecycle.rollover_alias": "nginx-logs"
    }
  }
}
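
Rollover also needs an initial index that carries the write alias named in index.lifecycle.rollover_alias. A minimal bootstrap request, assuming the nginx-logs alias from the template above:

PUT nginx-logs-000001
{
  "aliases": {
    "nginx-logs": {
      "is_write_index": true
    }
  }
}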

9. Secure Your Pipeline

In production, never expose Elasticsearch or Kibana without authentication and encryption.

Enable Security in Elasticsearch:

Edit /etc/elasticsearch/elasticsearch.yml:

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

Set passwords:

sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

Update Filebeat to use credentials:

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_writer"
  password: "your-strong-password"
  ssl.certificate_authorities: ["/etc/pki/tls/certs/ca.crt"]

Generate a service user with minimal privileges:

POST /_security/user/filebeat_writer
{
  "password": "your-password",
  "roles": ["beats_writer"],
  "full_name": "Filebeat Writer"
}
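
Note that beats_writer is not a built-in role, so create it first. A rough sketch of a minimal role, assuming indices named like the nginx-logs-* pattern used earlier (production setups may need additional privileges, for example for templates and ILM):

POST /_security/role/beats_writer
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["nginx-logs-*", "filebeat-*"],
      "privileges": ["create_doc", "create_index", "view_index_metadata"]
    }
  ]
}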

Repeat similar steps for Kibana by editing kibana.yml:

elasticsearch.username: "kibana_system"
elasticsearch.password: "your-password"

Best Practices

1. Use Structured Logging Where Possible

Structured logs (JSON format) are far more efficient to parse and query than plain text. If you control the application, configure it to output logs in JSON:

{
  "timestamp": "2024-06-15T10:30:00Z",
  "level": "INFO",
  "message": "User login successful",
  "user_id": "12345",
  "ip": "192.168.1.10"
}

Filebeat can parse JSON logs natively. With the legacy log input this is done via json.keys_under_root; with the newer filestream input, JSON decoding is configured through the ndjson parser:

filebeat.inputs:
  - type: filestream
    paths:
      - /var/log/app/*.json
    parsers:
      - ndjson:
          target: ""
          add_error_key: true

This avoids complex grok patterns and improves performance.

2. Avoid Indexing Sensitive Data

Never index personally identifiable information (PII), passwords, API keys, or credit card numbers. Use Filebeat’s processors to drop or mask sensitive fields:

processors:
  - drop_fields:
      fields: ["password", "token", "ssn"]
  - add_fields:
      target: "redacted"
      fields:
        message: "SENSITIVE DATA REDACTED"

3. Optimize Index Settings for Logs

Logs are write-heavy and rarely updated. Configure indices with optimal settings:

  • Number of shards: 1–5 per index (avoid too many shards — they increase overhead).
  • Number of replicas: 0–1 (replicas increase search performance and durability but use more storage).
  • Refresh interval: Set to 30s or higher to reduce I/O pressure: "index.refresh_interval": "30s".
  • Disable _source if not needed: Only if you never need to retrieve the original document: "_source": { "enabled": false }.

4. Use Index Templates for Consistency

Define index templates to enforce consistent field mappings across all log indices. This prevents mapping conflicts (e.g., a field being both string and integer).

Example template:

PUT _index_template/log-template
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "message": { "type": "text", "analyzer": "standard" },
        "user_id": { "type": "keyword" }
      }
    }
  }
}

5. Monitor Resource Usage

Log ingestion can strain disk I/O, memory, and CPU. Monitor Elasticsearch with:

  • GET _nodes/stats
  • Kibana’s Monitoring tab
  • System tools: htop, iostat, df -h

Scale horizontally by adding data nodes. Filebeat itself is lightweight and designed to run alongside your applications; Elasticsearch, however, should never share a resource-constrained machine with your application workload.
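
For a quick command-line view of per-node resource usage, the _cat APIs are convenient; the columns below are standard _cat/nodes headers:

curl -X GET "localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,disk.used_percent"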

6. Use Centralized Logging for Distributed Systems

In microservices or containerized environments (Docker, Kubernetes), use a sidecar Filebeat container or Fluentd daemonset to collect logs from all pods. Avoid relying on local file logs — use stdout/stderr and let the container runtime handle log collection.

7. Retain Only What You Need

Define retention policies based on compliance and use cases. For example:

  • Security logs: retain 1 year
  • Application logs: retain 30–90 days
  • Debug logs: retain 7 days

Use ILM to automate deletion — never manually delete indices in production.

Tools and Resources

Core Tools

  • Elasticsearch: The search and storage engine. Download from elastic.co.
  • Filebeat: Lightweight log shipper. Part of the Elastic Stack. Filebeat Docs.
  • Kibana: Visualization and dashboarding. Kibana Docs.
  • Logstash: Advanced log processor (use if you need complex filtering or enrichment).
  • Fluentd: Open-source log collector, popular in Kubernetes environments. Fluentd.org.
  • Vector: High-performance, Rust-based log processor (emerging alternative to Fluentd and Logstash). Vector.dev.

Pre-built Modules and Templates

  • Elastic Modules: Pre-configured parsers for Nginx, Apache, MySQL, Redis, Windows Event Logs, Docker, and more. Enable via filebeat modules enable <module>.
  • OpenTelemetry Collector: Can export logs to Elasticsearch via OTLP. Ideal for cloud-native apps.
  • Elastic Common Schema (ECS): A standardized schema for log fields. Use it to ensure consistency across sources. ECS Documentation.

Monitoring and Alerting

  • Elastic Observability: Built-in dashboards for log health, throughput, and errors.
  • Elastic Alerts: Create alerts based on log patterns (e.g., “500 errors > 10/min”).
  • Prometheus + Grafana: For system-level metrics (CPU, memory, disk) alongside logs.


Real Examples

Example 1: Indexing Nginx Access Logs

Scenario: You run a web server with Nginx and want to monitor traffic patterns, detect bots, and identify DDoS attempts.

Steps:

  1. Install Filebeat on the Nginx server.
  2. Run: sudo filebeat modules enable nginx
  3. Configure filebeat.yml to point to /var/log/nginx/access.log.
  4. Run: sudo filebeat setup
  5. Start Filebeat.

Result: Elasticsearch receives logs with parsed fields:

  • source.ip — Client IP address
  • http.request.method — GET, POST
  • url.path — Requested endpoint
  • http.response.status_code — 200, 404, 500
  • user_agent.original — Browser/device info

In Kibana, create a dashboard showing:

  • Top 10 most requested URLs
  • HTTP status code distribution
  • Geolocation of clients (via GeoIP)
  • Hourly request rate (to detect spikes)

Example 2: Centralized Docker Container Logging

Scenario: You run 50+ microservices in Docker Swarm/Kubernetes and need centralized log aggregation.

Solution:

  • Configure Docker daemon to use the json-file log driver (default).
  • Deploy Filebeat as a daemonset on each node.
  • Use this Filebeat input:
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log

processors:
  - add_docker_metadata: ~

This automatically enriches logs with container metadata: container.id, container.name, image.name, etc.

Query in Kibana: container.name: "auth-service" and http.response.status_code: 500 — instantly find failing services.

Example 3: Security Log Analysis with Syslog

Scenario: You need to detect brute-force SSH attacks on Linux servers.

Steps:

  • Enable rsyslog to forward logs to a central server: *.* @central-log-server:514
  • On the central server, configure Filebeat to read /var/log/secure (CentOS) or /var/log/auth.log (Ubuntu).
  • Enable the system module: filebeat modules enable system
  • Create an alert in Kibana: “If event.action: "failed-login" and source.ip appears 10 times in 1 minute → trigger alert”.

This setup enables automated threat detection without manual log scanning.
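
For reference, a rough sketch of the kind of query such an alert could run, assuming the ECS fields populated by the system module (event.category, event.outcome, source.ip; exact names can vary by version):

curl -X GET "localhost:9200/filebeat-*/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "size": 0,
    "query": {
      "bool": {
        "filter": [
          { "term":  { "event.category": "authentication" } },
          { "term":  { "event.outcome": "failure" } },
          { "range": { "@timestamp": { "gte": "now-1m" } } }
        ]
      }
    },
    "aggs": {
      "failed_logins_by_ip": {
        "terms": { "field": "source.ip", "min_doc_count": 10 }
      }
    }
  }'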

FAQs

Can I index logs without Filebeat?

Yes. Alternatives include Logstash (for complex parsing), Fluentd (popular in Kubernetes), Vector (high-performance), or even custom scripts using the Elasticsearch Bulk API. However, Filebeat is recommended for most use cases due to its simplicity, low resource usage, and tight integration with Elasticsearch.
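
For reference, a minimal sketch of indexing two log entries directly through the Bulk API (newline-delimited JSON; the index name and fields are illustrative):

curl -X POST "localhost:9200/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @- <<'EOF'
{ "index": { "_index": "app-logs-2024.06.15" } }
{ "@timestamp": "2024-06-15T10:30:00Z", "level": "INFO", "message": "User login successful" }
{ "index": { "_index": "app-logs-2024.06.15" } }
{ "@timestamp": "2024-06-15T10:31:12Z", "level": "ERROR", "message": "Payment failed" }
EOF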

How much disk space do logs consume in Elasticsearch?

It varies by log volume and structure. A typical web server log entry is ~200–500 bytes. 1 million logs = ~200–500 MB. Use ILM to delete old data and compress indices (Elasticsearch uses LZ4 compression by default). Monitor usage with GET _cat/indices?v&h=index,store.size,pri.store.size.

What if my logs are not appearing in Elasticsearch?

Check:

  • Is Filebeat running? (systemctl status filebeat)
  • Are the log paths correct? Use filebeat test config and filebeat test output.
  • Is Elasticsearch reachable? Use curl -v http://localhost:9200.
  • Are there permission issues? Ensure Filebeat can read the log files.
  • Is the index pattern correct? Check Kibana’s Index Patterns.

Can I index logs from cloud services like AWS or Azure?

Yes. Use AWS CloudWatch Logs + Lambda to forward to Elasticsearch, or use Azure Monitor with the Elastic Agent. Alternatively, install Filebeat on EC2 or Azure VMs and point it to local log files. Elastic Agent also ships integrations for pulling logs directly from cloud provider services (Cloudbeat, by contrast, focuses on cloud security posture data rather than general log shipping).

How do I handle high-volume log ingestion (100K+ events/sec)?

Scale Elasticsearch horizontally with multiple data nodes and point Filebeat at several hosts (or at coordinating nodes behind a load balancer). Tune Filebeat’s output, for example by raising bulk_max_size and the number of workers, as sketched below. Consider using Kafka or Redis as a buffer between Filebeat and Elasticsearch for resilience.
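
A rough sketch of the Filebeat-side knobs involved (the values and hostnames are illustrative starting points, not recommendations; check the reference documentation for your version):

output.elasticsearch:
  hosts: ["https://es-node1:9200", "https://es-node2:9200"]
  worker: 4
  bulk_max_size: 2048
  compression_level: 3

queue.mem:
  events: 8192
  flush.min_events: 2048
  flush.timeout: 5s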

Is Elasticsearch the only option for log indexing?

No. Alternatives include:

  • OpenSearch: Fork of Elasticsearch, open-source, AWS-backed.
  • ClickHouse: Columnar database, excellent for analytics-heavy log queries.
  • Loki + Grafana: Lightweight, label-based log aggregation (ideal for Kubernetes).

But Elasticsearch remains the most mature, feature-rich, and widely adopted solution for structured log indexing and analysis.

Do I need Kibana to use Elasticsearch for logs?

No. You can query logs directly via the Elasticsearch API. But Kibana provides essential visualization, alerting, and UI tools that make log analysis practical. Without it, you’re limited to raw JSON responses — suitable only for automation, not human analysis.

Conclusion

Indexing logs into Elasticsearch is not merely a technical task — it’s a strategic investment in operational visibility, security posture, and system reliability. By following the steps outlined in this guide — from installing and configuring Elasticsearch and Filebeat, to applying best practices like structured logging, ILM, and security hardening — you establish a scalable, maintainable, and production-ready log infrastructure.

The real power of Elasticsearch lies not in storing logs, but in transforming them into actionable insights. Whether you’re diagnosing a production outage, detecting malicious activity, or optimizing application performance, having logs indexed and searchable empowers your team to act faster and with greater confidence.

As your infrastructure evolves, continue to refine your logging strategy. Adopt ECS for consistency, automate retention policies, monitor ingestion health, and integrate with alerting systems. The goal is not just to collect logs — it’s to make them a living, breathing component of your operational intelligence.

Start small. Test with one service. Expand gradually. And always prioritize security and efficiency. With the right setup, Elasticsearch becomes more than a logging tool — it becomes your central nervous system for understanding your digital environment.