How to Set Up the ELK Stack
The ELK Stack—comprising Elasticsearch, Logstash, and Kibana—is one of the most powerful and widely adopted open-source platforms for log management, real-time analytics, and observability. Originally developed by Elastic, the stack has become the de facto standard for organizations seeking to centralize, visualize, and analyze massive volumes of structured and unstructured data. Whether you're monitoring application logs, securing infrastructure, or optimizing user behavior, the ELK Stack provides the tools needed to transform raw data into actionable insights.
Setting up the ELK Stack correctly is critical to ensuring performance, scalability, and reliability. A poorly configured stack can lead to data loss, indexing bottlenecks, or degraded search performance. This guide walks you through every step required to deploy a production-ready ELK Stack, from initial installation to advanced configuration and optimization. By the end of this tutorial, you will have a fully functional, secure, and scalable ELK Stack environment ready to ingest, process, and visualize data from multiple sources.
Step-by-Step Guide
Prerequisites
Before beginning the setup process, ensure your environment meets the following requirements:
- A Linux-based server (Ubuntu 22.04 LTS or CentOS 8/9 recommended)
- At least 4 GB of RAM (8 GB or more recommended for production)
- Minimum 2 CPU cores
- At least 20 GB of free disk space (SSD strongly recommended)
- Root or sudo access
- Java 11 or Java 17 installed (Elasticsearch requires a JVM)
- Network connectivity for package downloads and external data sources
For production deployments, consider deploying each component on separate servers to isolate workloads and improve fault tolerance. For learning or development purposes, a single-node setup is acceptable.
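Before proceeding, you can sanity-check the host resources from the shell:
free -h    # total and available RAM
nproc      # number of CPU cores
df -h /    # free disk space on the root filesystem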
Step 1: Install Java
Elasticsearch is built on Java and requires a compatible Java Virtual Machine (JVM) to run. Oracle JDK is no longer freely available for production use, so we recommend OpenJDK.
On Ubuntu:
sudo apt update
sudo apt install openjdk-17-jdk -y
On CentOS/RHEL:
sudo dnf install java-17-openjdk-devel -y
Verify the installation:
java -version
You should see output indicating OpenJDK 17 is installed. If multiple Java versions exist, set the default using:
sudo update-alternatives --config java
Step 2: Install Elasticsearch
Elasticsearch is the distributed search and analytics engine at the core of the ELK Stack. It stores, indexes, and enables fast retrieval of data.
Add the Elastic GPG key and repository:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
Update the package list and install Elasticsearch:
sudo apt update
sudo apt install elasticsearch -y
Configure Elasticsearch by editing its main configuration file:
sudo nano /etc/elasticsearch/elasticsearch.yml
Update the following key settings:
cluster.name: my-elk-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.type: single-node
http.port: 9200
Note: discovery.type: single-node cannot be combined with cluster.initial_master_nodes; Elasticsearch will refuse to start if both are set. In a multi-node cluster, remove discovery.type: single-node and instead set discovery.seed_hosts and cluster.initial_master_nodes to the addresses and names of all master-eligible nodes. Also note that network.host: 0.0.0.0 binds to every interface, so restrict it or firewall port 9200 in production.
Enable and start Elasticsearch:
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
Verify Elasticsearch is running:
curl -X GET "localhost:9200"
You should receive a JSON response containing cluster details, including version and name.
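Note: Elasticsearch 8.x ships with authentication and TLS enabled by default, so the plain-HTTP request above may be refused. In that case, reset the elastic superuser password (if you did not record the one printed during installation) and query over HTTPS using the CA certificate generated at install time:
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200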
Step 3: Install Kibana
Kibana is the visualization layer of the ELK Stack. It provides a web interface to explore data, build dashboards, and monitor system health.
Install Kibana using the same repository:
sudo apt install kibana -y
Edit the Kibana configuration file:
sudo nano /etc/kibana/kibana.yml
Set the following values:
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
i18n.locale: "en"
Enable and start Kibana:
sudo systemctl enable kibana
sudo systemctl start kibana
Verify Kibana is accessible by visiting http://your-server-ip:5601 in your browser. You should see the Kibana welcome screen.
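If security is enabled (the 8.x default), Kibana shows an enrollment prompt instead of the welcome screen. Generate an enrollment token on the Elasticsearch host and paste it into the browser:
sudo /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana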
Step 4: Install Logstash
Logstash is the data processing pipeline that ingests data from multiple sources, transforms it, and sends it to Elasticsearch. It supports a wide range of inputs, filters, and outputs.
Install Logstash:
sudo apt install logstash -y
Logstash configurations are stored in /etc/logstash/conf.d/. Create a basic configuration file:
sudo nano /etc/logstash/conf.d/01-input.conf
Add the following input configuration to accept data via Beats (Filebeat) or TCP:
input {
  beats {
    port => 5044
  }
}
Create a filter configuration to parse logs (optional):
sudo nano /etc/logstash/conf.d/02-filter.conf
Add a simple Grok filter for Apache logs:
filter {
  if [type] == "apache-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}
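For reference, given a combined-format access log line such as the one below, %{COMBINEDAPACHELOG} extracts named fields including clientip, verb, request, response, bytes, referrer, and agent (exact field names vary slightly between pattern versions):
127.0.0.1 - - [10/May/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "Mozilla/5.0"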
Create an output configuration to send data to Elasticsearch:
sudo nano /etc/logstash/conf.d/03-output.conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
}
Note: the document_type option is omitted here; mapping types were removed in Elasticsearch 7, so it is no longer supported on modern versions.
Test your configuration for syntax errors:
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
If the test passes, start Logstash:
sudo systemctl enable logstash
sudo systemctl start logstash
Step 5: Install Filebeat (Optional but Recommended)
While Logstash can ingest data directly, Filebeat is a lightweight, resource-efficient shipper designed specifically for forwarding log files to Logstash or Elasticsearch. It is ideal for server-side log collection.
Install Filebeat:
sudo apt install filebeat -y
Configure Filebeat to send logs to Logstash:
sudo nano /etc/filebeat/filebeat.yml
Update the following sections:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
      - /var/log/apache2/*.log

output.logstash:
  hosts: ["localhost:5044"]
Enable the Apache module (if applicable). The module is named apache on Filebeat 7.x and 8.x (apache2 was the 6.x name):
sudo filebeat modules enable apache
Because filebeat setup loads index templates and dashboards into Elasticsearch and Kibana directly, temporarily override the output if you normally ship to Logstash:
sudo filebeat setup -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'
Start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
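Confirm that the configuration is valid and that Filebeat can reach Logstash:
sudo filebeat test config
sudo filebeat test output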
Step 6: Configure Kibana Index Patterns
Once data begins flowing into Elasticsearch, you need to define index patterns in Kibana to make the data searchable and visualizable.
Open Kibana in your browser at http://your-server-ip:5601.
Navigate to Stack Management → Data Views (called Index Patterns in older Kibana versions) → Create data view.
Enter the index pattern. For Filebeat, use filebeat-*. For Logstash, use logstash-*.
Select @timestamp as the time field and save the data view.
Once created, go to Discover to explore raw log entries. You should now see data appearing in real time.
Step 7: Create Your First Dashboard
With data indexed, create visualizations and dashboards to monitor system health.
Go to Dashboard → Create dashboard.
Click Add from library and select a pre-built template like “System” or “Apache” if you’re using Filebeat modules.
Alternatively, create custom visualizations:
- Go to Visualize Library → Create visualization
- Select “Line” or “Bar” chart
- Choose your index pattern
- Set X-axis to “Date Histogram” based on @timestamp
- Set Y-axis to “Count” or another aggregation, such as an average over a numeric field like response time
Save each visualization and add it to your dashboard. Name your dashboard “Server Monitoring” or similar.
Best Practices
1. Use Separate Nodes for Production Deployments
In production environments, avoid running Elasticsearch, Logstash, and Kibana on the same server. Distribute them across dedicated nodes to prevent resource contention. Elasticsearch requires significant memory and CPU for indexing and search operations. Logstash can be memory-intensive during transformation pipelines. Kibana, while lighter, benefits from low-latency network access to Elasticsearch.
2. Secure Your Stack with TLS and Authentication
Older releases of the ELK Stack ran without authentication by default, while Elasticsearch 8.x enables security automatically. Either way, in any environment exposed to external networks, make sure the following are in place:
- Enable Elasticsearch’s built-in security: set xpack.security.enabled: true in elasticsearch.yml
- Generate certificates using elasticsearch-certutil for encrypted communication
- Configure Kibana to use HTTPS and authenticate against Elasticsearch
- Use role-based access control (RBAC) to restrict user permissions
Run the following to generate a certificate authority and node certificates (the output is PKCS#12 by default):
cd /usr/share/elasticsearch
sudo mkdir -p /etc/elasticsearch/certs
sudo bin/elasticsearch-certutil ca --out /etc/elasticsearch/certs/elastic-stack-ca.p12
sudo bin/elasticsearch-certutil cert --ca /etc/elasticsearch/certs/elastic-stack-ca.p12 --out /etc/elasticsearch/certs/elastic-certificates.p12
Update elasticsearch.yml:
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
Update kibana.yml:
elasticsearch.hosts: ["https://localhost:9200"]
elasticsearch.ssl.certificateAuthorities: ["/etc/kibana/certs/elasticsearch-ca.pem"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "your-strong-password"
server.ssl.enabled: true
server.ssl.certificate: /etc/kibana/certs/kibana.crt
server.ssl.key: /etc/kibana/certs/kibana.key
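Two follow-ups worth noting. Kibana expects the certificate authority in PEM format rather than PKCS#12; assuming the paths above, you can extract it with:
sudo openssl pkcs12 -in /etc/elasticsearch/certs/elastic-certificates.p12 -cacerts -nokeys -out /etc/kibana/certs/elasticsearch-ca.pem
And the kibana_system password referenced above can be generated with the built-in reset tool:
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system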
3. Optimize Elasticsearch Indexing and Sharding
Index design directly impacts performance. Follow these guidelines:
- Use time-based indices (e.g., logs-2024.05.01) for log data to enable efficient retention policies
- Limit the number of shards per index, and keep the total modest (ideally under 50 shards per node)
- Set number_of_shards to match the number of data nodes (e.g., 3 shards for 3 nodes)
- Set number_of_replicas to 1 in production for high availability
- Use index lifecycle management (ILM) to automate rollover and deletion
Example ILM policy via Kibana Dev Tools:
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
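To put the policy into effect, attach it to new indices through an index template. The names below (logs_template, the logs-* pattern, and the logs rollover alias) are illustrative:
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}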
4. Monitor Resource Usage and Set JVM Heap Limits
Elasticsearch is memory-sensitive. Never allocate more than 50% of your system RAM to the JVM heap, and keep the heap below roughly 30 GB so the JVM can still use compressed object pointers (compressed oops).
Set the heap size in a file under /etc/elasticsearch/jvm.options.d/ (preferred on recent versions) or by editing /etc/elasticsearch/jvm.options:
-Xms4g
-Xmx4g
Monitor heap usage using Kibana’s Monitoring tab or external tools like Prometheus and Grafana.
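A quick way to spot-check heap pressure from the command line (add credentials and --cacert if security is enabled):
curl -s "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent"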
5. Use Filebeat Modules for Standardized Parsing
Filebeat comes with pre-built modules for common services like Apache, Nginx, MySQL, and System logs. These modules include optimized parsers, dashboards, and index templates.
Enable a module:
sudo filebeat modules enable apache mysql system
Reload the configuration:
sudo filebeat setup
sudo systemctl restart filebeat
This reduces the need for custom Grok patterns and ensures consistency across environments.
6. Implement Log Retention and Cleanup
Logs can consume massive disk space. Automate cleanup using Elasticsearch’s Index Lifecycle Management (ILM); the older Curator tool is deprecated in favor of ILM.
Use ILM policies to automatically delete indices older than 90 days, reducing storage costs and maintaining performance.
7. Back Up Critical Data Regularly
Use Elasticsearch snapshots to back up indices to shared storage (NFS, S3, HDFS):
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/elasticsearch"
  }
}
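Note that filesystem repositories must first be whitelisted in elasticsearch.yml on every node (and the path must be writable by the elasticsearch user), then the nodes restarted:
path.repo: ["/mnt/backups/elasticsearch"]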
Take a snapshot:
PUT _snapshot/my_backup/snapshot_1
Restore when needed:
POST _snapshot/my_backup/snapshot_1/_restore
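Indices must be closed or deleted before being restored over. To confirm a backup succeeded, list the snapshots in the repository:
GET _snapshot/my_backup/_all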
Tools and Resources
Official Documentation
Always refer to the official Elastic documentation (https://www.elastic.co/guide) for version-specific details on Elasticsearch, Logstash, Kibana, and Beats.
Monitoring and Alerting Tools
Enhance your ELK Stack with external monitoring tools:
- Prometheus + Grafana – Monitor system metrics (CPU, memory, disk I/O) and Elasticsearch cluster health
- Alertmanager – Trigger notifications based on Kibana alert rules
- Netdata – Real-time system monitoring with built-in Elasticsearch integration
Community and Support
Engage with the active ELK Stack community for troubleshooting and best practices, starting with the official Elastic discussion forums (https://discuss.elastic.co) and the elasticsearch tag on Stack Overflow.
Sample Data Generators
For testing and development, generate realistic log data:
- GoAccess – a real-time analyzer for Apache/Nginx access logs, handy for validating generated sample traffic
- Loggen – A utility to simulate high-volume log streams
- Mockaroo – Generate custom JSON/CSV datasets for testing
Containerized Deployments (Docker & Kubernetes)
For scalable, portable deployments, use Docker Compose:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data
  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    ports:
      - "5044:5044"
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    depends_on:
      - elasticsearch
volumes:
  esdata:
Run with:
docker compose up -d
(Use docker-compose up -d with the older standalone binary.)
Real Examples
Example 1: Monitoring Web Server Logs
A mid-sized e-commerce company uses the ELK Stack to monitor Apache web server logs across 12 frontend servers. Each server runs Filebeat to ship access and error logs to a central Logstash instance.
Logstash applies filters to extract:
- Client IP addresses
- HTTP status codes
- Request duration
- User agent strings
These fields are indexed into Elasticsearch. Kibana dashboards display:
- Real-time traffic spikes
- Top 10 most visited pages
- 4xx/5xx error trends
- Geographic distribution of visitors
Alerts are configured to notify the DevOps team when error rates exceed 5% for 5 minutes. This proactive monitoring reduced incident response time by 70%.
Example 2: Security Incident Detection
A financial services firm uses the ELK Stack to detect anomalous SSH login attempts. Filebeat collects system logs from 50+ Linux servers. Logstash parses auth.log and tags failed login attempts.
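A minimal sketch of that tagging step, assuming standard sshd lines arriving from Filebeat (the field names are illustrative):
filter {
  if [message] =~ /Failed password/ {
    grok {
      # e.g. "Failed password for invalid user admin from 203.0.113.7 port 52311 ssh2"
      match => { "message" => "Failed password for (invalid user )?%{USERNAME:ssh_user} from %{IP:src_ip} port %{NUMBER:src_port}" }
      add_tag => ["ssh_failed_login"]
    }
  }
}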
A Kibana machine learning job analyzes login frequency by user and IP. It flags:
- Multiple failed logins from the same IP within 60 seconds
- Logins from unusual geographic locations
- Attempts using known compromised usernames
When anomalies are detected, an alert triggers a Slack notification and automatically blocks the IP via firewall rules. This system has prevented 12 brute-force attacks in the last quarter.
Example 3: Application Performance Monitoring
A SaaS provider instruments its Node.js application to emit structured JSON logs to stdout. These logs are captured by Filebeat and sent to Logstash.
Logstash enriches logs with:
- Environment (production/staging)
- Service name
- Request ID for distributed tracing
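A sketch of such enrichment using the mutate filter (the values shown are placeholders; a request ID emitted in the application’s JSON logs would already arrive as its own field once JSON decoding is enabled in Filebeat):
filter {
  mutate {
    add_field => {
      "environment" => "production"    # static per-pipeline value
      "service"     => "checkout-api"  # hypothetical service name
    }
  }
}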
Kibana visualizations track:
- Latency percentiles (p95, p99)
- Throughput per endpoint
- Database query durations
Engineers use these dashboards to identify slow API endpoints and optimize database queries, resulting in a 40% reduction in average response time.
FAQs
What is the difference between the ELK Stack and the EFK Stack?
The ELK Stack uses Logstash (often fed by Filebeat) for log ingestion, while the EFK Stack (Elasticsearch, Fluentd, Kibana) replaces Logstash with Fluentd. Fluentd is often preferred in Kubernetes environments due to its native container support and lightweight architecture. However, Logstash offers richer filtering capabilities and a larger plugin ecosystem.
Can I use the ELK Stack without Kibana?
Yes. Elasticsearch can be queried directly via its REST API using tools like cURL, Postman, or Python scripts. However, Kibana provides a user-friendly interface for visualization, dashboards, and monitoring that is essential for most teams.
How much disk space does the ELK Stack require?
Storage needs depend entirely on data volume. As a rule of thumb, 10 GB per day of uncompressed logs is a reasonable estimate for medium traffic; with a 90-day retention and one replica, that works out to roughly 10 × 90 × 2 = 1.8 TB. Always provision additional headroom for snapshots and temporary indexing buffers.
Is the ELK Stack free to use?
Elasticsearch, Logstash, and Kibana are free to use, distributed under source-available licenses (the SSPL and the Elastic License) rather than a traditional OSI-approved open-source license. Core features are free, but advanced features such as machine learning and some security and alerting capabilities are part of Elastic’s paid subscription tiers. For many use cases, the free tier is sufficient.
Why is my Kibana dashboard blank even though Elasticsearch has data?
Common causes include:
- An incorrect index pattern (e.g., typing logstash instead of logstash-*)
- A time filter set to a range with no data
- The index not yet created (wait for data to be ingested)
- A permissions issue preventing Kibana from reading indices
Check the Discover tab first to confirm data exists, then verify your time filter and index pattern.
How do I upgrade the Elk Stack to a newer version?
Always follow Elastic’s upgrade guide. Never skip major versions. Steps include:
- Take a snapshot of all indices
- Stop all services (Kibana → Logstash → Elasticsearch)
- Upgrade Elasticsearch first
- Upgrade Logstash
- Upgrade Kibana
- Restart services in reverse order
- Verify data integrity and functionality
Can I run the ELK Stack on Windows?
Yes. Elastic provides Windows installers for Elasticsearch, Kibana, and Filebeat. However, Linux is strongly recommended for production due to better performance, stability, and community support.
What should I do if Elasticsearch fails to start?
Check the logs:
sudo journalctl -u elasticsearch -n 50 --no-pager
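Elasticsearch also writes its own log under /var/log/elasticsearch/, named after the cluster:
sudo tail -n 50 /var/log/elasticsearch/my-elk-cluster.log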
Common issues:
- Insufficient memory (adjust JVM heap)
- Port conflict (9200 or 9300 already in use)
- File permissions on data directory
- Invalid configuration syntax
Conclusion
Setting up the ELK Stack is a foundational skill for modern DevOps, SRE, and security teams. From centralized logging to real-time monitoring and anomaly detection, the stack empowers organizations to gain deep visibility into their systems and applications. This guide has walked you through the complete process—from installing Java and configuring Elasticsearch, Logstash, and Kibana, to implementing security, optimization, and real-world use cases.
Remember: a well-configured ELK Stack is not a one-time setup. It requires ongoing maintenance, monitoring, and refinement. Regularly review your index patterns, update your filters, and expand your dashboards as your data needs evolve. Use automation tools like ILM, Docker, and configuration management systems (Ansible, Terraform) to scale your deployment reliably.
As data volumes continue to grow and system complexity increases, the ELK Stack remains one of the most robust, flexible, and community-supported solutions available. Whether you’re managing a single server or a global infrastructure, investing time in mastering the ELK Stack will pay dividends in operational efficiency, faster troubleshooting, and proactive system health management.