How to Configure Fluentd
Fluentd is an open-source data collector designed to unify logging solutions across diverse systems, applications, and environments. As modern infrastructure grows increasingly distributed—with microservices, containers, cloud platforms, and hybrid deployments—centralized log management has become a critical component of observability, troubleshooting, and compliance. Fluentd excels in this space by providing a flexible, reliable, and scalable platform for collecting, filtering, and forwarding logs in real time. Whether you're managing a small application stack or a large Kubernetes cluster, configuring Fluentd correctly ensures that your logs are captured efficiently, structured meaningfully, and delivered to the right destinations for analysis.
This guide walks you through the complete process of configuring Fluentd—from installation to advanced routing and optimization. You'll learn how to tailor Fluentd to your specific use case, implement best practices for performance and reliability, leverage essential tools, and apply real-world configurations that have been battle-tested in production environments. By the end of this tutorial, you'll have a comprehensive understanding of Fluentd's architecture and the confidence to deploy it in any environment.
Step-by-Step Guide
1. Understanding Fluentd’s Architecture
Before diving into configuration, it’s essential to understand Fluentd’s core components and how they interact. Fluentd operates on a plugin-based architecture, where each function is handled by a modular plugin. The three primary components are:
- Sources: Define where logs are collected from (e.g., files, syslog, HTTP endpoints, Docker containers).
- Filters: Modify, enrich, or transform log records before forwarding (e.g., parsing JSON, adding tags, removing sensitive fields).
- Sinks: Specify where logs are sent (e.g., Elasticsearch, S3, Kafka, stdout).
Logs flow from source → filter → sink (match block). Each component is configured using a declarative syntax in the Fluentd configuration file: fluent.conf by default, or /etc/td-agent/td-agent.conf for td-agent installs. The configuration file uses a simple key-value structure with sections enclosed in <source>, <filter>, and <match> tags.
2. Installing Fluentd
Fluentd supports multiple operating systems and deployment models. Below are the most common installation methods.
On Ubuntu/Debian
Install Fluentd using the official package repository:
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh
This installs td-agent, the stable packaged distribution of Fluentd maintained by Treasure Data, which includes a bundled Ruby, precompiled plugins, and better stability for production use.
After installation, verify it’s running:
sudo systemctl status td-agent
On CentOS/RHEL
Use the following command to install td-agent:
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh
Then start and enable the service:
sudo systemctl start td-agent
sudo systemctl enable td-agent
Using Docker
For containerized environments, Fluentd can be run as a sidecar or centralized logging container:
docker run -d --name fluentd -p 24224:24224 -p 24224:24224/udp -v $(pwd)/fluentd.conf:/fluentd/etc/fluent.conf fluent/fluentd:latest
Ensure your configuration file (fluentd.conf) is mounted at /fluentd/etc/fluent.conf, the path the official fluent/fluentd image reads by default.
From Source (Advanced)
If you need the latest development version or custom plugins, install Fluentd via RubyGems:
gem install fluentd
Then start the service manually:
fluentd -c /path/to/fluentd.conf
Note: This method is not recommended for production due to lack of service management and dependency control.
3. Basic Configuration File Structure
The Fluentd configuration file is written in a domain-specific language (DSL) using <source>, <filter>, and <match> blocks. Here’s a minimal working configuration:
<source>
@type tail
path /var/log/nginx/access.log
pos_file /var/log/td-agent/nginx-access.log.pos
tag nginx.access
format nginx
</source>
<match **>
@type stdout
</match>
Let’s break this down:
- <source> defines a tail input, reading from Nginx's access log file.
- pos_file tracks the last read position to avoid duplicate logs on restart.
- tag nginx.access assigns a label to the log stream for routing.
- format nginx uses Fluentd's built-in parser to extract fields like IP, method, status, and user agent.
- <match **> matches all tags and sends logs to stdout.
Save this as /etc/td-agent/td-agent.conf (the file td-agent loads by default) and restart Fluentd:
sudo systemctl restart td-agent
Check logs for errors:
sudo journalctl -u td-agent -f
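With the stdout output, each event is printed to Fluentd's own log as time, tag, and record. You should see lines similar to this illustrative example (field values will reflect your actual traffic):
2024-06-15 12:00:00.000000000 +0000 nginx.access: {"remote":"192.0.2.10","method":"GET","path":"/index.html","code":"200","agent":"Mozilla/5.0"}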
4. Configuring Multiple Sources
Most environments require collecting logs from multiple sources. Here’s an example that collects logs from Nginx, system syslog, and a custom application:
<source>
@type tail
path /var/log/nginx/access.log
pos_file /var/log/td-agent/nginx-access.log.pos
tag nginx.access
format nginx
</source>
<source>
@type tail
path /var/log/nginx/error.log
pos_file /var/log/td-agent/nginx-error.log.pos
tag nginx.error
# A common pattern for the default Nginx error_log format; adjust if yours differs
format /^(?<time>\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>\w+)\] (?<pid>\d+)#(?<tid>\d+): (?<message>.*)$/
time_format %Y/%m/%d %H:%M:%S
</source>
<source>
@type syslog
port 5140
bind 0.0.0.0
# in_syslog defaults to UDP; switch to TCP as described below
protocol_type tcp
tag system.syslog
</source>
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
This configuration:
- Reads Nginx access logs with the built-in parser.
- Uses a custom regex to parse Nginx error logs.
- Accepts syslog messages over TCP on port 5140.
- Enables Fluentd’s forward protocol for receiving logs from other Fluentd instances (useful in distributed setups).
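Because each source carries its own tag, the streams can be routed independently. A minimal sketch, with stdout as a stand-in for real destinations:
# Route both Nginx streams (nginx.access and nginx.error) one way...
<match nginx.*>
@type stdout
</match>
# ...and syslog another; replace stdout with a real output in production
<match system.**>
@type stdout
</match>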
5. Applying Filters for Data Enrichment
Raw logs are rarely ready for analysis. Filters allow you to clean, parse, and enhance data before sending it to storage.
JSON Parsing Filter
If your application logs in JSON format:
<filter app.json>
@type parser
key_name log
reserve_data true
reserve_time true
format json
</filter>
This extracts the JSON string from the log field and converts it into structured key-value pairs. reserve_data keeps the original field, and reserve_time preserves the original timestamp.
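As an illustrative example, a record whose log field contains a JSON string:
{"log":"{\"user\":\"alice\",\"status\":200}"}
becomes, after the filter (with reserve_data true keeping the original field):
{"log":"{\"user\":\"alice\",\"status\":200}","user":"alice","status":200}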
Adding Metadata with Record Transformer
Enrich logs with environment or host information:
<filter **>
@type record_transformer
enable_ruby true
<record>
hostname ${ENV['HOSTNAME']}
environment production
</record>
</filter>
This adds two fields to every log record: the container or host name and the deployment environment.
Removing Sensitive Data
Help meet data privacy requirements by filtering out records that contain PII:
<filter **>
@type grep
<exclude>
key message
pattern \b\d{3}-\d{2}-\d{4}\b
# SSN pattern
</exclude>
<exclude>
key message
pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
# Email address pattern
</exclude>
</filter>
This drops any log entry whose message field contains a social security number or email address. Note that grep excludes the entire record rather than redacting the matching value.
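If you need to mask values in place instead of dropping whole records, a record_transformer with enable_ruby can rewrite the field. A minimal sketch, assuming the PII lives in a message field:
<filter **>
@type record_transformer
enable_ruby true
<record>
# Replace SSN-like patterns inside the message instead of discarding the event
message ${record["message"].to_s.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")}
</record>
</filter>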
6. Configuring Output Sinks
Fluentd supports over 600 plugins for output destinations. Here are the most common ones.
Elasticsearch
Install the plugin:
sudo td-agent-gem install fluent-plugin-elasticsearch
Configure the output:
<match nginx.access>
@type elasticsearch
host elasticsearch.example.com
port 9200
logstash_format true
logstash_prefix nginx
logstash_dateformat %Y%m%d
include_tag_key true
tag_key @log_name
flush_interval 10s
</match>
Fluentd will send logs to Elasticsearch with daily indices (e.g., nginx-20240615), making it easier to manage retention and performance.
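To confirm that daily indices are being created, query Elasticsearch's _cat API (host and port taken from the config above):
curl 'http://elasticsearch.example.com:9200/_cat/indices/nginx-*?v'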
Amazon S3
Install the plugin:
sudo td-agent-gem install fluent-plugin-s3
Configure for batch archiving:
<match system.syslog>
@type s3
aws_key_id YOUR_AWS_KEY
aws_sec_key YOUR_AWS_SECRET
s3_bucket my-logs-bucket
s3_region us-east-1
path logs/system/
buffer_path /var/log/td-agent/buffer/s3
time_slice_format %Y/%m/%d/%H
time_slice_wait 10m
utc
format json
</match>
This batches logs every 10 minutes and uploads them to S3 in structured JSON format—ideal for compliance and long-term storage.
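With fluent-plugin-s3's default object key format (%{path}%{time_slice}_%{index}.%{file_extension}), uploads land under keys like this illustrative example:
logs/system/2024/06/15/10_0.gz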
Kafka
For high-throughput streaming:
sudo td-agent-gem install fluent-plugin-kafka
<match **>
@type kafka_buffered
brokers kafka1:9092,kafka2:9092
default_topic logs
output_data_type json
compression_codec gzip
max_send_retries 3
required_acks -1
buffer_type file
buffer_path /var/log/td-agent/buffer/kafka
flush_interval 5s
</match>
Kafka acts as a durable buffer, decoupling log producers from consumers and providing resilience during downstream outages.
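To verify events are arriving, you can read a few messages back with Kafka's console consumer (broker address taken from the config above):
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic logs --from-beginning --max-messages 5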
7. Testing and Validating Configuration
Always validate your configuration before restarting Fluentd:
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf
This checks syntax and plugin availability without starting the service.
To test log ingestion manually, use fluent-cat:
echo '{"message":"test log","level":"info"}' | fluent-cat app.json
If your configuration includes a match for app.json, the log will appear in your output destination.
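Note that fluent-cat speaks the forward protocol (port 24224 by default), so the instance you are testing must expose a forward source. A minimal sketch for local testing:
<source>
@type forward
port 24224
bind 127.0.0.1
</source>
<match app.json>
@type stdout
</match>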
8. Monitoring Fluentd
Expose Fluentd metrics over HTTP with the prometheus input plugin (bundled with recent td-agent releases; otherwise install fluent-plugin-prometheus):
<system>
log_level info
</system>
<source>
@type prometheus
bind 0.0.0.0
port 24231
metrics_path /metrics
</source>
Access metrics at http://localhost:24231/metrics to view:
- Buffer queue sizes
- Output success/failure rates
- Memory usage
- Event throughput
Integrate with Prometheus and Grafana for real-time dashboards.
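On the Prometheus side, a minimal scrape job might look like this sketch (the fluentd-host target is an assumption):
scrape_configs:
  - job_name: 'fluentd'
    metrics_path: /metrics
    static_configs:
      - targets: ['fluentd-host:24231']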
Best Practices
1. Use td-agent Over Vanilla Fluentd in Production
td-agent is a hardened, packaged version of Fluentd with tested dependencies, automatic updates, and systemd integration. Avoid installing Fluentd via gem in production environments due to potential version conflicts and lack of support.
2. Separate Logs by Tag and Route Accordingly
Use meaningful tags like app.web, db.mysql, infra.network to distinguish log sources. This enables targeted filtering, routing, and retention policies.
3. Always Use pos_file for Tail Sources
Without a pos_file, Fluentd will re-read entire files on restart, causing duplicate logs. Always specify a unique path for each log file.
4. Buffer Logs Locally Before Remote Output
Network interruptions are inevitable. Use file-based buffers with appropriate flush intervals to avoid data loss:
buffer_type file
buffer_path /var/log/td-agent/buffer/nginx
flush_interval 10s
flush_thread_count 2
retry_max_times 10
retry_wait 10s
This ensures logs are stored locally during outages and retried automatically.
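The parameters above use the classic inline style; in Fluentd v1 syntax the same settings live in a <buffer> section inside the match block. An equivalent sketch:
<match nginx.access>
@type elasticsearch
# ...connection settings as before...
<buffer>
@type file
path /var/log/td-agent/buffer/nginx
flush_interval 10s
flush_thread_count 2
retry_max_times 10
retry_wait 10s
</buffer>
</match>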
5. Avoid Heavy Processing in Filters
Complex Ruby expressions or large regex patterns can slow down log ingestion. Use built-in parsers (e.g., json, nginx, syslog) instead of custom regex when possible.
6. Secure Communication
When sending logs over the network, use TLS:
- Enable TLS in the Elasticsearch output: ssl_verify true with a CA bundle, falling back to ssl_verify false only as a stopgap for self-signed certs.
- Use TLS for forward and syslog inputs.
- Restrict access to Fluentd ports using firewalls or network policies.
7. Limit Log Volume with Sampling
For high-volume applications, consider sampling logs to reduce cost and storage. Fluentd has no built-in sampling filter, so this relies on a community plugin such as fluent-plugin-sampling-filter (filter type and parameter names vary by plugin; a sketch):
<filter app.highvolume>
@type sampling
interval 10
</filter>
This forwards only 1 in 10 log events, reducing load while preserving statistical relevance.
8. Implement Log Rotation
Ensure your log files are rotated regularly (for example with logrotate) and that Fluentd's pos_file directory is writable so positions are updated correctly. For wildcard path patterns, refresh_interval controls how often in_tail rescans for new files; rotated files themselves are tracked by inode:
refresh_interval 60s
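A typical logrotate policy for the Nginx logs tailed earlier might look like this sketch (schedule and retention count are assumptions to adapt):
/var/log/nginx/*.log {
daily
rotate 14
compress
delaycompress
missingok
notifempty
}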
9. Version Control Your Configuration
Treat Fluentd configuration as code. Store it in Git, apply CI/CD practices, and deploy via configuration management tools like Ansible, Puppet, or Terraform.
10. Regularly Audit and Update Plugins
Keep Fluentd and its plugins updated to benefit from security patches and performance improvements. Use td-agent-gem list to check versions.
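For example, with td-agent's bundled gem command:
sudo td-agent-gem list | grep fluent-plugin
sudo td-agent-gem update fluent-plugin-elasticsearch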
Tools and Resources
Official Documentation
The most authoritative resource is the official Fluentd documentation at https://docs.fluentd.org. It includes plugin references, configuration examples, and architecture diagrams.
Fluentd Plugin Registry
Explore all available plugins at https://www.fluentd.org/plugins/all. Filter by category (input, filter, output) and check community ratings and update frequency.
Fluent Bit (Lightweight Alternative)
For resource-constrained environments (e.g., edge devices, IoT), consider Fluent Bit—a faster, lower-memory cousin of Fluentd. It shares similar syntax and can forward to the same destinations.
Containerized Deployments
For Kubernetes, deploy Fluentd with the official Helm charts rather than hand-rolled manifests; a sketch follows.
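A minimal sketch using the charts published by the fluent organization (chart and release names may differ in your setup):
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluentd fluent/fluentd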
Monitoring Tools
- Prometheus + Grafana: For visualizing Fluentd metrics.
- Elastic Stack (ELK): For centralized log search and dashboards.
- Datadog: Offers native Fluentd integration with pre-built monitors.
- Logstash: Can be used alongside Fluentd for complex transformations, though Fluentd is generally preferred for ingestion.
Debugging Tools
- fluent-cat: Inject test logs for validation.
- journalctl -u td-agent: View Fluentd service logs.
- tail -f /var/log/td-agent/td-agent.log: Monitor Fluentd's internal logs.
- netstat -tlnp | grep 24224: Verify Fluentd is listening on expected ports.
Community and Support
Join the Fluentd GitHub repository (https://github.com/fluent/fluentd) to report bugs, request features, or contribute plugins. The community is active and responsive.
Real Examples
Example 1: Kubernetes Cluster Logging
In a Kubernetes environment, Fluentd runs as a DaemonSet on each node to collect container logs from /var/log/containers/.
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<match kubernetes.**>
@type elasticsearch
host elasticsearch.logging.svc.cluster.local
port 9200
logstash_format true
logstash_prefix k8s-logs
include_tag_key true
flush_interval 5s
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_max_times 10
chunk_limit_size 2M
queue_limit_length 8
overflow_action block
</buffer>
</match>
This configuration:
- Reads all container logs in JSON format.
- Uses the kubernetes_metadata filter plugin to enrich logs with pod, namespace, and container metadata.
- Sends logs to an Elasticsearch cluster within the same Kubernetes namespace.
- Uses buffered output with fail-safe behavior to prevent data loss during Elasticsearch downtime.
Example 2: Multi-Tenant Application Logging
A SaaS platform needs to separate logs by customer ID for compliance and billing purposes.
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
<filter app.**>
@type record_transformer
enable_ruby true
<record>
customer_id ${record['customer_id'] || 'unknown'}
</record>
</filter>
<match app.**>
@type rewrite_tag_filter
<rule>
key customer_id
pattern /^(.+)$/
tag customer.$1
</rule>
</match>
<match customer.*>
@type s3
aws_key_id YOUR_KEY
aws_sec_key YOUR_SECRET
s3_bucket your-logs-bucket
s3_region us-east-1
path logs/customer/${tag_parts[1]}/
time_slice_format %Y/%m/%d/%H
time_slice_wait 5m
utc
format json
</match>
This routes logs to separate S3 folders per customer (e.g., logs/customer/acme-inc/), enabling fine-grained access control and audit trails.
Example 3: Hybrid On-Premises and Cloud Logging
A company has on-premises servers and AWS EC2 instances. Both send logs to a central Fluentd aggregator in AWS.
On-premises Fluentd (forwarder):
<source>
@type tail
path /var/log/app.log
tag app.prod
format json
</source>
<match app.prod>
@type forward
<server>
host fluentd-aggregator.aws.example.com
port 24224
</server>
<buffer>
@type file
path /var/log/td-agent/buffer/forward
flush_interval 10s
retry_max_times 15
</buffer>
</match>
Cloud Fluentd (aggregator):
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
<match app.prod>
@type s3
aws_key_id YOUR_AWS_KEY
aws_sec_key YOUR_AWS_SECRET
s3_bucket company-logs
path logs/onprem/app/
time_slice_format %Y/%m/%d/%H
time_slice_wait 10m
utc
</match>
This design ensures logs survive network outages and are stored durably in the cloud.
FAQs
What is the difference between Fluentd and Fluent Bit?
Fluentd is a full-featured, Ruby-based log collector with extensive plugin support and rich filtering capabilities. It's ideal for complex environments requiring deep log transformation. Fluent Bit is a lightweight alternative written in C, designed for speed and low memory usage, which makes it a good fit for containers, edge devices, and Kubernetes nodes. Fluent Bit can forward logs to Fluentd for advanced processing.
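For example, a minimal Fluent Bit output section that ships everything to a Fluentd aggregator over the forward protocol (the hostname is an assumption):
[OUTPUT]
    Name   forward
    Match  *
    Host   fluentd-aggregator.example.com
    Port   24224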
How do I handle log duplication in Fluentd?
Log duplication typically occurs when:
- Multiple Fluentd instances read the same log file.
- pos_file is missing or shared between instances.
- Logs are forwarded multiple times through overlapping match rules.
Solutions: use unique pos_file paths per source, avoid overlapping match rules, and keep forwarding topologies acyclic so the same event cannot loop between instances.
Can Fluentd parse non-JSON logs like Apache or custom formats?
Yes. Fluentd ships with built-in parsers (apache2, nginx, syslog, and others) and also supports custom regular expressions. For example, Apache Common Log Format parsed with an explicit regexp (equivalent to the built-in apache2 parser):
<source>
@type tail
path /var/log/apache2/access.log
pos_file /var/log/td-agent/apache-access.log.pos
tag apache.access
<parse>
@type regexp
expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
time_format %d/%b/%Y:%H:%M:%S %z
</parse>
</source>
How do I reduce Fluentd’s memory usage?
Optimize by:
- Using Fluent Bit for ingestion and forwarding to Fluentd for processing.
- Reducing buffer chunk sizes (chunk_limit_size).
- Limiting the number of concurrent flush threads (flush_thread_count).
- Disabling unnecessary plugins.
- Using file buffers instead of memory buffers where possible.
Does Fluentd support log retention and rotation?
Fluentd itself does not manage log retention. It forwards logs to destinations that do—such as Elasticsearch (with ILM), S3 (with lifecycle policies), or Kafka (with topic retention settings). Configure retention at the sink level.
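For example, a sketch of an S3 lifecycle rule that expires archived logs after 90 days (the prefix and retention period are assumptions):
{
  "Rules": [
    {
      "ID": "expire-old-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Expiration": { "Days": 90 }
    }
  ]
}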
How do I troubleshoot a Fluentd configuration that isn’t working?
Follow this checklist:
- Run td-agent --dry-run to validate syntax.
- Check journalctl -u td-agent for startup errors.
- Verify file permissions on log files and pos_file directories.
- Use fluent-cat to inject test logs.
- Enable log_level debug temporarily for detailed output.
- Ensure network connectivity to output destinations, for example by probing Elasticsearch on port 9200 (see the example below).
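For the last item, a quick way to check an Elasticsearch destination (the host is assumed from earlier examples):
curl -s http://elasticsearch.example.com:9200/_cluster/health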
Is Fluentd secure by default?
No. Fluentd does not enable encryption or authentication by default. Always:
- Use TLS for network communication.
- Restrict access to input ports with firewalls.
- Use authentication plugins (e.g., fluent-plugin-secure-forward) for sensitive environments.
- Rotate credentials and avoid hardcoding secrets in config files; use environment variables or secrets management tools.
Conclusion
Configuring Fluentd effectively is a cornerstone of modern observability. Its plugin-driven architecture, flexibility across platforms, and robust buffering mechanisms make it indispensable for organizations managing complex, distributed systems. From collecting logs on a single server to orchestrating global log pipelines across hybrid clouds, Fluentd provides the tools to unify, transform, and deliver log data with precision.
This guide has walked you through every essential step: installation, source and sink configuration, filtering for enrichment and compliance, performance optimization, and real-world deployment patterns. By following best practices—such as using file buffers, tagging logs meaningfully, securing communications, and monitoring metrics—you ensure reliability, scalability, and maintainability.
Remember: Fluentd is not just a log collector; it’s a data pipeline engine. Treat it with the same rigor as your application code. Version control your configurations, test changes in staging, and monitor performance continuously. As your infrastructure evolves, Fluentd will evolve with you—making it a long-term investment in operational excellence.
Start small, validate often, and scale deliberately. With Fluentd properly configured, your logs will no longer be a liability—they’ll become your most valuable asset for insight, resilience, and innovation.