How to Configure Fluentd


Fluentd is an open-source data collector designed to unify logging solutions across diverse systems, applications, and environments. As modern infrastructure grows increasingly distributed—with microservices, containers, cloud platforms, and hybrid deployments—centralized log management has become a critical component of observability, troubleshooting, and compliance. Fluentd excels in this space by providing a flexible, reliable, and scalable platform for collecting, filtering, and forwarding logs in real time. Whether you're managing a small application stack or a large Kubernetes cluster, configuring Fluentd correctly ensures that your logs are captured efficiently, structured meaningfully, and delivered to the right destinations for analysis.

This guide walks you through the complete process of configuring Fluentd—from installation to advanced routing and optimization. You'll learn how to tailor Fluentd to your specific use case, implement best practices for performance and reliability, leverage essential tools, and apply real-world configurations that have been battle-tested in production environments. By the end of this tutorial, you'll have a comprehensive understanding of Fluentd's architecture and the confidence to deploy it in any environment.

Step-by-Step Guide

1. Understanding Fluentd’s Architecture

Before diving into configuration, it’s essential to understand Fluentd’s core components and how they interact. Fluentd operates on a plugin-based architecture, where each function is handled by a modular plugin. The three primary components are:

  • Sources: Define where logs are collected from (e.g., files, syslog, HTTP endpoints, Docker containers).
  • Filters: Modify, enrich, or transform log records before forwarding (e.g., parsing JSON, adding tags, removing sensitive fields).
  • Sinks: Specify where logs are sent (e.g., Elasticsearch, S3, Kafka, stdout).

Logs flow from source → filter → sink. Each component is configured using a declarative syntax in the Fluentd configuration file, typically fluent.conf (td-agent installs use /etc/td-agent/td-agent.conf). The configuration file uses a simple key-value structure with sections enclosed in <source>, <filter>, and <match> tags.

2. Installing Fluentd

Fluentd supports multiple operating systems and deployment models. Below are the most common installation methods.

On Ubuntu/Debian

Install Fluentd using the official package repository:

curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh

This installs td-agent, the stable packaged distribution of Fluentd maintained by Treasure Data, which includes precompiled plugins and better stability for production use.

After installation, verify it’s running:

sudo systemctl status td-agent

On CentOS/RHEL

Use the following command to install td-agent:

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh

Then start and enable the service:

sudo systemctl start td-agent

sudo systemctl enable td-agent

Using Docker

For containerized environments, Fluentd can be run as a sidecar or centralized logging container:

docker run -d --name fluentd -p 24224:24224 -p 24224:24224/udp -v $(pwd)/fluentd.conf:/fluentd/etc/fluent.conf fluent/fluentd:latest

Ensure your configuration file (fluentd.conf) is mounted into the container at /fluentd/etc/fluent.conf, the default config path for the official image.

From Source (Advanced)

If you need the latest development version or custom plugins, install Fluentd via RubyGems:

gem install fluentd

Then start the service manually:

fluentd -c /path/to/fluentd.conf

Note: This method is not recommended for production due to lack of service management and dependency control.

3. Basic Configuration File Structure

The Fluentd configuration file is written in a domain-specific language (DSL) using <source>, <filter>, and <match> blocks. Here’s a minimal working configuration:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  format nginx
</source>

<match **>
  @type stdout
</match>

Let’s break this down:

  • <source> defines a tail input, reading from Nginx’s access log file.
  • pos_file tracks the last read position to avoid duplicate logs on restart.
  • tag nginx.access assigns a label to the log stream for routing.
  • format nginx uses Fluentd’s built-in parser to extract fields like IP, method, status, and user agent.
  • <match **> matches all tags and sends logs to stdout.

Save this as /etc/td-agent/td-agent.conf (td-agent's default config file) and restart Fluentd:

sudo systemctl restart td-agent

Check logs for errors:

sudo journalctl -u td-agent -f

4. Configuring Multiple Sources

Most environments require collecting logs from multiple sources. Here’s an example that collects logs from Nginx, system syslog, and a custom application:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  format nginx
</source>

<source>
  @type tail
  path /var/log/nginx/error.log
  pos_file /var/log/td-agent/nginx-error.log.pos
  tag nginx.error
  format /^(?<time>\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>\w+)\] (?<message>.*)$/
</source>

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.syslog
  protocol_type tcp
</source>

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

This configuration:

  • Reads Nginx access logs with the built-in parser.
  • Uses a custom regex with named capture groups (time, log_level, message) to parse Nginx error logs.
  • Accepts syslog messages over TCP on port 5140.
  • Enables Fluentd’s forward protocol for receiving logs from other Fluentd instances (useful in distributed setups).

5. Applying Filters for Data Enrichment

Raw logs are rarely ready for analysis. Filters allow you to clean, parse, and enhance data before sending it to storage.

JSON Parsing Filter

If your application logs in JSON format:

<filter app.json>
  @type parser
  key_name log
  reserve_data true
  reserve_time true
  format json
</filter>

This extracts the JSON string from the log field and converts it into structured key-value pairs. reserve_data keeps the original field, and reserve_time preserves the original timestamp.

Adding Metadata with Record Transformer

Enrich logs with environment or host information:

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    hostname ${ENV['HOSTNAME']}
    environment production
  </record>
</filter>

This adds two fields to every log record: the container or host name and the deployment environment.

Removing Sensitive Data

Comply with data privacy regulations by filtering out events that contain PII:

<filter **>
  @type grep
  # Drop events whose message contains a US Social Security number
  <exclude>
    key message
    pattern /\b\d{3}-\d{2}-\d{4}\b/
  </exclude>
  # Drop events whose message contains an email address
  <exclude>
    key message
    pattern /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
  </exclude>
</filter>

This drops any log events whose message contains a social security number or an email address. Note that grep excludes entire records rather than masking individual fields; for in-place redaction, see the sketch below.
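If you need to mask values in place rather than drop whole events, here is a record_transformer sketch (the message field name is an assumption about your log schema):

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    # Replace anything shaped like an SSN inside the message field
    message ${record["message"].to_s.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")}
  </record>
</filter>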

6. Configuring Output Sinks

Fluentd's ecosystem includes hundreds of output plugins. Here are the most common destinations.

Elasticsearch

Install the plugin:

sudo td-agent-gem install fluent-plugin-elasticsearch

Configure the output:

<match nginx.access>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  logstash_format true
  logstash_prefix nginx
  logstash_dateformat %Y%m%d
  include_tag_key true
  tag_key @log_name
  flush_interval 10s
</match>

Fluentd will send logs to Elasticsearch with daily indices (e.g., nginx-20240615), making it easier to manage retention and performance.

Amazon S3

Install the plugin:

sudo td-agent-gem install fluent-plugin-s3

Configure for batch archiving:

<match system.syslog>
  @type s3
  aws_key_id YOUR_AWS_KEY
  aws_sec_key YOUR_AWS_SECRET
  s3_bucket my-logs-bucket
  s3_region us-east-1
  path logs/system/
  buffer_path /var/log/td-agent/buffer/s3
  time_slice_format %Y/%m/%d/%H
  time_slice_wait 10m
  utc
  format json
</match>

This batches logs every 10 minutes and uploads them to S3 in structured JSON format—ideal for compliance and long-term storage.

Kafka

For high-throughput streaming:

sudo td-agent-gem install fluent-plugin-kafka

<match **>
  @type kafka_buffered
  brokers kafka1:9092,kafka2:9092
  default_topic logs
  output_data_type json
  compression_codec gzip
  max_send_retries 3
  required_acks -1
  buffer_type file
  buffer_path /var/log/td-agent/buffer/kafka
  flush_interval 5s
</match>

Kafka acts as a durable buffer, decoupling log producers from consumers and providing resilience during downstream outages.

7. Testing and Validating Configuration

Always validate your configuration before restarting Fluentd:

sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf

This checks syntax and plugin availability without starting the service.

To test log ingestion manually, use fluent-cat:

echo '{"message":"test log","level":"info"}' | fluent-cat app.json

If your configuration includes a match for app.json, the log will appear in your output destination.

8. Monitoring Fluentd

Expose a Prometheus metrics endpoint with the prometheus input plugin (bundled with td-agent 4; otherwise install fluent-plugin-prometheus) to monitor performance:

<source>
  @type prometheus
  port 24231
  metrics_path /metrics
</source>

<system>
  log_level info
</system>

Access metrics at http://localhost:24231/metrics to view:

  • Buffer queue sizes
  • Output success/failure rates
  • Memory usage
  • Event throughput

Integrate with Prometheus and Grafana for real-time dashboards.
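For a quick spot-check from the shell (fluent-plugin-prometheus exposes metric names prefixed with fluentd_):

curl -s http://localhost:24231/metrics | grep fluentd_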

Best Practices

1. Use td-agent Over Vanilla Fluentd in Production

td-agent is a hardened, packaged distribution of Fluentd with tested dependencies, stable release cycles, and systemd integration. Avoid installing Fluentd via gem in production environments due to potential version conflicts and the lack of service management.

2. Separate Logs by Tag and Route Accordingly

Use meaningful tags like app.web, db.mysql, infra.network to distinguish log sources. This enables targeted filtering, routing, and retention policies.
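As a sketch using only built-in output plugins (the tags and paths are illustrative):

# Web access logs go to stdout; database logs go to local files
<match app.web>
  @type stdout
</match>

<match db.mysql>
  @type file
  path /var/log/td-agent/db-mysql
</match>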

3. Always Use pos_file for Tail Sources

Without a pos_file, Fluentd will re-read entire files on restart, causing duplicate logs. Always specify a unique path for each log file.

4. Buffer Logs Locally Before Remote Output

Network interruptions are inevitable. Use file-based buffers with appropriate flush intervals to avoid data loss:

buffer_type file
buffer_path /var/log/td-agent/buffer/nginx
flush_interval 10s
flush_thread_count 2
retry_max_times 10
retry_wait 10s

This ensures logs are stored locally during outages and retried automatically.

5. Avoid Heavy Processing in Filters

Complex Ruby expressions or large regex patterns can slow down log ingestion. Use built-in parsers (e.g., json, nginx, syslog) instead of custom regex when possible.

6. Secure Communication

When sending logs over the network, use TLS (a configuration sketch follows this list):

  • Enable TLS in Elasticsearch output with ssl_verify false (only if using self-signed certs) or ssl_verify true with CA bundle.
  • Use TLS for forward and syslog inputs.
  • Restrict access to Fluentd ports using firewalls or network policies.
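As a sketch, TLS on the forward protocol in Fluentd v1 syntax; the hostnames and certificate paths are illustrative:

# Aggregator side: accept forward traffic over TLS
<source>
  @type forward
  port 24224
  bind 0.0.0.0
  <transport tls>
    cert_path /etc/td-agent/certs/fluentd.crt
    private_key_path /etc/td-agent/certs/fluentd.key
  </transport>
</source>

# Forwarder side: send over TLS and verify the server certificate
<match secure.**>
  @type forward
  transport tls
  tls_cert_path /etc/td-agent/certs/ca.crt
  tls_verify_hostname true
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>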

7. Limit Log Volume with Sampling

For high-volume applications, consider sampling logs to reduce cost and storage, for example with the fluent-plugin-sampling-filter plugin (sudo td-agent-gem install fluent-plugin-sampling-filter):

<filter app.highvolume>
  @type sampling_filter
  interval 10
</filter>

This forwards only 1 in 10 log events, reducing load while preserving statistical relevance.

8. Implement Log Rotation

Ensure your log files are rotated regularly (using logrotate) and that Fluentd’s pos_file is updated correctly. Use refresh_interval in tail sources to detect rotated files:

refresh_interval 60s

9. Version Control Your Configuration

Treat Fluentd configuration as code. Store it in Git, apply CI/CD practices, and deploy via configuration management tools like Ansible, Puppet, or Terraform.

10. Regularly Audit and Update Plugins

Keep Fluentd and its plugins updated to benefit from security patches and performance improvements. Use td-agent-gem list to check versions.

Tools and Resources

Official Documentation

The most authoritative resource is the official Fluentd documentation at https://docs.fluentd.org. It includes plugin references, configuration examples, and architecture diagrams.

Fluentd Plugin Registry

Explore all available plugins at https://www.fluentd.org/plugins/all. Filter by category (input, filter, output) and check community ratings and update frequency.

Fluent Bit (Lightweight Alternative)

For resource-constrained environments (e.g., edge devices, IoT), consider Fluent Bit—a faster, lower-memory cousin of Fluentd. It follows the same pipeline model of inputs, filters, and outputs, and can forward to the same destinations, including Fluentd itself.

Containerized Deployments

Use Helm charts for Kubernetes:
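A minimal install from the official Fluent chart repository (the release name fluentd is illustrative):

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluentd fluent/fluentd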

Monitoring Tools

  • Prometheus + Grafana: For visualizing Fluentd metrics.
  • Elastic Stack (ELK): For centralized log search and dashboards.
  • Datadog: Offers native Fluentd integration with pre-built monitors.
  • Logstash: Can be used alongside Fluentd for complex transformations, though Fluentd is generally preferred for ingestion.

Debugging Tools

  • fluent-cat: Inject test logs for validation.
  • journalctl -u td-agent: View Fluentd service logs.
  • tail -f /var/log/td-agent/td-agent.log: Monitor Fluentd’s internal logs.
  • netstat -tlnp | grep 24224: Verify Fluentd is listening on expected ports.

Community and Support

Join the Fluentd GitHub repository to report bugs, request features, or contribute plugins. The community is active and responsive.

Real Examples

Example 1: Kubernetes Cluster Logging

In a Kubernetes environment, Fluentd runs as a DaemonSet on each node to collect container logs from /var/log/containers/.

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  logstash_format true
  logstash_prefix k8s-logs
  include_tag_key true
  flush_interval 5s
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_max_times 10
    chunk_limit_size 2M
    queue_limit_length 8
    overflow_action block
  </buffer>
</match>

This configuration:

  • Reads all container logs in JSON format.
  • Uses the kubernetes_metadata plugin to enrich logs with pod, namespace, and container metadata.
  • Sends logs to an Elasticsearch service running inside the cluster (here, in the logging namespace).
  • Uses buffered output with fail-safe behavior to prevent data loss during Elasticsearch downtime.

Example 2: Multi-Tenant Application Logging

A SaaS platform needs to separate logs by customer ID for compliance and billing purposes. This example relies on the fluent-plugin-rewrite-tag-filter plugin (install it with td-agent-gem first).

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    customer_id ${record['customer_id'] || 'unknown'}
  </record>
</filter>

<match app.**>
  @type rewrite_tag_filter
  <rule>
    key customer_id
    pattern /^(.+)$/
    tag customer.$1
  </rule>
</match>

<match customer.*>
  @type s3
  aws_key_id YOUR_KEY
  aws_sec_key YOUR_SECRET
  s3_bucket your-logs-bucket
  s3_region us-east-1
  path logs/customer/${tag_parts[1]}/
  time_slice_format %Y/%m/%d/%H
  time_slice_wait 5m
  utc
  format json
</match>

This routes logs to separate S3 folders per customer (e.g., logs/customer/acme-inc/), enabling fine-grained access control and audit trails.

Example 3: Hybrid On-Premises and Cloud Logging

A company has on-premises servers and AWS EC2 instances. Both send logs to a central Fluentd aggregator in AWS.

On-premises Fluentd (forwarder):

<source>
  @type tail
  path /var/log/app.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.prod
  format json
</source>

<match app.prod>
  @type forward
  <server>
    host fluentd-aggregator.aws.example.com
    port 24224
  </server>
  <buffer>
    @type file
    path /var/log/td-agent/buffer/forward
    flush_interval 10s
    retry_max_times 15
  </buffer>
</match>

Cloud Fluentd (aggregator):

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match app.prod>
  @type s3
  aws_key_id YOUR_AWS_KEY
  aws_sec_key YOUR_AWS_SECRET
  s3_bucket company-logs
  path logs/onprem/app/
  time_slice_format %Y/%m/%d/%H
  time_slice_wait 10m
  utc
</match>

This design ensures logs survive network outages and are stored durably in the cloud.

FAQs

What is the difference between Fluentd and Fluent Bit?

Fluentd is a full-featured, Ruby-based log collector with extensive plugin support and rich filtering capabilities. It's ideal for complex environments requiring deep log transformation. Fluent Bit is a lightweight, C-based alternative designed for speed and low memory usage—perfect for containers, edge devices, and Kubernetes nodes. Fluent Bit can forward logs to Fluentd for advanced processing.
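As a sketch, a minimal Fluent Bit output stanza (classic configuration syntax) that forwards all events to a Fluentd aggregator; the hostname is illustrative:

[OUTPUT]
    Name  forward
    Match *
    Host  fluentd.example.com
    Port  24224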

How do I handle log duplication in Fluentd?

Log duplication typically occurs when:

  • Multiple Fluentd instances read the same log file.
  • pos_file is missing or shared between instances.
  • Logs are forwarded multiple times through overlapping match rules.

Solutions: Use unique pos_file paths per source, avoid overlapping tags, and make sure forwarder and aggregator match rules can't route events back into the pipeline they came from.
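On the delivery side, the forward output's acknowledgement mode gives at-least-once semantics; retries can still produce duplicates, so deduplicate downstream if exactness matters. A sketch (the hostname is illustrative):

<match app.**>
  @type forward
  require_ack_response true
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>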

Can Fluentd parse non-JSON logs like Apache or custom formats?

Yes. Fluentd supports regex parsing via the regexp parser, which works in both tail sources and parser filters. For example, Apache Common Log Format:

<source>
  @type tail
  path /var/log/apache2/access.log
  pos_file /var/log/td-agent/apache-access.log.pos
  tag apache.access
  <parse>
    @type regexp
    expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
    time_format %d/%b/%Y:%H:%M:%S %z
  </parse>
</source>

How do I reduce Fluentd’s memory usage?

Optimize by the following (a buffer-tuning sketch follows the list):

  • Using Fluent Bit for ingestion and forwarding to Fluentd for processing.
  • Reducing buffer chunk sizes (chunk_limit_size).
  • Limiting the number of concurrent flush threads (flush_thread_count).
  • Disabling unnecessary plugins.
  • Using file buffers instead of memory buffers where possible.
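As a sketch, a conservative v1-style buffer section tying these knobs together (the values are illustrative starting points, not recommendations):

<buffer>
  @type file
  path /var/log/td-agent/buffer/app
  chunk_limit_size 1M
  flush_thread_count 1
  flush_interval 10s
</buffer>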

Does Fluentd support log retention and rotation?

Fluentd itself does not manage log retention. It forwards logs to destinations that do—such as Elasticsearch (with ILM), S3 (with lifecycle policies), or Kafka (with topic retention settings). Configure retention at the sink level.

How do I troubleshoot a Fluentd configuration that isn’t working?

Follow this checklist:

  1. Run td-agent --dry-run to validate syntax.
  2. Check journalctl -u td-agent for startup errors.
  3. Verify file permissions on log files and pos_file directories.
  4. Use fluent-cat to inject test logs.
  5. Enable log_level debug temporarily for detailed output (see the snippet after this list).
  6. Ensure network connectivity to output destinations (e.g., telnet to port 9200).
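For step 5, the log level can be raised globally in the <system> section (remember to revert it afterwards):

<system>
  log_level debug
</system>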

Is Fluentd secure by default?

No. Fluentd does not enable encryption or authentication by default. Always:

  • Use TLS for network communication.
  • Restrict access to input ports with firewalls.
  • Use authentication plugins (e.g., fluent-plugin-secure-forward) for sensitive environments.
  • Rotate credentials and avoid hardcoding secrets in config files—use environment variables or secrets management tools (see the sketch below).
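As a sketch of the last point: Fluentd evaluates embedded Ruby inside double-quoted config strings, so credentials can be pulled from the environment (the variable names here are illustrative):

<match secure.**>
  @type s3
  aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
  aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
  s3_bucket my-logs-bucket
  s3_region us-east-1
</match>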

Conclusion

Configuring Fluentd effectively is a cornerstone of modern observability. Its plugin-driven architecture, flexibility across platforms, and robust buffering mechanisms make it indispensable for organizations managing complex, distributed systems. From collecting logs on a single server to orchestrating global log pipelines across hybrid clouds, Fluentd provides the tools to unify, transform, and deliver log data with precision.

This guide has walked you through every essential step: installation, source and sink configuration, filtering for enrichment and compliance, performance optimization, and real-world deployment patterns. By following best practices—such as using file buffers, tagging logs meaningfully, securing communications, and monitoring metrics—you ensure reliability, scalability, and maintainability.

Remember: Fluentd is not just a log collector; it’s a data pipeline engine. Treat it with the same rigor as your application code. Version control your configurations, test changes in staging, and monitor performance continuously. As your infrastructure evolves, Fluentd will evolve with you—making it a long-term investment in operational excellence.

Start small, validate often, and scale deliberately. With Fluentd properly configured, your logs will no longer be a liability—they’ll become your most valuable asset for insight, resilience, and innovation.