How to Set Up Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud in 2012. Since its inception, it has become one of the most widely adopted monitoring solutions in the cloud-native ecosystem, particularly within Kubernetes environments. Its powerful query language (PromQL), flexible data model, and robust alerting capabilities make it indispensable for DevOps teams aiming to maintain system reliability, performance, and observability.
Setting up Prometheus correctly is foundational to building a reliable monitoring infrastructure. Unlike traditional monitoring tools that rely on push-based metrics collection, Prometheus employs a pull-based model, scraping metrics from configured targets at regular intervals. This design promotes scalability, reduces dependency on agent-based instrumentation, and simplifies the management of dynamic environments such as microservices and containers.
In this comprehensive guide, you'll learn how to set up Prometheus from scratch, whether you're deploying it on a single server, within a Docker container, or across a production Kubernetes cluster. We'll walk through configuration, service integration, best practices, real-world examples, and troubleshooting tips to ensure your Prometheus deployment is secure, efficient, and production-ready.
Step-by-Step Guide
Prerequisites
Before beginning the setup process, ensure you have the following prerequisites in place:
- A Linux-based operating system (Ubuntu 20.04/22.04, CentOS 8+, or similar)
- Administrative (sudo) access to the server
- Basic familiarity with the command line and text editors (e.g., nano, vim)
- Network connectivity to allow HTTP traffic on port 9090 (default Prometheus port)
- Docker and Docker Compose (optional, for containerized deployment)
If you're deploying Prometheus in a Kubernetes environment, ensure you have a working cluster (v1.20+) and kubectl configured.
Step 1: Download Prometheus
Prometheus releases are available as pre-compiled binaries from the official GitHub repository. Navigate to the Prometheus Releases page and select the latest stable version (e.g., v2.51.0 as of 2024).
Use wget or curl to download the binary directly to your server:
wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
Extract the archive:
tar xvfz prometheus-2.51.0.linux-amd64.tar.gz
Move into the extracted directory:
cd prometheus-2.51.0.linux-amd64
You'll see two key files: prometheus (the main binary) and prometheus.yml (the default configuration file). Keep these handy; we'll modify the configuration next.
Step 2: Create a Prometheus User and Directory Structure
For security and organizational purposes, avoid running Prometheus as root. Create a dedicated system user and directory structure:
sudo useradd --no-create-home --shell /bin/false prometheus
Create the necessary directories and give the prometheus user ownership of them (without ownership of /var/lib/prometheus, the service will fail to write its database):
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
Move the Prometheus binary and configuration file to their appropriate locations:
sudo mv prometheus /usr/local/bin/
sudo mv promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
Copy the configuration file and the console assets (the extracted archive contains consoles and console_libraries directories):
sudo mv prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
sudo mv consoles /etc/prometheus/
sudo mv console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
Step 3: Configure Prometheus
The core of Prometheus lies in its configuration file: /etc/prometheus/prometheus.yml. This YAML file defines the targets Prometheus will scrape, how often, and what metadata to attach.
Open the file in your preferred editor:
sudo nano /etc/prometheus/prometheus.yml
By default, it contains a basic configuration that scrapes Prometheus itself. Here's a more comprehensive example suitable for production:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'prometheus-production'

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']

  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.1.20:8080']
Let's break down the key components:
- global.scrape_interval: How often Prometheus pulls metrics (15 seconds is standard).
- global.evaluation_interval: How often alerting and recording rules are evaluated.
- external_labels: Labels added to all metrics, useful for multi-cluster or multi-environment setups.
- rule_files: Points to external alerting rules (we'll create this next).
- scrape_configs: Defines jobs (groups of targets) and their scraping behavior.
For the node_exporter job, you'll need to install the Node Exporter on each target machine. We'll cover that in a later section.
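Before relying on this configuration, it is worth validating the syntax. The promtool utility that ships in the same archive as the prometheus binary can lint the file; a quick check, assuming the paths used above:

```shell
# Validate the main configuration file; promtool exits non-zero on errors
promtool check config /etc/prometheus/prometheus.yml
```

Run this after every configuration change, ideally before reloading the service.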
Step 4: Create Alerting Rules
Prometheus supports alerting through rule files. Create a new file:
sudo nano /etc/prometheus/alert_rules.yml
Add basic alert rules:
groups:
  - name: instance-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Instance {{ $labels.instance }} has been down for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage has been above 85% for the last 3 minutes."
These rules trigger alerts when a target is unreachable (up == 0) or when CPU usage exceeds 85% for more than 3 minutes. The for clause ensures alerts are only fired after a sustained condition, reducing false positives.
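Rule files can be linted the same way as the main configuration; a quick check, assuming the path above:

```shell
# Validate alerting/recording rule syntax and PromQL expressions
promtool check rules /etc/prometheus/alert_rules.yml
```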
Step 5: Set Up a Systemd Service
To ensure Prometheus runs as a background service and restarts on boot, create a systemd unit file:
sudo nano /etc/systemd/system/prometheus.service
Insert the following content:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-admin-api \
  --web.enable-lifecycle
Restart=always
[Install]
WantedBy=multi-user.target
Reload systemd to recognize the new service:
sudo systemctl daemon-reload
Start and enable Prometheus:
sudo systemctl start prometheus
sudo systemctl enable prometheus
Check the status to confirm it's running:
sudo systemctl status prometheus
Step 6: Verify Prometheus is Running
Open your web browser and navigate to http://your-server-ip:9090. You should see the Prometheus web interface.
Click on Status → Targets. You should see your configured jobs (prometheus, node_exporter, etc.) with a status of UP. If any targets are DOWN, verify network connectivity and that the target service is running.
To test the query interface, go to the Graph tab and enter:
up
This returns a time series of all targets and whether they're reachable (1 = UP, 0 = DOWN). You should see a value of 1 for each target you've configured.
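The same query can also be run through the HTTP API, which is handy for scripted health checks; a sketch against the server set up above:

```shell
# Query the instant-query endpoint; the response is JSON with a "status"
# field ("success") and a "data.result" array of time series
curl -s 'http://localhost:9090/api/v1/query?query=up'
```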
Step 7: Install Node Exporter (Optional but Recommended)
Node Exporter exposes hardware and OS metrics (CPU, memory, disk, network) in a format Prometheus can scrape. Install it on each machine you wish to monitor.
Download the latest Node Exporter binary:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
Extract and install:
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
sudo mv node_exporter /usr/local/bin/
sudo chown root:root /usr/local/bin/node_exporter
Create a systemd service for Node Exporter:
sudo nano /etc/systemd/system/node_exporter.service
Add the following:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=nodeexporter
Group=nodeexporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
Create the user and enable the service:
sudo useradd --no-create-home --shell /bin/false nodeexporter
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
Verify it's running on port 9100:
curl http://localhost:9100/metrics
You should see a long list of metrics in plain text format.
Step 8: Set Up Blackbox Exporter for HTTP/HTTPS Monitoring
Blackbox Exporter allows Prometheus to probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. It's ideal for monitoring external services like APIs or websites.
Download and install:
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz
tar xvfz blackbox_exporter-0.24.0.linux-amd64.tar.gz
cd blackbox_exporter-0.24.0.linux-amd64
sudo mv blackbox_exporter /usr/local/bin/
sudo chown root:root /usr/local/bin/blackbox_exporter
Copy the default configuration:
sudo mkdir /etc/blackbox_exporter
sudo cp blackbox.yml /etc/blackbox_exporter/
Modify /etc/blackbox_exporter/blackbox.yml to include your desired modules:
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200, 301, 302]
      method: GET
Create a systemd service:
sudo nano /etc/systemd/system/blackbox_exporter.service
Insert:
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml
Restart=always
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl start blackbox_exporter
sudo systemctl enable blackbox_exporter
Blackbox Exporter runs on port 9115 by default. Prometheus will scrape http://localhost:9115/probe?target=https://example.com&module=http_2xx to check website availability.
Step 9: Install and Configure Grafana for Visualization
While Prometheus provides a basic UI, Grafana offers rich dashboards, alerting, and multi-source visualization. Install Grafana:
sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
Start and enable Grafana:
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Access Grafana at http://your-server-ip:3000. Default login: admin/admin.
Add Prometheus as a data source:
- Click Add data source
- Select Prometheus
- Set URL to http://localhost:9090
- Click Save & Test
Import a pre-built dashboard: go to Dashboards → Import and enter ID 1860 (Node Exporter Full) to visualize server metrics.
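If you prefer managing Grafana declaratively, the data source can also be added via Grafana's file-based provisioning instead of the UI. A minimal sketch, assuming Grafana's default provisioning directory:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```

Grafana reads this directory at startup, so the data source survives container rebuilds and fresh installs.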
Best Practices
Use Labels Consistently
Labels are key-value pairs attached to metrics. Use them to identify environment (prod/staging), service name, region, or instance type. Avoid using high-cardinality labels (e.g., user IDs, session tokens) as they can explode the metric space and degrade performance.
Set Appropriate Scrape Intervals
While 15s is standard, adjust based on your needs. For critical services, 5s may be appropriate. For low-frequency metrics (e.g., batch jobs), 1m or longer is acceptable. Never set intervals below 1s unless absolutely necessary.
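Per-job overrides go directly in the scrape config and take precedence over the global default; a sketch (the job name and target are placeholders):

```yaml
scrape_configs:
  - job_name: 'critical-api'    # hypothetical latency-sensitive service
    scrape_interval: 5s         # overrides the 15s global default
    static_configs:
      - targets: ['192.168.1.30:8080']
```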
Separate Alerting and Recording Rules
Keep alerting rules in one file and recording rules (precomputed aggregations) in another. This improves readability and reduces evaluation overhead.
Enable Remote Write for Long-Term Storage
Prometheus stores data locally in its TSDB (Time Series Database). For long-term retention, use remote write to send data to Thanos, Cortex, or VictoriaMetrics. This also enables high availability and horizontal scaling.
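A minimal remote_write block in prometheus.yml looks like the following sketch; the endpoint URL is a placeholder for your Thanos, Cortex, or VictoriaMetrics receiver:

```yaml
remote_write:
  - url: "http://remote-storage.example.internal:8428/api/v1/write"
    queue_config:
      max_samples_per_send: 1000   # tune batch size to your receiver
```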
Use Service Discovery for Dynamic Environments
Static configurations work for fixed servers. In Kubernetes or cloud environments, use service discovery mechanisms like Kubernetes SD, Consul, or AWS EC2 SD to automatically detect and scrape targets.
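As an example, a Kubernetes service-discovery job that scrapes only annotated pods might look like this sketch (the prometheus.io/scrape annotation is a common community convention, not a built-in):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```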
Monitor Prometheus Itself
Always monitor Prometheus's own metrics: scrape duration, target health, TSDB size, and query latency. Use metrics such as scrape_duration_seconds and prometheus_tsdb_head_samples_appended_total to detect performance degradation.
Secure Your Deployment
By default, Prometheus exposes its web UI and HTTP API on port 9090 with no authentication. In production:
- Place Prometheus behind a reverse proxy (Nginx, Traefik) with TLS termination.
- Enable basic authentication or integrate with OAuth2.
- Restrict access via firewall rules (only allow internal networks or specific IPs).
- Disable the admin API if not needed: it is off by default, so simply omit the --web.enable-admin-api flag from your service unit.
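A minimal Nginx reverse-proxy sketch with TLS termination and basic authentication in front of Prometheus (the certificate paths and htpasswd file are placeholders for your own):

```nginx
server {
    listen 443 ssl;
    server_name prometheus.example.com;

    ssl_certificate     /etc/ssl/certs/prometheus.crt;
    ssl_certificate_key /etc/ssl/private/prometheus.key;

    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:9090;
    }
}
```

With this in place, bind Prometheus itself to 127.0.0.1 (--web.listen-address=127.0.0.1:9090) so it is only reachable through the proxy.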
Plan for Storage Capacity
Prometheus stores every metric sample. A single node exporter typically generates ~100–200 metrics per scrape. At 15s intervals, that's 400–800 samples per minute per target. Multiply by hundreds of targets and you'll need 100GB–1TB+ of SSD storage per month. Use retention policies:
--storage.tsdb.retention.time=30d
Set this in your systemd service to limit data to 30 days unless you're using remote storage.
Use Alertmanager for Notification Routing
Prometheus alone can trigger alerts but lacks routing, grouping, and deduplication. Integrate with Alertmanager to send notifications via email, Slack, PagerDuty, or Microsoft Teams. Configure it in prometheus.yml under alerting.alertmanagers.
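Wiring Prometheus to an Alertmanager instance takes only a few lines in prometheus.yml (9093 is Alertmanager's default port):

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
```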
Tools and Resources
Essential Exporters
Exporters are small services that expose metrics in Prometheus format. Key ones include:
- Node Exporter – Server hardware and OS metrics.
- Blackbox Exporter – HTTP, TCP, ICMP probes.
- cAdvisor – Container resource usage (used with Docker/Kubernetes).
- PostgreSQL Exporter – Database metrics (queries, connections, replication).
- MySQL Exporter – MySQL performance metrics.
- Redis Exporter – Redis memory, connections, latency.
- Pushgateway – For batch jobs and ephemeral tasks that can't be scraped.
All exporters are available on GitHub under the Prometheus organization: github.com/prometheus.
Monitoring Stack Components
For a full observability stack, combine Prometheus with:
- Grafana – Dashboarding and visualization.
- Alertmanager – Alert routing and deduplication.
- Thanos – Long-term storage, global querying, and high availability.
- VictoriaMetrics – Scalable, drop-in Prometheus replacement with remote storage.
- Loki – Log aggregation (complements metrics with logs).
- Jaeger – Distributed tracing (for latency analysis across microservices).
Official Documentation and Learning Resources
- Prometheus Official Documentation
- PromQL Query Language Guide
- Configuration Reference
- Grafana Prometheus Tutorial
- Prometheus Crash Course (YouTube)
Community and Support
The Prometheus community is active and helpful:
- Slack: Join the CNCF Slack workspace and visit the #prometheus channel
- Forum: discuss.prometheus.io
- GitHub Issues: Report bugs or request features
Real Examples
Example 1: Monitoring a Web Application Stack
Consider a simple stack: Nginx → Node.js API → PostgreSQL → Redis.
- Use Node Exporter on the server to monitor CPU, memory, disk.
- Use nginx-exporter to collect Nginx request rates, status codes, and connections.
- Use nodejs-exporter (via the prom-client library) to expose custom app metrics like request latency and error rates.
- Use postgres-exporter to monitor query execution time and connection pool usage.
- Use redis-exporter to track memory fragmentation and eviction rates.
Alerting rules:
- Trigger alert if PostgreSQL connection pool is >90% full.
- Alert if Node.js request latency exceeds 2s for 5 minutes.
- Trigger if Redis memory usage >95%.
Dashboard: Grafana with panels showing request throughput, error rate, database load, and system resource usage.
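The latency alert above could be written as a rule like the following sketch, assuming the Node.js app exposes a standard prom-client histogram named http_request_duration_seconds (the metric name is an assumption; check your instrumentation):

```yaml
groups:
  - name: api-latency
    rules:
      - alert: HighRequestLatency
        # 95th-percentile request latency over 5m windows, per instance
        expr: histogram_quantile(0.95, sum by (le, instance) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
```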
Example 2: Kubernetes Cluster Monitoring
In Kubernetes, deploy Prometheus using the Prometheus Helm Chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
This installs:
- Prometheus Server
- Alertmanager
- Node Exporter (DaemonSet)
- Kube State Metrics
- Grafana
- Preconfigured dashboards
Metrics collected:
- Pod CPU/Memory usage
- Node resource pressure
- Deployment replica status
- Network policy violations
- API server latency
Alerts include:
- KubePodCrashLooping
- KubeDeploymentReplicasMismatch
- KubeNodeNotReady
Example 3: Monitoring a CI/CD Pipeline
Use the Pushgateway to collect metrics from Jenkins or GitHub Actions jobs:
In your CI script:
# Capture build start time
BUILD_START=$(date +%s)
# ... build logic ...
BUILD_DURATION=$(( $(date +%s) - BUILD_START ))
# Push to Pushgateway (the body must end with a newline, hence the pipe into --data-binary)
echo "build_duration $BUILD_DURATION" | curl --data-binary @- http://pushgateway:9091/metrics/job/ci_build/branch/main
Prometheus scrapes the Pushgateway every 15s and includes the job and branch as labels.
Alert: Trigger if average build duration increases by 50% over 24 hours.
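One way to sketch that regression check in PromQL, comparing the recent hourly average against the same window a day earlier (the metric name matches the push example above; the 1h window is an assumption):

```yaml
groups:
  - name: ci-alerts
    rules:
      - alert: BuildDurationRegression
        # Recent 1h average is 50% above the same 1h window 24h ago
        expr: avg_over_time(build_duration[1h]) > 1.5 * avg_over_time(build_duration[1h] offset 24h)
        labels:
          severity: warning
```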
FAQs
What is the difference between Prometheus and Grafana?
Prometheus is a time-series database and monitoring system that collects and stores metrics. Grafana is a visualization tool that connects to Prometheus (and other data sources) to create dashboards and alerts. They are complementary: Prometheus gathers data; Grafana displays it.
Can Prometheus monitor Windows servers?
Yes, using the windows_exporter (formerly wmi_exporter). Install it on Windows machines to expose metrics like disk usage, network interfaces, and Windows service status. Configuration is similar to Node Exporter.
How much memory does Prometheus need?
Memory usage scales with the number of active time series. For 10,000 time series, expect 1–2GB RAM. For 100,000+, allocate 8–16GB. Use the prometheus_tsdb_head_series metric to monitor active series count.
Does Prometheus support log collection?
No. Prometheus is designed for metrics, not logs. For logs, use Loki (from Grafana Labs), Fluentd, or ELK stack. You can correlate logs and metrics using shared labels in Grafana.
How do I backup Prometheus data?
Prometheus stores data in /var/lib/prometheus. To backup, stop the service and copy the directory:
sudo systemctl stop prometheus
sudo tar -czf prometheus-backup.tar.gz /var/lib/prometheus
sudo systemctl start prometheus
For production, use remote write to a long-term storage system like Thanos or VictoriaMetrics.
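Alternatively, since the systemd unit above enables the admin API, you can take an online snapshot without stopping the service; a sketch:

```shell
# Creates a point-in-time snapshot under /var/lib/prometheus/snapshots/
# (requires the --web.enable-admin-api flag)
curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
```

Copy the resulting snapshot directory to your backup location; snapshots are hard links, so they cost little extra disk space.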
Why are my targets showing as DOWN?
Common causes:
- Network firewall blocking port 9090/9100
- Target service not running
- Incorrect IP or port in config
- SSL/TLS certificate errors (for HTTPS targets)
- Authentication required but not configured
Check the Targets page in Prometheus UI for detailed error messages.
Can I run Prometheus in Docker?
Yes. Use the official image:
docker run -d \
--name=prometheus \
-p 9090:9090 \
-v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
For Docker Compose, define the service in docker-compose.yml with volumes and ports.
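A minimal docker-compose.yml sketch for the same setup (the host path for the config file is a placeholder, as in the docker run example):

```yaml
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - /path/to/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus

volumes:
  prometheus-data:
```

The named volume keeps the TSDB data across container recreations.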
What is PromQL?
PromQL (Prometheus Query Language) is a functional query language used to select and aggregate time series data. Examples:
- http_requests_total{job="api-server"} – All HTTP requests for the API server job.
- rate(http_requests_total[5m]) – Requests per second over the last 5 minutes.
- sum by(instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) – CPU usage per instance.
Conclusion
Setting up Prometheus is a critical step toward achieving true observability in modern infrastructure. From monitoring bare-metal servers to Kubernetes clusters and cloud-native applications, Prometheus provides the flexibility, scalability, and depth needed to understand system behavior in real time.
This guide has walked you through the complete process, from downloading binaries and configuring scrape targets to securing the deployment and integrating with Grafana and Alertmanager. You've seen real-world examples of monitoring web stacks, CI/CD pipelines, and containerized environments.
Remember: Prometheus is not a magic bullet. Its power lies in thoughtful configuration, consistent labeling, and integration with complementary tools. Avoid the trap of collecting everything; focus on the metrics that matter most to your service level objectives (SLOs).
As your infrastructure grows, consider migrating to distributed solutions like Thanos or VictoriaMetrics for long-term storage and high availability. But for now, with this setup, you have a robust, production-ready monitoring foundation that will serve you well for years to come.
Start small. Monitor what's critical. Iterate based on real incidents. And let Prometheus be your eyes in the infrastructure, so you're never blind to what's happening under the hood.