How to Set Up Prometheus

Oct 30, 2025 - 12:27

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud in 2012. Since its inception, it has become one of the most widely adopted monitoring solutions in the cloud-native ecosystem, particularly within Kubernetes environments. Its powerful query language (PromQL), flexible data model, and robust alerting capabilities make it indispensable for DevOps teams aiming to maintain system reliability, performance, and observability.

Setting up Prometheus correctly is foundational to building a reliable monitoring infrastructure. Unlike traditional monitoring tools that rely on push-based metrics collection, Prometheus employs a pull-based model, scraping metrics from configured targets at regular intervals. This design promotes scalability, reduces dependency on agent-based instrumentation, and simplifies the management of dynamic environments such as microservices and containers.

In this comprehensive guide, you'll learn how to set up Prometheus from scratch, whether you're deploying it on a single server, within a Docker container, or across a production Kubernetes cluster. We'll walk through configuration, service integration, best practices, real-world examples, and troubleshooting tips to ensure your Prometheus deployment is secure, efficient, and production-ready.

Step-by-Step Guide

Prerequisites

Before beginning the setup process, ensure you have the following prerequisites in place:

  • A Linux-based operating system (Ubuntu 20.04/22.04, CentOS 8+, or similar)
  • Administrative (sudo) access to the server
  • Basic familiarity with the command line and text editors (e.g., nano, vim)
  • Network connectivity to allow HTTP traffic on port 9090 (default Prometheus port)
  • Docker and Docker Compose (optional, for containerized deployment)

If you're deploying Prometheus in a Kubernetes environment, ensure you have a working cluster (v1.20+) and kubectl configured.

Step 1: Download Prometheus

Prometheus releases are available as pre-compiled binaries from the official GitHub repository. Navigate to the Prometheus Releases page and select the latest stable version (e.g., v2.51.0 as of 2024).

Use wget or curl to download the binary directly to your server:

wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz

Extract the archive:

tar xvfz prometheus-2.51.0.linux-amd64.tar.gz

Move into the extracted directory:

cd prometheus-2.51.0.linux-amd64

You'll see two key files: prometheus (the main binary) and prometheus.yml (the default configuration file), along with the consoles and console_libraries directories. Keep these handy; we'll modify the configuration next.

Step 2: Create a Prometheus User and Directory Structure

For security and organizational purposes, avoid running Prometheus as root. Create a dedicated system user and directory structure:

sudo useradd --no-create-home --shell /bin/false prometheus

Create the necessary directories:

sudo mkdir /etc/prometheus

sudo mkdir /var/lib/prometheus

Move the Prometheus binary and configuration file to their appropriate locations:

sudo mv prometheus /usr/local/bin/

sudo mv promtool /usr/local/bin/

sudo chown prometheus:prometheus /usr/local/bin/prometheus

sudo chown prometheus:prometheus /usr/local/bin/promtool

Copy the configuration and console templates:

sudo mv prometheus.yml /etc/prometheus/

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml

sudo mv consoles /etc/prometheus/

sudo mv console_libraries /etc/prometheus/

sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries

Step 3: Configure Prometheus

The core of Prometheus lies in its configuration file: /etc/prometheus/prometheus.yml. This YAML file defines the targets Prometheus will scrape, how often, and what metadata to attach.

Open the file in your preferred editor:

sudo nano /etc/prometheus/prometheus.yml

By default, it contains a basic configuration that scrapes Prometheus itself. Here's a more comprehensive example suitable for production:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'prometheus-production'

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']

  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.1.20:8080']

Let's break down the key components:

  • global.scrape_interval: How often Prometheus pulls metrics (15 seconds is standard).
  • global.evaluation_interval: How often alerting and recording rules are evaluated.
  • external_labels: Labels added to all metrics, useful for multi-cluster or multi-environment setups.
  • rule_files: Points to external alerting rules (we'll create this next).
  • scrape_configs: Defines jobs (groups of targets) and their scraping behavior.

For the node_exporter job, you'll need to install the Node Exporter on each target machine. We'll cover that in a later section.

Step 4: Create Alerting Rules

Prometheus supports alerting through rule files. Create a new file:

sudo nano /etc/prometheus/alert_rules.yml

Add basic alert rules:

groups:
  - name: instance-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Instance {{ $labels.instance }} has been down for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage has been above 85% for the last 3 minutes."

These rules trigger alerts when a target is unreachable (up == 0) or when CPU usage exceeds 85% for more than 3 minutes. The for clause ensures alerts are only fired after a sustained condition, reducing false positives.

Step 5: Set Up a Systemd Service

To ensure Prometheus runs as a background service and restarts on boot, create a systemd unit file:

sudo nano /etc/systemd/system/prometheus.service

Insert the following content:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-admin-api \
  --web.enable-lifecycle
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd to recognize the new service:

sudo systemctl daemon-reload

Start and enable Prometheus:

sudo systemctl start prometheus

sudo systemctl enable prometheus

Check the status to confirm it's running:

sudo systemctl status prometheus

Step 6: Verify Prometheus is Running

Open your web browser and navigate to http://your-server-ip:9090. You should see the Prometheus web interface.

Click on Status → Targets. You should see your configured jobs (prometheus, node_exporter, etc.) with a status of UP. If any targets are DOWN, verify network connectivity and that the target service is running.

To test the query interface, go to the Graph tab and enter:

up

This returns a time series for each target showing whether it's reachable (1 = UP, 0 = DOWN). You should see a value of 1 for each target you've configured.

Step 7: Install Node Exporter (Optional but Recommended)

Node Exporter exposes hardware and OS metrics (CPU, memory, disk, network) in a format Prometheus can scrape. Install it on each machine you wish to monitor.

Download the latest Node Exporter binary:

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

Extract and install:

tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz

cd node_exporter-1.7.0.linux-amd64

sudo mv node_exporter /usr/local/bin/

sudo chown root:root /usr/local/bin/node_exporter

Create a systemd service for Node Exporter:

sudo nano /etc/systemd/system/node_exporter.service

Add the following:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nodeexporter
Group=nodeexporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target

Create the user and enable the service:

sudo useradd --no-create-home --shell /bin/false nodeexporter

sudo systemctl daemon-reload

sudo systemctl start node_exporter

sudo systemctl enable node_exporter

Verify it's running on port 9100:

curl http://localhost:9100/metrics

You should see a long list of metrics in plain text format.

Step 8: Set Up Blackbox Exporter for HTTP/HTTPS Monitoring

Blackbox Exporter allows Prometheus to probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. It's ideal for monitoring external services like APIs or websites.

Download and install:

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz

tar xvfz blackbox_exporter-0.24.0.linux-amd64.tar.gz

cd blackbox_exporter-0.24.0.linux-amd64

sudo mv blackbox_exporter /usr/local/bin/

sudo chown root:root /usr/local/bin/blackbox_exporter

Copy the default configuration:

sudo mkdir /etc/blackbox_exporter

sudo cp blackbox.yml /etc/blackbox_exporter/

Modify /etc/blackbox_exporter/blackbox.yml to include your desired modules:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200, 301, 302]
      method: GET

Create a systemd service:

sudo nano /etc/systemd/system/blackbox_exporter.service

Insert:

[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml
Restart=always

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload

sudo systemctl start blackbox_exporter

sudo systemctl enable blackbox_exporter

Blackbox Exporter runs on port 9115 by default. Prometheus will scrape http://localhost:9115/probe?target=https://example.com&module=http_2xx to check website availability.

Step 9: Install and Configure Grafana for Visualization

While Prometheus provides a basic UI, Grafana offers rich dashboards, alerting, and multi-source visualization. Install Grafana:

sudo apt-get install -y apt-transport-https software-properties-common wget

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -

echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

sudo apt-get update

sudo apt-get install -y grafana

Start and enable Grafana:

sudo systemctl daemon-reload

sudo systemctl start grafana-server

sudo systemctl enable grafana-server

Access Grafana at http://your-server-ip:3000. Default login: admin/admin.

Add Prometheus as a data source:

  1. Click Add data source
  2. Select Prometheus
  3. Set URL to http://localhost:9090
  4. Click Save & Test

Import a pre-built dashboard: Go to Dashboard → Import and enter ID 1860 (Node Exporter Full) to visualize server metrics.

Best Practices

Use Labels Consistently

Labels are key-value pairs attached to metrics. Use them to identify environment (prod/staging), service name, region, or instance type. Avoid using high-cardinality labels (e.g., user IDs, session tokens) as they can explode the metric space and degrade performance.
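For instance, low-cardinality static labels can be attached directly to targets in a scrape config. A minimal sketch (the target address and label values here are hypothetical):

```yaml
scrape_configs:
  - job_name: 'checkout-api'
    static_configs:
      - targets: ['10.0.0.5:8080']   # hypothetical internal address
        labels:
          env: 'prod'
          region: 'eu-west-1'
```

Every metric scraped from that target then carries env and region, so queries can filter with selectors like {env="prod"}.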

Set Appropriate Scrape Intervals

While 15s is standard, adjust based on your needs. For critical services, 5s may be appropriate. For low-frequency metrics (e.g., batch jobs), 1m or longer is acceptable. Never set intervals below 1s unless absolutely necessary.
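Per-job overrides of the global interval can express this directly. A sketch with hypothetical job names and addresses:

```yaml
scrape_configs:
  - job_name: 'payments'          # critical service: scrape more often
    scrape_interval: 5s
    static_configs:
      - targets: ['10.0.0.7:8080']

  - job_name: 'nightly-batch'     # low-frequency metrics: scrape less often
    scrape_interval: 1m
    static_configs:
      - targets: ['10.0.0.8:9102']
```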

Separate Alerting and Recording Rules

Keep alerting rules in one file and recording rules (precomputed aggregations) in another. This improves readability and reduces evaluation overhead.
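A recording rule precomputes an expression under a new metric name so dashboards and alerts can query it cheaply. A minimal sketch, reusing the CPU expression from the alerting rules above (the rule name follows the common level:metric:operation convention):

```yaml
groups:
  - name: recording-rules
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```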

Enable Remote Write for Long-Term Storage

Prometheus stores data locally in its TSDB (Time Series Database). For long-term retention, use remote write to send data to Thanos, Cortex, or VictoriaMetrics. This also enables high availability and horizontal scaling.
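Remote write is configured at the top level of prometheus.yml. A minimal sketch, assuming a hypothetical VictoriaMetrics endpoint on your network:

```yaml
remote_write:
  - url: "http://victoriametrics:8428/api/v1/write"
```

Prometheus continues to serve queries from its local TSDB; the remote endpoint receives a copy of every ingested sample.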

Use Service Discovery for Dynamic Environments

Static configurations work for fixed servers. In Kubernetes or cloud environments, use service discovery mechanisms like Kubernetes SD, Consul, or AWS EC2 SD to automatically detect and scrape targets.
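As a sketch of Kubernetes service discovery, the following job discovers all pods and keeps only those annotated with prometheus.io/scrape: "true" (a widely used convention, not a built-in default):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```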

Monitor Prometheus Itself

Always monitor Prometheus's own metrics: scrape duration, target health, TSDB size, and query latency. Use metrics such as scrape_duration_seconds and prometheus_tsdb_head_samples_appended_total to detect performance degradation.

Secure Your Deployment

By default, Prometheus exposes an admin API and UI on port 9090. In production:

  • Place Prometheus behind a reverse proxy (Nginx, Traefik) with TLS termination.
  • Enable basic authentication or integrate with OAuth2.
  • Restrict access via firewall rules (only allow internal networks or specific IPs).
  • Disable the admin API if you don't need it by omitting the --web.enable-admin-api flag (it is off by default).

Plan for Storage Capacity

Prometheus stores every metric sample. A single Node Exporter typically exposes several hundred to a few thousand time series per host, and at a 15s scrape interval each series accumulates 4 samples per minute. Multiply by hundreds of targets and you'll need 100 GB to 1 TB+ of SSD storage per month. Use retention policies:

--storage.tsdb.retention.time=30d

Set this in your systemd service to limit data to 30 days unless you're using remote storage.
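For a rough sanity check on disk sizing, you can run a back-of-envelope calculation. All figures below are illustrative assumptions, not measurements from your environment:

```shell
# Back-of-envelope Prometheus disk estimate (illustrative numbers only)
SAMPLES_PER_SEC=40000   # e.g. ~200 targets x ~3000 series each, scraped every 15s
BYTES_PER_SAMPLE=2      # TSDB compression typically yields ~1-2 bytes per sample
RETENTION_DAYS=30

DISK_GB=$(( SAMPLES_PER_SEC * BYTES_PER_SAMPLE * 86400 * RETENTION_DAYS / 1024 / 1024 / 1024 ))
echo "Estimated disk: ${DISK_GB}GB"
```

Plug in your own series count (from prometheus_tsdb_head_series) to get a more realistic figure.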

Use Alertmanager for Notification Routing

Prometheus alone can trigger alerts but lacks routing, grouping, and deduplication. Integrate with Alertmanager to send notifications via email, Slack, PagerDuty, or Microsoft Teams. Configure it in prometheus.yml under alerting.alertmanagers.
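The corresponding prometheus.yml section is short; a minimal sketch pointing at an Alertmanager on its default port:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # Alertmanager default port
```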

Tools and Resources

Essential Exporters

Exporters are small services that expose metrics in Prometheus format. Key ones include:

  • Node Exporter – Server hardware and OS metrics.
  • Blackbox Exporter – HTTP, TCP, ICMP probes.
  • cAdvisor – Container resource usage (used with Docker/Kubernetes).
  • PostgreSQL Exporter – Database metrics (queries, connections, replication).
  • MySQL Exporter – MySQL performance metrics.
  • Redis Exporter – Redis memory, connections, latency.
  • Pushgateway – For batch jobs and ephemeral tasks that can't be scraped.

All exporters are available on GitHub under the Prometheus organization: github.com/prometheus.

Monitoring Stack Components

For a full observability stack, combine Prometheus with:

  • Grafana – Dashboarding and visualization.
  • Alertmanager – Alert routing and deduplication.
  • Thanos – Long-term storage, global querying, and high availability.
  • VictoriaMetrics – Scalable, drop-in Prometheus replacement with remote storage.
  • Loki – Log aggregation (complements metrics with logs).
  • Jaeger – Distributed tracing (for latency analysis across microservices).

Community and Support

The Prometheus community is active and helpful:

  • Slack: Join the CNCF Slack workspace and visit the #prometheus channel
  • Forum: discuss.prometheus.io
  • GitHub Issues: Report bugs or request features

Real Examples

Example 1: Monitoring a Web Application Stack

Consider a simple stack: Nginx → Node.js API → PostgreSQL → Redis.

  • Use Node Exporter on the server to monitor CPU, memory, disk.
  • Use nginx-exporter to collect Nginx request rates, status codes, and connections.
  • Use nodejs-exporter (via the prom-client library) to expose custom app metrics like request latency and error rates.
  • Use postgres-exporter to monitor query execution time and connection pool usage.
  • Use redis-exporter to track memory fragmentation and eviction rates.

Alerting rules:

  • Trigger alert if PostgreSQL connection pool is >90% full.
  • Alert if Node.js request latency exceeds 2s for 5 minutes.
  • Trigger if Redis memory usage >95%.
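The first of these could be expressed roughly as follows, assuming the default postgres-exporter metric names pg_stat_activity_count and pg_settings_max_connections (verify against your exporter's actual output):

```yaml
- alert: PostgresPoolNearlyFull
  expr: sum(pg_stat_activity_count) / max(pg_settings_max_connections) > 0.9
  for: 5m
  labels:
    severity: warning
```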

Dashboard: Grafana with panels showing request throughput, error rate, database load, and system resource usage.

Example 2: Kubernetes Cluster Monitoring

In Kubernetes, deploy Prometheus using the Prometheus Helm Chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo update

helm install prometheus prometheus-community/kube-prometheus-stack

This installs:

  • Prometheus Server
  • Alertmanager
  • Node Exporter (DaemonSet)
  • Kube State Metrics
  • Grafana
  • Preconfigured dashboards

Metrics collected:

  • Pod CPU/Memory usage
  • Node resource pressure
  • Deployment replica status
  • Network policy violations
  • API server latency

Alerts include:

  • KubePodCrashLooping
  • KubeDeploymentReplicasMismatch
  • KubeNodeNotReady

Example 3: Monitoring a CI/CD Pipeline

Use the Pushgateway to collect metrics from Jenkins or GitHub Actions jobs:

In your CI script:

# Capture build start time
BUILD_START=$(date +%s)

# ... build logic ...

BUILD_DURATION=$(( $(date +%s) - BUILD_START ))

# Push to Pushgateway (the metric body must end with a newline)
echo "build_duration $BUILD_DURATION" | curl --data-binary @- http://pushgateway:9091/metrics/job/ci_build/branch/main

Prometheus scrapes the Pushgateway every 15s and includes the job and branch as labels.

Alert: Trigger if average build duration increases by 50% over 24 hours.
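One way to sketch that alert in PromQL, assuming the build_duration gauge pushed above (the exact windows are a judgment call):

```yaml
- alert: BuildDurationRegression
  expr: avg_over_time(build_duration[1h]) > 1.5 * avg_over_time(build_duration[1d])
  for: 1h
  labels:
    severity: warning
```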

FAQs

What is the difference between Prometheus and Grafana?

Prometheus is a time-series database and monitoring system that collects and stores metrics. Grafana is a visualization tool that connects to Prometheus (and other data sources) to create dashboards and alerts. They are complementary: Prometheus gathers data; Grafana displays it.

Can Prometheus monitor Windows servers?

Yes, using the windows_exporter (formerly wmi_exporter). Install it on Windows machines to expose metrics like disk usage, network interfaces, and Windows service status. Configuration is similar to Node Exporter.

How much memory does Prometheus need?

Memory usage scales with the number of active time series. For 10,000 time series, expect 1–2 GB of RAM. For 100,000+, allocate 8–16 GB. Use the prometheus_tsdb_head_series metric to monitor active series count.

Does Prometheus support log collection?

No. Prometheus is designed for metrics, not logs. For logs, use Loki (from Grafana Labs), Fluentd, or ELK stack. You can correlate logs and metrics using shared labels in Grafana.

How do I backup Prometheus data?

Prometheus stores data in /var/lib/prometheus. To back up, stop the service and copy the directory:

sudo systemctl stop prometheus

sudo tar -czf prometheus-backup.tar.gz /var/lib/prometheus

sudo systemctl start prometheus

For production, use remote write to a long-term storage system like Thanos or VictoriaMetrics.

Why are my targets showing as DOWN?

Common causes:

  • Network firewall blocking port 9090/9100
  • Target service not running
  • Incorrect IP or port in config
  • SSL/TLS certificate errors (for HTTPS targets)
  • Authentication required but not configured

Check the Targets page in Prometheus UI for detailed error messages.

Can I run Prometheus in Docker?

Yes. Use the official image:

docker run -d \
  --name=prometheus \
  -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

For Docker Compose, define the service in docker-compose.yml with volumes and ports.
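A minimal docker-compose.yml sketch, assuming prometheus.yml sits next to the compose file (the named volume persists TSDB data across container restarts):

```yaml
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

volumes:
  prometheus_data:
```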

What is PromQL?

PromQL (Prometheus Query Language) is a functional query language used to select and aggregate time series data. Examples:

  • http_requests_total{job="api-server"} – All HTTP requests for the API server job.
  • rate(http_requests_total[5m]) – Requests per second over the last 5 minutes.
  • sum by(instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) – CPU usage per instance.

Conclusion

Setting up Prometheus is a critical step toward achieving true observability in modern infrastructure. From monitoring bare-metal servers to Kubernetes clusters and cloud-native applications, Prometheus provides the flexibility, scalability, and depth needed to understand system behavior in real time.

This guide has walked you through the complete process, from downloading binaries and configuring scrape targets to securing the deployment and integrating with Grafana and Alertmanager. You've seen real-world examples of monitoring web stacks, CI/CD pipelines, and containerized environments.

Remember: Prometheus is not a magic bullet. Its power lies in thoughtful configuration, consistent labeling, and integration with complementary tools. Avoid the trap of collecting everything; focus on the metrics that matter most to your service level objectives (SLOs).

As your infrastructure grows, consider migrating to distributed solutions like Thanos or VictoriaMetrics for long-term storage and high availability. But for now, with this setup, you have a robust, production-ready monitoring foundation that will serve you well for years to come.

Start small. Monitor what's critical. Iterate based on real incidents. And let Prometheus be your eyes in the infrastructure, so you're never blind to what's happening under the hood.