How to Setup Alertmanager
Alertmanager is a critical component in modern monitoring and observability architectures, especially when paired with Prometheus. It is responsible for receiving alerts generated by Prometheus and managing their delivery through various notification channels such as email, Slack, PagerDuty, and more. Unlike Prometheus, which focuses on metric collection and alert rule evaluation, Alertmanager handles the complex logic of deduplication, grouping, silencing, and routing alerts to the right recipients at the right time. Setting up Alertmanager correctly ensures that your team is alerted only to meaningful incidents, reducing alert fatigue and improving incident response times. In this comprehensive guide, we will walk you through every step required to install, configure, and optimize Alertmanager for production-grade environments. Whether you're managing a small cluster or a large-scale microservices architecture, mastering Alertmanager setup is essential for maintaining system reliability and operational excellence.
Step-by-Step Guide
Prerequisites
Before beginning the setup process, ensure you have the following prerequisites in place:
- A working Prometheus installation (v2.0 or later)
- Access to a Linux-based server or container orchestration platform (e.g., Docker, Kubernetes)
- Basic familiarity with YAML configuration files
- Network access to external notification services (Slack, SMTP servers, etc.) if using them
- Administrative privileges to install and configure services
It's strongly recommended to run Alertmanager in a dedicated environment separate from Prometheus to ensure high availability and to avoid resource contention. If you're using containerized infrastructure, deploying Alertmanager as a sidecar or standalone container is ideal.
Step 1: Download Alertmanager
Alertmanager is distributed as a standalone binary by the Prometheus project. Visit the official GitHub releases page at https://github.com/prometheus/alertmanager/releases to find the latest stable version.
For Linux systems, use the following commands to download and extract the binary:
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
cd alertmanager-0.26.0.linux-amd64
Verify the installation by checking the version:
./alertmanager --version
You should see output similar to:
alertmanager, version 0.26.0 (branch: HEAD, revision: xxxxxxx)
build user: xxx
build date: xxx
go version: go1.21.5
Step 2: Create Configuration File
Alertmanager is configured using a YAML file named alertmanager.yml. This file defines how alerts are routed, grouped, silenced, and delivered. Create the configuration file in the same directory as the binary:
nano alertmanager.yml
Start with a minimal configuration that routes all alerts to a single receiver:
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'alerts@yourcompany.com'
        from: 'alertmanager@yourcompany.com'
        smarthost: 'smtp.yourcompany.com:587'
        auth_username: 'alertmanager@yourcompany.com'
        auth_password: 'your-smtp-password'
        html: '{{ template "email.default.html" . }}'
        headers:
          Subject: '[Alert] {{ .CommonLabels.alertname }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
This configuration includes:
- global.resolve_timeout: The time after which an alert is considered resolved if no update is received.
- route: Defines how alerts are grouped and routed. Alerts with the same alertname are grouped together, sent after a short initial wait, and repeated every hour if unresolved.
- receivers: Specifies the notification method, in this case email via SMTP.
- inhibit_rules: Prevents duplicate alerts; if a critical alert is firing, warning alerts for the same instance and alert name are suppressed.
Always validate your configuration before starting Alertmanager. The release archive ships with amtool, which includes a configuration checker (the alertmanager binary itself has no validation flag):
./amtool check-config alertmanager.yml
If the configuration is valid, amtool reports SUCCESS along with a summary of the routes and receivers it found.
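Routing behavior can be hard to reason about from YAML alone; amtool's `config routes test` subcommand answers "which receiver gets this label set?" against a real config. As a rough illustration of the semantics, here is a simplified Python model of a route tree (first matching child wins, otherwise the parent's receiver applies). The helper name and dict layout are ours, not Alertmanager's actual implementation, and it ignores features like `continue` and regex matchers:

```python
def pick_receiver(route, labels):
    """Walk a route tree the way Alertmanager does (simplified):
    descend into the first child whose 'match' labels all agree with
    the alert's labels; otherwise use the current node's receiver."""
    for child in route.get("routes", []):
        if all(labels.get(k) == v for k, v in child.get("match", {}).items()):
            return pick_receiver(child, labels)
    return route["receiver"]

# A tree mirroring this guide's configs: email by default,
# critical alerts diverted to the 'slack-alerts' receiver.
route = {
    "receiver": "email-notifications",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "slack-alerts"},
    ],
}

print(pick_receiver(route, {"severity": "critical"}))  # slack-alerts
print(pick_receiver(route, {"severity": "warning"}))   # email-notifications
```

This makes the fallback behavior explicit: an alert that matches no child route is still delivered, via the root receiver.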
Step 3: Configure Prometheus to Send Alerts to Alertmanager
Alertmanager does not generate alerts; it receives them from Prometheus. You must configure Prometheus to forward alerts to the Alertmanager instance.
Edit your Prometheus configuration file (prometheus.yml) and add the alerting section:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "alert.rules"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
Ensure the target IP and port match your Alertmanager instance. If Alertmanager is running on a different host, replace localhost:9093 with the appropriate address.
Step 4: Define Alert Rules in Prometheus
Alert rules are defined in separate files (e.g., alert.rules) and loaded by Prometheus. Create this file in the same directory as prometheus.yml:
nano alert.rules
Add the following sample alert rules:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myapp"} > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High request latency detected"
          description: "{{ $value }}s average request latency for job {{ $labels.job }} over the last 5 minutes."
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
These rules trigger alerts based on:
- HighRequestLatency: When average request latency exceeds 0.5 seconds for 10 minutes.
- InstanceDown: When a monitored target is unreachable (up == 0) for 5 minutes.
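Before loading rule files, validate them with promtool, which ships with Prometheus: promtool check rules alert.rules. As a rough illustration of the structural part of such a check, the sketch below verifies that a rule group has the required keys; the function name and checks are ours, and this is no substitute for promtool, which also parses the PromQL expressions:

```python
def check_rule_group(group):
    """Return a list of structural problems in one rule group (simplified)."""
    problems = []
    if "name" not in group:
        problems.append("group missing 'name'")
    for i, rule in enumerate(group.get("rules", [])):
        for key in ("alert", "expr"):
            if key not in rule:
                problems.append(f"rule {i}: missing '{key}'")
    return problems

group = {
    "name": "example",
    "rules": [
        {"alert": "InstanceDown", "expr": "up == 0", "for": "5m"},
        {"expr": "up == 0"},  # deliberately missing 'alert' to show detection
    ],
}
print(check_rule_group(group))  # -> ["rule 1: missing 'alert'"]
```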
Restart Prometheus to load the new rules:
systemctl restart prometheus
Verify the rules are loaded by visiting http://<prometheus-host>:9090/alerts. You should see your rules listed with their current status (firing, pending, or inactive).
Step 5: Start Alertmanager
Once the configuration is validated and Prometheus is configured, start Alertmanager:
nohup ./alertmanager --config.file=alertmanager.yml --web.listen-address=":9093" > alertmanager.log 2>&1 &
To ensure Alertmanager starts automatically on boot, create a systemd service file:
sudo nano /etc/systemd/system/alertmanager.service
Add the following content:
[Unit]
Description=Alertmanager
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --web.listen-address=:9093
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
Verify it's running:
sudo systemctl status alertmanager
Access the Alertmanager web interface at http://<your-server-ip>:9093. You should see a dashboard showing active alerts, silences, and inhibition rules.
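You can also exercise the pipeline without Prometheus by pushing a synthetic alert straight to Alertmanager's v2 API (POST /api/v2/alerts). The payload builder below is a sketch using only the standard library; the label values are placeholders:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

def make_test_alert(alertname, severity, instance):
    """Build a one-element alert list in the shape /api/v2/alerts expects."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": severity, "instance": instance},
        "annotations": {"summary": "Synthetic test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=5)).isoformat(),
    }]

def post_alert(base_url, payload):
    """POST the alert list to a live Alertmanager; returns the HTTP status."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

payload = make_test_alert("TestAlert", "warning", "host1:9100")
# post_alert("http://localhost:9093", payload)  # uncomment against a live instance
```

After posting, the alert should appear in the web UI within a few seconds and be routed like any Prometheus-generated alert.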
Step 6: Configure Notification Integrations
Alertmanager supports multiple notification integrations. Below are examples for common platforms.
Slack Integration
To send alerts to Slack, first create a Slack webhook URL:
- Go to https://api.slack.com/apps
- Click Create New App → From scratch
- Name your app and select your workspace
- Go to Incoming Webhooks → Activate → Add New Webhook
- Choose a channel and copy the generated webhook URL
Update your alertmanager.yml to include a Slack receiver:
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        username: 'Alertmanager'
        text: |
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }}
          *Description:* {{ .Annotations.description }}
          *Severity:* {{ .Labels.severity }}
          *Instance:* {{ .Labels.instance }}
          *Time:* {{ .StartsAt.Format "2006-01-02 15:04:05" }}
          {{ end }}
Update the route to send critical alerts to Slack:
route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-notifications'
  routes:
    - match:
        severity: critical
      receiver: 'slack-alerts'
PagerDuty Integration
To integrate with PagerDuty:
- Log in to your PagerDuty account
- Go to Services → Add Service
- Select Prometheus as the integration type
- Copy the integration key
Add to alertmanager.yml:
receivers:
  - name: 'pagerduty-alerts'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        description: '{{ .CommonAnnotations.description }}'
        details:
          alertname: '{{ .CommonLabels.alertname }}'
          instance: '{{ .CommonLabels.instance }}'
          severity: '{{ .CommonLabels.severity }}'
Route critical alerts to PagerDuty:
routes:
  - match:
      severity: critical
    receiver: 'pagerduty-alerts'
Webhook Integration
For custom integrations (e.g., internal ticketing systems), use the webhook receiver:
receivers:
  - name: 'webhook-notifications'
    webhook_configs:
      - url: 'http://internal-ticketing-system:8080/alert'
        send_resolved: true
        http_config:
          basic_auth:
            username: 'alertmanager'
            password: 'your-secret'
The send_resolved: true parameter ensures that when an alert is resolved, a follow-up notification is sent to indicate resolution.
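The webhook body is JSON whose top level includes a group-wide status plus an alerts array, where each alert carries its own status, labels, and annotations. The minimal handler sketch below (standard library only; the handler class, helper name, and port are ours for illustration) shows how a custom endpoint might consume that payload:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(payload):
    """Turn an Alertmanager webhook payload into one-line summaries."""
    lines = []
    for alert in payload.get("alerts", []):
        lines.append(
            f'{alert["status"]}: {alert["labels"].get("alertname", "?")} '
            f'on {alert["labels"].get("instance", "?")}'
        )
    return lines

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        for line in summarize(json.loads(body)):
            print(line)  # replace with ticket creation, paging, etc.
        self.send_response(200)
        self.end_headers()

# Shape of a (truncated) firing notification:
sample = {"status": "firing",
          "alerts": [{"status": "firing",
                      "labels": {"alertname": "InstanceDown",
                                 "instance": "host1:9100"}}]}
print(summarize(sample))  # -> ['firing: InstanceDown on host1:9100']

# To actually serve:
# HTTPServer(("", 8080), AlertHandler).serve_forever()
```

With send_resolved enabled, the same endpoint will later receive the alert again with "status": "resolved".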
Step 7: Test Alert Delivery
Once configured, test the entire pipeline:
- Manually trigger an alert by stopping a monitored service (e.g., curl -X POST http://localhost:9090/-/reload if using a test target)
- Check the Prometheus alerts page (http://localhost:9090/alerts) to confirm the alert is firing
- Check the Alertmanager UI (http://localhost:9093) to confirm the alert is received and routed
- Verify you receive the notification via email, Slack, or PagerDuty
- Restart the service and confirm a resolved notification is sent
If notifications fail, check:
- Alertmanager logs: journalctl -u alertmanager -f
- Prometheus logs: journalctl -u prometheus -f
- Network connectivity to notification endpoints
- Authentication credentials (SMTP passwords, webhook keys)
Best Practices
Use Meaningful Labels and Annotations
Labels are used for grouping and routing alerts. Use consistent, descriptive labels such as severity, team, service, and instance. Annotations provide human-readable context; include summary, description, and links to dashboards or runbooks.
labels:
  severity: critical
  team: backend
  service: payment-service
annotations:
  summary: "Payment service is unresponsive"
  description: "HTTP 500 errors exceeded threshold for 10 minutes"
  runbook: "https://runbooks.yourcompany.com/payment-service-failure"
Implement Alert Grouping and Inhibition
Grouping prevents alert storms by bundling similar alerts. For example, if 50 instances go down simultaneously, group them under one alert instead of triggering 50 individual notifications.
Inhibition prevents noisy alerts. For instance, if a server is down (severity: critical), there's no need to alert about high CPU usage or disk space on that same server. Use inhibition rules to suppress lower-severity alerts when higher ones are active.
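The inhibition semantics can be sketched in a few lines: a target alert is suppressed when some firing source alert matches the source matchers and agrees with the target on every label listed in equal. This is a simplified model of the rule from Step 2, not Alertmanager's actual code:

```python
def inhibited(target, sources, rule):
    """True if 'target' should be suppressed by any alert in 'sources'."""
    if not all(target["labels"].get(k) == v
               for k, v in rule["target_match"].items()):
        return False  # the rule doesn't even apply to this alert
    for src in sources:
        if (all(src["labels"].get(k) == v
                for k, v in rule["source_match"].items())
                and all(src["labels"].get(k) == target["labels"].get(k)
                        for k in rule["equal"])):
            return True
    return False

rule = {"source_match": {"severity": "critical"},
        "target_match": {"severity": "warning"},
        "equal": ["alertname", "instance"]}
crit = {"labels": {"severity": "critical", "alertname": "InstanceDown", "instance": "db1"}}
warn = {"labels": {"severity": "warning", "alertname": "InstanceDown", "instance": "db1"}}
print(inhibited(warn, [crit], rule))  # True: critical on db1 mutes the warning on db1
```

Note that the equal labels are what scope the suppression: a warning on a different instance would not be inhibited.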
Set Appropriate Timeouts and Repeat Intervals
Too-frequent repeat intervals cause alert fatigue. Set repeat_interval to at least 1–4 hours for non-critical alerts. For critical alerts, 15–30 minutes may be acceptable.
Ensure group_wait and group_interval are tuned to your environment. For fast-changing systems, use shorter intervals (e.g., 10s–1m). For stable environments, 1–5 minutes is sufficient.
Use Multiple Receivers for Redundancy
Never rely on a single notification channel. Configure at least two delivery methods; for example, email + Slack for internal teams, and PagerDuty for on-call engineers.
Route alerts based on severity:
- Warning: Email + Slack
- Critical: PagerDuty + Slack + SMS (if supported)
Secure Your Configuration
Never commit sensitive data (SMTP passwords, webhook keys) to version control. Note that Alertmanager does not expand environment variables inside alertmanager.yml itself, so render the file from a template at deploy time (for example with envsubst, part of gettext) or use secrets management tools like HashiCorp Vault or Kubernetes Secrets:
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'alerts@company.com'
        smarthost: '${SMTP_HOST}'
        auth_username: '${SMTP_USER}'
        auth_password: '${SMTP_PASS}'
Render the template, then start Alertmanager:
SMTP_HOST=smtp.company.com:587 SMTP_USER=alertmanager SMTP_PASS=secret \
  envsubst < alertmanager.yml.tpl > alertmanager.yml
./alertmanager --config.file=alertmanager.yml
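Alertmanager does not expand environment variables in its config file on its own, so some rendering step has to inject secrets before startup. Where envsubst is unavailable, the same substitution is a few lines of Python with string.Template; the template text below is a hypothetical fragment for illustration:

```python
import os
from string import Template

def render(template_text, env=None):
    """Expand ${VAR} placeholders from a mapping (default: the process
    environment); raises KeyError if a referenced variable is unset,
    which is safer than silently emitting an empty credential."""
    return Template(template_text).substitute(env if env is not None else os.environ)

tpl = "smarthost: '${SMTP_HOST}'\nauth_password: '${SMTP_PASS}'"
rendered = render(tpl, {"SMTP_HOST": "smtp.company.com:587", "SMTP_PASS": "secret"})
print(rendered)
```

Failing loudly on a missing variable is deliberate: a config rendered with a blank SMTP password would validate but silently fail to deliver mail.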
Monitor Alertmanager Itself
Alertmanager exposes metrics at /metrics. Create a Prometheus scrape job for it:
- job_name: 'alertmanager'
  static_configs:
    - targets: ['alertmanager-host:9093']
Alert on Alertmanager failures:
groups:
  - name: meta
    rules:
      - alert: AlertmanagerDown
        expr: up{job="alertmanager"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Alertmanager is unreachable"
          description: "Alertmanager has been down for more than 5 minutes. Notifications may be lost."
Use Templates for Rich Notifications
Alertmanager supports Go templates for customizing alert content. Create a template file (templates/email.tmpl):
{{ define "email.default.html" }}
<html>
<head>
<style>
body { font-family: Arial, sans-serif; }
.alert { border-left: 4px solid #dc3545; padding: 10px; margin: 10px 0; }
.resolved { border-left-color: #28a745; }
</style>
</head>
<body>
{{ range .Alerts }}
<div class="alert">
<h3>{{ .Labels.alertname }} - {{ .Status }}</h3>
<p>Severity: {{ .Labels.severity }}</p>
<p>Instance: {{ .Labels.instance }}</p>
<p>Description: {{ .Annotations.description }}</p>
<p>Started: {{ .StartsAt }}</p>
{{ if .EndsAt }}
<p>Resolved: {{ .EndsAt }}</p>
{{ end }}
</div>
{{ end }}
</body>
</html>
{{ end }}
Reference it in your config:
templates:
  - '/opt/alertmanager/templates/*.tmpl'
Tools and Resources
Official Documentation
The official Alertmanager documentation is the most authoritative source:
- https://prometheus.io/docs/alerting/latest/alertmanager/
- Configuration Reference
- Notification Templates
Configuration Validators
Always validate your configuration before deployment:
- amtool: ./amtool check-config alertmanager.yml
- YAML Linters: Use YAML Lint or VS Code with YAML extensions to catch syntax errors.
Monitoring and Visualization Tools
- Prometheus: Core alerting engine
- Grafana: Visualize alert status and metrics
- Alertmanager UI: Built-in dashboard for viewing active alerts and silences
- Thanos: For long-term storage and global alerting across multiple Prometheus instances
Community Templates and Examples
GitHub hosts numerous open-source Alertmanager configurations that make useful starting points for your own routing trees and receivers.
Containerized Deployments
For Docker:
docker run -d --name alertmanager \
-p 9093:9093 \
-v $(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager:v0.26.0
For Kubernetes, use the Prometheus Operator:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
spec:
  replicas: 2
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
  image: quay.io/prometheus/alertmanager:v0.26.0
  configSecret: alertmanager-main
Real Examples
Example 1: E-commerce Platform Alerting
An e-commerce company runs a microservices architecture with payment, inventory, and user services. They configure Alertmanager as follows:
- Payment Service: Critical alerts on transaction failures → PagerDuty (on-call engineer)
- Inventory Service: Warning alerts on stock levels below threshold → Slack channel #inventory-alerts
- User Service: Warning alerts on login failures → Email to DevOps team
They use inhibition rules to suppress inventory alerts if the payment service is down (indicating a broader outage).
Example 2: Cloud-Native Infrastructure
A SaaS company runs 200+ containers across 5 clusters. They deploy Alertmanager in high availability mode with 3 replicas behind a load balancer.
Alerts are routed based on namespace:
routes:
  - match:
      namespace: production
      severity: critical
    receiver: 'pagerduty-prod'
  - match:
      namespace: staging
      severity: critical
    receiver: 'slack-staging'
They use a custom webhook to auto-create Jira tickets for all critical alerts.
Example 3: Hybrid On-Prem and Cloud Setup
A financial institution has on-prem servers and AWS EC2 instances. They run separate Prometheus instances for each environment but use a single Alertmanager cluster.
Alerts are tagged with environment: onprem or environment: aws. Routing rules direct alerts to different teams:
routes:
  - match:
      environment: onprem
      severity: critical
    receiver: 'onprem-team'
  - match:
      environment: aws
      severity: critical
    receiver: 'cloud-team'
This ensures the correct team responds without confusion.
Example 4: Alert Suppression During Maintenance
Before scheduled maintenance, engineers create a silence in Alertmanager:
- Go to http://alertmanager:9093/#/silences
- Click New Silence
- Set matchers: job="database-backup"
- Set duration: 2 hours
All alerts from the database-backup job are suppressed during the window, preventing false positives.
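Silences can also be created programmatically against Alertmanager's v2 API (POST /api/v2/silences), which is handy for maintenance automation. The payload builder below is a sketch; the field names follow Alertmanager's silence object, while the function name and defaults are ours:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

def make_silence(matchers, hours, comment, created_by="automation"):
    """Build a silence object for POST /api/v2/silences."""
    now = datetime.now(timezone.utc)
    return {
        "matchers": [{"name": k, "value": v, "isRegex": False}
                     for k, v in matchers.items()],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "comment": comment,
        "createdBy": created_by,
    }

silence = make_silence({"job": "database-backup"}, 2, "Scheduled maintenance")
# To create it against a live instance:
# urllib.request.urlopen(urllib.request.Request(
#     "http://alertmanager:9093/api/v2/silences",
#     data=json.dumps(silence).encode(),
#     headers={"Content-Type": "application/json"}, method="POST"))
print(silence["matchers"])
```

Wiring this into a deploy pipeline means maintenance windows silence alerts automatically instead of relying on an engineer remembering to click through the UI.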
FAQs
What is the difference between Prometheus and Alertmanager?
Prometheus collects and stores metrics and evaluates alerting rules to generate alerts. Alertmanager receives those alerts and manages their delivery, handling deduplication, grouping, silencing, and routing. Prometheus generates; Alertmanager delivers.
Can Alertmanager work without Prometheus?
No. Alertmanager is designed as a companion to Prometheus and relies on it to generate alerts. It does not collect metrics or evaluate rules on its own.
How do I test if Alertmanager is receiving alerts?
Visit the Alertmanager web UI at http://<host>:9093. The Alerts tab shows all active alerts. You can also check the logs using journalctl -u alertmanager or inspect the Prometheus alerts page.
Why am I not receiving email notifications?
Common causes include: incorrect SMTP credentials, firewall blocking port 587/465, misconfigured from or to addresses, or the SMTP server requiring TLS/SSL. Test SMTP connectivity using telnet smtp.yourserver.com 587.
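Beyond telnet, a small script can confirm that the SMTP server answers and negotiate STARTTLS the way Alertmanager would on port 587. This is a connectivity sketch using only the standard library; the hostname is a placeholder, and a real diagnosis should also attempt authentication:

```python
import smtplib

def check_smtp(host, port, timeout=5):
    """Return True if the server answers EHLO (upgrading to TLS when
    offered); False on connection, timeout, or protocol failure."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, _ = smtp.ehlo()
            if smtp.has_extn("starttls"):
                smtp.starttls()  # most submission servers on 587 require this
            return 200 <= code < 300
    except OSError as exc:  # smtplib errors are OSError subclasses
        print(f"SMTP check failed: {exc}")
        return False

# check_smtp("smtp.yourcompany.com", 587)
print(check_smtp("localhost", 9))  # nothing speaks SMTP on the discard port
```

A False here points at network or TLS problems; a True with no mail delivered points instead at credentials or the from/to addresses.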
Can I send alerts to multiple teams based on the service?
Yes. Use label matching in the route configuration. For example:
routes:
  - match:
      service: frontend
    receiver: 'frontend-team'
  - match:
      service: backend
    receiver: 'backend-team'
How do I silence an alert temporarily?
Go to the Alertmanager UI → Silences → New Silence. Define matchers (e.g., alertname=HighCPU) and set a duration. Silences are persisted to Alertmanager's data directory (--storage.path), so they survive restarts.
Does Alertmanager support SMS notifications?
Alertmanager does not natively support SMS, but you can integrate via third-party services like Twilio using webhook receivers or through PagerDuty, which supports SMS as a notification method.
How do I upgrade Alertmanager?
Download the new binary, validate the config with the new version, then restart the service. Always test in a staging environment first. Configuration files are generally backward compatible across minor versions, but review the release notes for deprecated fields.
What happens if Alertmanager crashes?
Alerts remain queued in Prometheus until Alertmanager is back online. Prometheus retries delivery with exponential backoff. If Alertmanager is down for too long, alerts may be lost unless you use persistent storage or HA setups.
Can I use Alertmanager with other monitoring tools?
While Alertmanager is designed for Prometheus, you can send alerts from other systems (e.g., Zabbix, Nagios) via webhook integrations if they support HTTP POST payloads. However, native integration is only guaranteed with Prometheus.
Conclusion
Setting up Alertmanager is a foundational step in building a robust, reliable monitoring system. By properly configuring alert routing, grouping, and notification channels, you transform raw metrics into actionable insights that keep your systems running smoothly. This guide has walked you through every critical phase, from downloading the binary and validating configurations to integrating with Slack, PagerDuty, and email, and implementing enterprise-grade best practices.
Remember: Alertmanager is not a "set it and forget it" tool. Regularly review your alert rules, tune grouping and inhibition policies, and ensure your team is trained to respond to alerts effectively. Use templates to enrich notifications, secure your secrets, and monitor Alertmanager itself to prevent blind spots.
As your infrastructure scales, consider deploying Alertmanager in high availability mode, integrating it with Kubernetes operators, and leveraging centralized logging to audit alert activity. With thoughtful configuration and continuous refinement, Alertmanager becomes the nervous system of your observability stack, ensuring that when something breaks, the right people are notified, at the right time, with the right context.