How to Send Alerts With Grafana
Grafana is one of the most widely adopted open-source platforms for monitoring and observability. Originally designed for visualizing time-series data, Grafana has evolved into a powerful tool that enables teams to not only observe system behavior but also proactively respond to anomalies through intelligent alerting. Sending alerts with Grafana allows organizations to detect issues before they impact users, reduce mean time to resolution (MTTR), and maintain high service availability across complex infrastructures.
Whether you're monitoring a single server, a Kubernetes cluster, or a global microservices architecture, Grafana’s alerting system provides the flexibility to define custom thresholds, trigger notifications across multiple channels, and correlate events across diverse data sources. Unlike basic monitoring tools that only display metrics, Grafana empowers you to transform raw data into actionable intelligence — turning passive dashboards into active guardians of system health.
This guide walks you through the complete process of setting up and optimizing alerting in Grafana. From configuring data sources and creating alert rules to integrating with notification channels and refining alert logic, you’ll learn how to build a robust, scalable alerting pipeline that minimizes noise and maximizes operational efficiency. By the end of this tutorial, you’ll be equipped to deploy enterprise-grade alerting that keeps your systems running smoothly — even during peak traffic or unexpected outages.
Step-by-Step Guide
Prerequisites
Before configuring alerts in Grafana, ensure the following prerequisites are met:
- Grafana installed and running (version 8.0 or higher recommended; Grafana 8 introduced unified alerting and Grafana 9+ removed the legacy panel-based alerting, so some menu names below may differ slightly depending on your version)
- A supported data source connected (e.g., Prometheus, InfluxDB, Loki, MySQL, PostgreSQL, etc.)
- Administrative access to Grafana to create and manage alert rules
- Network access to your notification endpoints (e.g., email server, Slack webhook, PagerDuty API)
For production environments, it’s strongly advised to run Grafana behind a reverse proxy with TLS encryption and role-based access control (RBAC) enabled.
Step 1: Connect a Data Source
Alerts in Grafana rely on time-series data. Without a connected data source, no metrics exist to evaluate against alert conditions. To add a data source:
- Log in to your Grafana instance.
- Click the gear icon in the left sidebar to open Configuration.
- Select Data Sources.
- Click Add data source.
- Choose your preferred source (e.g., Prometheus is the most common for alerting).
- Enter the URL of your data source (e.g., http://prometheus:9090 for a local Prometheus server).
- Click Save & Test to verify connectivity.
Once the data source is successfully connected, Grafana can query metrics and evaluate them in real time for alert conditions. Ensure the data source has sufficient retention and scrape intervals to support your alerting needs — for example, Prometheus should scrape metrics at least every 15–30 seconds for timely alerting.
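If you manage Grafana as code, the same data source can also be added through Grafana's file-based provisioning instead of the UI. A minimal sketch, assuming a Prometheus server at http://prometheus:9090 and a file placed in Grafana's provisioning/datasources directory:
apiVersion: 1
datasources:
  # Prometheus instance from the example above; adjust the URL for your environment
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true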
Step 2: Create a Dashboard with a Time-Series Panel
Alerts are tied to panels within dashboards. You cannot create an alert without a visual panel that queries data from a connected source.
- Click the + icon in the left sidebar and select Dashboards → New Dashboard.
- Click Add new panel.
- In the query editor, select your data source (e.g., Prometheus).
- Enter a metric query. For example, rate(http_requests_total[5m]) to monitor request rates.
- Adjust the visualization type to Time series.
- Click Apply to save the panel.
Ensure your query returns meaningful, stable data. Avoid overly complex or non-aggregated queries, as they may cause evaluation delays or false positives. Use functions like rate(), increase(), or avg_over_time() to smooth out raw counters and derive useful trends.
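If a raw query is expensive or noisy, one option is to pre-aggregate it with a Prometheus recording rule and point the panel (and later the alert) at the recorded series. A minimal sketch, assuming Prometheus loads the rule file and that the metric carries a job label:
groups:
  - name: request-rates
    rules:
      # Pre-computed, smoothed request rate per job; panels and alerts can query this series
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))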
Step 3: Define an Alert Rule
Now that you have a panel with data, you can convert it into an alerting rule.
- While editing the panel, scroll down to the Alert section.
- Toggle Create alert to ON.
- Give your alert a clear, descriptive name — e.g., “High HTTP Error Rate”.
- Set the Condition type. For most use cases, choose “Query A” and define a threshold.
- For example, to trigger an alert when HTTP error rate exceeds 5% over 5 minutes:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
(Both sides are wrapped in sum() so the ratio compares total 5xx traffic to total traffic; dividing the raw per-series rates would not match labels correctly.)
- Set the Evaluate every interval (e.g., 1m) — this determines how often Grafana re-evaluates the condition.
- Set the For duration (e.g., 5m) — this ensures the condition must persist for the specified time before triggering, reducing flapping.
- Click Save to persist the alert rule.
Important: The “For” duration is critical. Without it, transient spikes (e.g., a single failed request) can trigger false alerts. A 5-minute “For” window is typically sufficient for most production systems.
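If you later manage alerts as code (see best practices 5 and 7), the same logic can be expressed as a Prometheus-style alerting rule rather than a panel alert. A minimal sketch, with the rule group name chosen for illustration:
groups:
  - name: http-alerts
    rules:
      - alert: HighHTTPErrorRate
        # Fire when more than 5% of requests return 5xx, sustained for 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "HTTP error rate has exceeded 5% for 5 minutes."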
Step 4: Configure Notification Channels
Alerts are useless unless they reach the right people. Grafana supports multiple notification channels (called contact points in Grafana 9 and later):
- Slack
- PagerDuty
- Microsoft Teams
- Webhook (custom integrations)
- SMS via third-party providers (e.g., Twilio)
- Opsgenie, VictorOps, etc.
To configure a channel:
- Go to Configuration → Alerting → Notification channels.
- Click Add channel.
- Select the channel type (e.g., Slack).
- For Slack, paste your incoming webhook URL from your Slack app.
- Specify the channel name (e.g., alerts).
- Optionally, customize the message template using Grafana's templating variables (e.g., {{ .Title }}, {{ .Message }}).
- Click Test to send a sample alert.
- Click Save.
Repeat this process for each channel you want to use. For critical systems, configure at least two channels — e.g., Slack for immediate visibility and email as a backup.
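In newer Grafana versions (9+), notification channels are replaced by contact points, which can also be provisioned from YAML. A minimal sketch for a Slack contact point, assuming a placeholder webhook URL and a file placed in Grafana's alerting provisioning directory:
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-alerts
    receivers:
      # Replace the placeholder with your Slack incoming webhook URL
      - uid: slack-alerts-1
        type: slack
        settings:
          url: https://hooks.slack.com/services/REPLACE/ME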
Step 5: Assign Notification Channels to Alerts
After creating a notification channel, you must link it to your alert rule.
- Open the dashboard containing your alert panel.
- Click the alert name in the panel’s Alert section.
- Under Notification, select the channel(s) you created (e.g., “Slack Alerts”).
- Optionally, enable Continue notifications to receive updates if the alert remains firing.
- Click Save.
You can assign multiple channels to a single alert — for example, send to both Slack and PagerDuty for different response teams.
Step 6: Test the Alert
Before relying on your alert in production, simulate a condition that triggers it.
For example, if your alert triggers when HTTP error rate exceeds 5%, you can use a tool like hey or wrk to generate a burst of 5xx responses:
hey -n 1000 -c 10 -m POST http://your-service/error-endpoint
Alternatively, temporarily modify the alert's query so it returns artificially high values, for example:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 10
Wait for the “For” duration to elapse. Then check your notification channel — you should see a formatted alert message.
Also verify the alert state in Grafana: go to Alerting → Alert rules and confirm the status changes from “OK” to “Firing”.
Step 7: Manage Alert States and Suppression
Grafana alerts have three states:
- OK — condition is within thresholds.
- Pending — condition met, but “For” duration has not elapsed.
- Firing — condition has been met for the full “For” duration.
To avoid alert fatigue, use alert suppression techniques:
- Use Grouping to combine similar alerts into one notification (e.g., all high CPU alerts on one server group).
- Set up Alert Silence during maintenance windows via Alerting → Silences.
- Use Labels and Annotations to tag alerts with severity, team, or environment so they can be filtered and routed correctly.
To create a silence:
- Go to Alerting → Silences.
- Click New Silence.
- Define matchers (e.g., alertname="High CPU Usage", environment="production").
- Set start and end times.
- Click Create.
During silence periods, Grafana will not send notifications — but the alert state will still be tracked internally.
Best Practices
1. Prioritize Alert Severity and Actionability
Not all alerts are created equal. Design your alerting system with a tiered severity model:
- Critical — Service is down or severely degraded (e.g., 99% error rate, no backend connectivity).
- High — Performance degradation impacting users (e.g., latency > 2s, error rate > 10%).
- Medium — Resource utilization approaching limits (e.g., disk usage > 85%, memory > 90%).
- Low — Non-urgent observations (e.g., minor metric drift, infrequent warnings).
Assign severity labels to alerts using annotations, and route them accordingly. For example, critical alerts should trigger SMS or PagerDuty, while low alerts can go to a general Slack channel.
2. Avoid Alert Fatigue with Thresholds and “For” Durations
One of the biggest causes of alert fatigue is overly sensitive thresholds or lack of “For” duration. A 10-second spike in latency should not wake up an on-call engineer.
Use the “For” clause consistently — 3 to 10 minutes is typical for production systems. Combine it with rate-based queries (e.g., rate()) to smooth out noise. Avoid alerting on raw counters unless you’re monitoring growth trends over time.
3. Use Labels and Annotations for Context
Labels and annotations make alerts more useful:
- Labels are key-value pairs used for grouping and routing (e.g., severity=critical, team=backend).
- Annotations provide human-readable context (e.g., description="Check database connection pool", runbook=https://wiki.example.com/runbook/db-issues).
In your alert rule, define them like this:
annotations:
description: "HTTP error rate has exceeded 5% for 5 minutes."
runbook: "https://wiki.example.com/runbook/http-errors"
labels:
severity: "high"
team: "frontend"
These appear in notification messages and help responders take immediate action without digging through dashboards.
4. Test Alert Logic with Realistic Scenarios
Never assume your alert works. Simulate failures regularly:
- Restart a service and confirm the alert fires.
- Inject artificial latency or errors into a test endpoint.
- Verify alert recovery — does it send a “resolved” notification?
Use Grafana’s Alert History tab to review past alert states and confirm behavior matches expectations.
5. Centralize Alert Management with Alertmanager (Optional)
For large-scale deployments, consider integrating Grafana with Prometheus Alertmanager. While Grafana’s built-in alerting is sufficient for many use cases, Alertmanager provides advanced features:
- Alert deduplication and grouping
- Time-based routing and inhibition
- More granular notification policies
To use Alertmanager:
- Deploy Alertmanager alongside Prometheus.
- In Grafana, add Alertmanager as a data source (or configure it as an external Alertmanager) so its alerts are visible and manageable from Grafana.
- Define alerting rules in Prometheus configuration files (YAML), not in Grafana panels.
This approach is recommended for teams managing hundreds of alerts across multiple Grafana instances.
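A minimal Alertmanager routing sketch, assuming a recent Alertmanager version and placeholder credentials: related alerts are grouped into one notification, alerts labeled severity=critical are escalated to PagerDuty, and everything else goes to Slack.
route:
  receiver: slack-default
  # Collapse related alerts into a single notification
  group_by: ['alertname', 'instance']
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-critical
receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: '#alerts'
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: REPLACE_WITH_INTEGRATION_KEY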
6. Monitor Alerting System Health
Your alerting system must be reliable. If Grafana itself fails, you won’t receive alerts. To prevent this:
- Monitor Grafana's own metrics (e.g., grafana_api_request_total, grafana_alerting_evaluations_total); a sketch of scraping these from a separate Prometheus follows this list.
- Set an alert for "Grafana is down" using an external uptime monitor (e.g., UptimeRobot, Pingdom).
- Ensure Grafana is deployed with high availability — run multiple replicas behind a load balancer.
- Back up alert rules and dashboards using Grafana’s provisioning system or Git.
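One way to watch the watcher is to scrape Grafana's built-in /metrics endpoint from a separate Prometheus instance and alert when the target stops responding. A minimal sketch, assuming the hostname grafana:3000 and a job name chosen for illustration:
# prometheus.yml: scrape Grafana's /metrics endpoint
scrape_configs:
  - job_name: 'grafana'
    static_configs:
      - targets: ['grafana:3000']

# rules/meta.yml: fire if the Grafana target stops responding for 2 minutes
groups:
  - name: meta-monitoring
    rules:
      - alert: GrafanaDown
        expr: up{job="grafana"} == 0
        for: 2m
        labels:
          severity: critical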
7. Document and Review Alerts Regularly
Alerts decay over time. A rule that made sense last year may no longer be relevant. Establish a quarterly alert review process:
- Identify alerts that never fired.
- Remove or archive stale rules.
- Update runbooks and ownership labels.
- Validate thresholds against current performance baselines.
Treat alerting as code — store alert rules in version control (e.g., Git) and manage them through CI/CD pipelines.
Tools and Resources
Core Tools
- Grafana — The central platform for visualization and alerting. Download at grafana.com/download.
- Prometheus — The most popular metrics collection and alerting engine. Ideal for integration with Grafana. prometheus.io/download
- Alertmanager — Advanced alert routing and suppression for Prometheus. GitHub
- Loki — Log aggregation system that integrates with Grafana for log-based alerting. grafana.com/oss/loki
- Node Exporter — Exports machine-level metrics (CPU, memory, disk, network). Essential for infrastructure monitoring. GitHub
Notification Integrations
- Slack — Use incoming webhooks for real-time team alerts.
- PagerDuty — Enterprise-grade incident management with escalation policies.
- Microsoft Teams — Use webhook connectors for Teams channel alerts.
- Email — SMTP integration via Gmail, Outlook, or self-hosted mail servers.
- Twilio — Send SMS alerts using API keys and phone number templates.
- Opsgenie — Robust alert routing with on-call scheduling.
Template and Example Libraries
- Grafana Dashboards — Import pre-built dashboards from grafana.com/dashboards (search for “alerting” or “monitoring”).
- Prometheus Alert Rules — Use community alerting rules from Prometheus GitHub.
- Grafana Provisioning — Automate alert creation using YAML config files. Docs: grafana.com/docs/provisioning
Learning Resources
- Grafana Documentation: Alerting — grafana.com/docs/alerting
- YouTube: Grafana Alerting Tutorials — Search for “Grafana alerting setup” for video walkthroughs.
- Books — “Monitoring with Prometheus” by Brian Brazil (O’Reilly) covers alerting in depth.
- Community — Join the Grafana Slack community or Reddit’s r/Grafana for peer support.
Real Examples
Example 1: High HTTP Error Rate Alert
Scenario: A web application is experiencing a surge in 5xx errors, indicating backend failures.
Query:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
Condition: Evaluate every 1 minute, for 5 minutes.
Annotations:
- description: “HTTP error rate exceeded 5% for 5 consecutive minutes.”
- runbook: “https://wiki.example.com/runbook/http-5xx”
Labels:
- severity: “critical”
- team: “backend”
Notification Channel: Slack (critical-alerts) and PagerDuty.
Outcome: When the alert fires, the backend team receives a detailed message with a direct link to troubleshooting steps. The alert resolves automatically once the error rate drops below 5% for 5 minutes.
Example 2: Disk Usage Alert on Kubernetes Nodes
Scenario: Kubernetes nodes are running out of disk space, causing pod evictions.
Query:
100 - (node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"})
Condition: Value > 85, evaluated every 2 minutes, for 10 minutes.
Annotations:
- description: “Disk usage on {{ $labels.instance }} has exceeded 85%.”
- runbook: “https://wiki.example.com/runbook/disk-full-k8s”
Labels:
- severity: “high”
- team: “infrastructure”
- node: “{{ $labels.instance }}”
Notification Channel: Slack (infra-alerts) and email to DevOps team.
Outcome: The alert triggers only after sustained high usage, avoiding false positives from temporary file writes. The message includes the exact node name, allowing rapid remediation.
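For teams keeping rules in version control, this example could be written as a Prometheus-style rule. A sketch that mirrors the values above (rule and group names are illustrative):
groups:
  - name: node-disk
    rules:
      - alert: NodeDiskUsageHigh
        # Root filesystem usage above 85% sustained for 10 minutes
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"}) > 85
        for: 10m
        labels:
          severity: high
          team: infrastructure
        annotations:
          description: "Disk usage on {{ $labels.instance }} has exceeded 85%."
          runbook: "https://wiki.example.com/runbook/disk-full-k8s"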
Example 3: Application Latency Spike
Scenario: End-user experience is degrading due to increased API response times.
Query:
avg_over_time(http_request_duration_seconds{job="api-service"}[5m]) > 1.5
Condition: Evaluate every 1 minute, for 3 minutes.
Annotations:
- description: “Average API latency exceeded 1.5s for 3 minutes.”
- runbook: “https://wiki.example.com/runbook/api-latency”
Labels:
- severity: “high”
- team: “api”
Notification Channel: Slack (api-alerts) and Microsoft Teams.
Outcome: The frontend team is alerted before users report slowdowns. The alert includes a link to a dashboard showing latency trends across regions, enabling faster diagnosis.
Example 4: Log-Based Alert Using Loki
Scenario: A microservice is logging repeated “connection refused” errors.
Query:
count_over_time({job="auth-service"} |= "connection refused" [5m]) > 10
Condition: Evaluate every 1 minute, for 2 minutes.
Annotations:
- description: “Auth service has logged 10+ ‘connection refused’ errors in the last 5 minutes.”
- runbook: “https://wiki.example.com/runbook/auth-connection-refused”
Labels:
- severity: “critical”
- team: “auth”
Notification Channel: Slack (critical-alerts) and PagerDuty.
Outcome: This log-based alert detects failures that may not be exposed in metrics — such as downstream service unavailability — and triggers immediate investigation.
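If you run the Loki ruler, log-based rules like this one can live alongside your metric rules using the same rule-group YAML format. A sketch with an illustrative group name:
groups:
  - name: auth-service-logs
    rules:
      - alert: AuthConnectionRefused
        # LogQL: count matching log lines over the last 5 minutes
        expr: count_over_time({job="auth-service"} |= "connection refused" [5m]) > 10
        for: 2m
        labels:
          severity: critical
          team: auth
        annotations:
          description: "Auth service has logged 10+ 'connection refused' errors in the last 5 minutes."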
FAQs
Can I send alerts without using Prometheus?
Yes. Grafana supports alerting on data from many sources, including InfluxDB, MySQL, PostgreSQL, CloudWatch, and more. As long as the data source supports querying time-series data and returns numeric values, you can create alerts. However, Prometheus remains the most reliable and feature-complete option for alerting due to its native integration and powerful query language (PromQL).
Why isn’t my alert firing even though the metric exceeds the threshold?
Common causes include:
- The “For” duration hasn’t elapsed — wait for the full window.
- The data source is not returning data — check the query in Explore mode.
- The alert rule is disabled — verify it’s toggled on in the Alerting → Alert rules section.
- Notification channel is misconfigured — test the channel independently.
Can Grafana send alerts via SMS?
Yes, but not natively. Use a webhook integration with a service like Twilio, Vonage, or Plivo. Configure a custom webhook in Grafana that sends a POST request to the SMS provider’s API with the alert details.
How do I silence alerts during deployments?
Use Grafana’s Silences feature. Define a silence that matches your alert’s labels (e.g., alertname=“High CPU”, environment=“staging”) and set a start/end time matching your deployment window. Silences override notifications without deleting alerts.
Do alerts work when Grafana is offline?
No. If Grafana is down, it cannot evaluate queries or send notifications. For high availability, deploy Grafana in a clustered setup with a load balancer and persistent storage. For mission-critical systems, consider using Prometheus Alertmanager with redundant Grafana instances.
Can I schedule alerts to only trigger during business hours?
Grafana does not natively schedule alert evaluation by time of day. However, you can simulate this by adding a time filter to the query itself. For example, in PromQL (hour() returns the UTC hour, and on() is needed so the time check matches series regardless of their labels):
rate(http_requests_total[5m]) > 100 and on() hour() >= 9 and on() hour() <= 17
Alternatively, use external tools like cron jobs, mute timings in newer Grafana versions, or an alert manager that supports time-based routing, as sketched below.
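A sketch of time-based routing in Alertmanager, assuming version 0.24 or later and placeholder receiver names; notifications for the matched route are only delivered during the defined interval:
time_intervals:
  - name: business-hours
    time_intervals:
      # 09:00-17:00 UTC, Monday through Friday
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '17:00'

route:
  receiver: default
  routes:
    - receiver: business-hours-slack
      active_time_intervals:
        - business-hours

receivers:
  - name: default
  - name: business-hours-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: '#alerts'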
How do I prevent duplicate alerts for the same issue?
Use alert grouping and deduplication. If using Prometheus Alertmanager, configure grouping by labels like “alertname” and “instance”. In Grafana’s built-in alerting, ensure you use consistent labels and avoid creating multiple identical alerts for the same metric.
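If you route through Alertmanager, deduplication and grouping are controlled on the route; a short sketch with illustrative timer values:
route:
  receiver: slack-default
  # Alerts sharing these labels are folded into one notification
  group_by: ['alertname', 'instance']
  group_wait: 30s       # delay before the first notification for a new group
  group_interval: 5m    # delay before notifying about new alerts added to the group
  repeat_interval: 4h   # resend interval while alerts keep firing

receivers:
  # Attach slack_configs or another integration here
  - name: slack-default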
Is it possible to auto-resolve alerts?
Yes. Grafana automatically sends a “resolved” notification when the condition returns to OK. Ensure your notification channel supports resolved alerts (Slack, email, and PagerDuty do). You can also customize the resolved message using templates like {{ .Status }} and {{ .EndsAt }}.
Conclusion
Sending alerts with Grafana is not just a technical task — it’s a strategic practice that transforms monitoring from reactive observation to proactive resilience. By following the steps outlined in this guide, you’ve learned how to connect data sources, define intelligent alert rules, configure reliable notification channels, and apply industry best practices to reduce noise and improve response efficiency.
Effective alerting doesn’t mean sending more alerts — it means sending the right alerts to the right people at the right time. With properly tuned thresholds, meaningful annotations, and well-documented runbooks, your team can respond to incidents with confidence and speed.
As your infrastructure grows, continue refining your alerting strategy. Regularly review, test, and evolve your rules. Integrate alerting into your CI/CD pipeline. Treat alerts as code. And always prioritize clarity over volume.
With Grafana as your central nervous system, you’re no longer just watching metrics — you’re safeguarding your services, your users, and your business. Start small, iterate often, and build an alerting system that works as hard as you do.