How to Send Alerts With Grafana
Grafana is one of the most widely adopted open-source platforms for monitoring and observability. Originally designed for visualizing time-series data, Grafana has evolved into a powerful tool that enables teams to not only observe system behavior but also proactively respond to anomalies through intelligent alerting. Sending alerts with Grafana allows organizations to detect issues before they impact users, reduce mean time to resolution (MTTR), and maintain high service availability across complex infrastructures.
Whether you're monitoring a single server, a Kubernetes cluster, or a global microservices architecture, Grafana’s alerting system provides the flexibility to define custom thresholds, trigger notifications across multiple channels, and correlate events across diverse data sources. Unlike basic monitoring tools that only display metrics, Grafana empowers you to transform raw data into actionable intelligence — turning passive dashboards into active guardians of system health.
This guide walks you through the complete process of setting up and optimizing alerting in Grafana. From configuring data sources and creating alert rules to integrating with notification channels and refining alert logic, you’ll learn how to build a robust, scalable alerting pipeline that minimizes noise and maximizes operational efficiency. By the end of this tutorial, you’ll be equipped to deploy enterprise-grade alerting that keeps your systems running smoothly — even during peak traffic or unexpected outages.
Step-by-Step Guide
Prerequisites
Before configuring alerts in Grafana, ensure the following prerequisites are met:
- Grafana installed and running (version 8.0 or higher recommended; Grafana 8 introduced unified alerting and Grafana 9+ removed the legacy panel-based alerting, so some menu names below may differ slightly depending on your version)
- A supported data source connected (e.g., Prometheus, InfluxDB, Loki, MySQL, PostgreSQL, etc.)
- Administrative access to Grafana to create and manage alert rules
- Network access to your notification endpoints (e.g., email server, Slack webhook, PagerDuty API)
For production environments, it’s strongly advised to run Grafana behind a reverse proxy with TLS encryption and role-based access control (RBAC) enabled.
Step 1: Connect a Data Source
Alerts in Grafana rely on time-series data. Without a connected data source, no metrics exist to evaluate against alert conditions. To add a data source:
- Log in to your Grafana instance.
- Click the gear icon in the left sidebar to open Configuration.
- Select Data Sources.
- Click Add data source.
- Choose your preferred source (e.g., Prometheus is the most common for alerting).
- Enter the URL of your data source (e.g., http://prometheus:9090 for a local Prometheus server).
- Click Save & Test to verify connectivity.
Once the data source is successfully connected, Grafana can query metrics and evaluate them in real time for alert conditions. Ensure the data source has sufficient retention and scrape intervals to support your alerting needs — for example, Prometheus should scrape metrics at least every 15–30 seconds for timely alerting.
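If you manage Grafana as code, the same data source can also be added through Grafana's file-based provisioning instead of the UI. A minimal sketch, assuming a Prometheus server at http://prometheus:9090 and a file placed in Grafana's provisioning/datasources directory:
apiVersion: 1
datasources:
  # Prometheus instance from the example above; adjust the URL for your environment
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true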
Step 2: Create a Dashboard with a Time-Series Panel
Alerts are tied to panels within dashboards. You cannot create an alert without a visual panel that queries data from a connected source.
- Click the + icon in the left sidebar and select Dashboards → New Dashboard.
- Click Add new panel.
- In the query editor, select your data source (e.g., Prometheus).
- Enter a metric query. For example, rate(http_requests_total[5m]) to monitor request rates.
- Adjust the visualization type to Time series.
- Click Apply to save the panel.
Ensure your query returns meaningful, stable data. Avoid overly complex or non-aggregated queries, as they may cause evaluation delays or false positives. Use functions like rate(), increase(), or avg_over_time() to smooth out raw counters and derive useful trends.
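If a raw query is expensive or noisy, one option is to pre-aggregate it with a Prometheus recording rule and point the panel (and later the alert) at the recorded series. A minimal sketch, assuming Prometheus loads the rule file and that the metric carries a job label:
groups:
  - name: request-rates
    rules:
      # Pre-computed, smoothed request rate per job; panels and alerts can query this series
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))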
Step 3: Define an Alert Rule
Now that you have a panel with data, you can convert it into an alerting rule.
- While editing the panel, scroll down to the Alert section.
- Toggle Create alert to ON.
- Give your alert a clear, descriptive name — e.g., “High HTTP Error Rate”.
- Set the Condition type. For most use cases, choose “Query A” and define a threshold.
- For example, to trigger an alert when HTTP error rate exceeds 5% over 5 minutes:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
(Both sides are wrapped in sum() so the ratio compares total 5xx traffic to total traffic; dividing the raw per-series rates would not match labels correctly.)
- Set the Evaluate every interval (e.g., 1m) — this determines how often Grafana re-evaluates the condition.
- Set the For duration (e.g., 5m) — this ensures the condition must persist for the specified time before triggering, reducing flapping.
- Click Save to persist the alert rule.
Important: The “For” duration is critical. Without it, transient spikes (e.g., a single failed request) can trigger false alerts. A 5-minute “For” window is typically sufficient for most production systems.
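If you later manage alerts as code (see best practices 5 and 7), the same logic can be expressed as a Prometheus-style alerting rule rather than a panel alert. A minimal sketch, with the rule group name chosen for illustration:
groups:
  - name: http-alerts
    rules:
      - alert: HighHTTPErrorRate
        # Fire when more than 5% of requests return 5xx, sustained for 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "HTTP error rate has exceeded 5% for 5 minutes."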
Step 4: Configure Notification Channels
Alerts are useless unless they reach the right people. Grafana supports multiple notification channels (called contact points in Grafana 9 and later):
- Slack
- PagerDuty
- Microsoft Teams
- Webhook (custom integrations)
- SMS via third-party providers (e.g., Twilio)
- Opsgenie, VictorOps, etc.
To configure a channel:
- Go to Configuration → Alerting → Notification channels.
- Click Add channel.
- Select the channel type (e.g., Slack).
- For Slack, paste your incoming webhook URL from your Slack app.
- Specify the channel name (e.g., alerts).
- Optionally, customize the message template using Grafana's templating variables (e.g., {{ .Title }}, {{ .Message }}).
- Click Test to send a sample alert.
- Click Save.
Repeat this process for each channel you want to use. For critical systems, configure at least two channels — e.g., Slack for immediate visibility and email as a backup.
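In newer Grafana versions (9+), notification channels are replaced by contact points, which can also be provisioned from YAML. A minimal sketch for a Slack contact point, assuming a placeholder webhook URL and a file placed in Grafana's alerting provisioning directory:
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-alerts
    receivers:
      # Replace the placeholder with your Slack incoming webhook URL
      - uid: slack-alerts-1
        type: slack
        settings:
          url: https://hooks.slack.com/services/REPLACE/ME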
Step 5: Assign Notification Channels to Alerts
After creating a notification channel, you must link it to your alert rule.
- Open the dashboard containing your alert panel.
- Click the alert name in the panel’s Alert section.
- Under Notification, select the channel(s) you created (e.g., “Slack Alerts”).
- Optionally, enable Continue notifications to receive updates if the alert remains firing.
- Click Save.
You can assign multiple channels to a single alert — for example, send to both Slack and PagerDuty for different response teams.
Step 6: Test the Alert
Before relying on your alert in production, simulate a condition that triggers it.
For example, if your alert triggers when HTTP error rate exceeds 5%, you can use a tool like hey or wrk to generate a burst of 5xx responses:
hey -n 1000 -c 10 -m POST http://your-service/error-endpoint
Alternatively, temporarily modify the alert's query so it returns artificially high values, for example:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 10
Wait for the “For” duration to elapse. Then check your notification channel — you should see a formatted alert message.
Also verify the alert state in Grafana: go to Alerting → Alert rules and confirm the status changes from “OK” to “Firing”.
Step 7: Manage Alert States and Suppression
Grafana alerts have three states:
- OK — condition is within thresholds.
- Pending — condition met, but “For” duration has not elapsed.
- Firing — condition has been met for the full “For” duration.
To avoid alert fatigue, use alert suppression techniques:
- Use Grouping to combine similar alerts into one notification (e.g., all high CPU alerts on one server group).
- Set up Alert Silence during maintenance windows via Alerting → Silences.
- Use Labels and Annotations to tag alerts with severity, team, or environment so they can be filtered and routed correctly.
To create a silence:
- Go to Alerting → Silences.
- Click New Silence.
- Define matchers (e.g., alertname="High CPU Usage", environment="production").
- Set start and end times.
- Click Create.
During silence periods, Grafana will not send notifications — but the alert state will still be tracked internally.
Best Practices
1. Prioritize Alert Severity and Actionability
Not all alerts are created equal. Design your alerting system with a tiered severity model:
- Critical — Service is down or severely degraded (e.g., 99% error rate, no backend connectivity).
- High — Performance degradation impacting users (e.g., latency > 2s, error rate > 10%).
- Medium — Resource utilization approaching limits (e.g., disk usage > 85%, memory > 90%).
- Low — Non-urgent observations (e.g., minor metric drift, infrequent warnings).
Assign severity labels to alerts using annotations, and route them accordingly. For example, critical alerts should trigger SMS or PagerDuty, while low alerts can go to a general Slack channel.
2. Avoid Alert Fatigue with Thresholds and “For” Durations
One of the biggest causes of alert fatigue is overly sensitive thresholds or lack of “For” duration. A 10-second spike in latency should not wake up an on-call engineer.
Use the “For” clause consistently — 3 to 10 minutes is typical for production systems. Combine it with rate-based queries (e.g., rate()) to smooth out noise. Avoid alerting on raw counters unless you’re monitoring growth trends over time.
3. Use Labels and Annotations for Context
Labels and annotations make alerts more useful:
- Labels are key-value pairs used for grouping and routing (e.g., severity=critical, team=backend).
- Annotations provide human-readable context (e.g., description="Check database connection pool", runbook=https://wiki.example.com/runbook/db-issues).
In your alert rule, define them like this:
annotations:
description: "HTTP error rate has exceeded 5% for 5 minutes."
runbook: "https://wiki.example.com/runbook/http-errors"
labels:
severity: "high"
team: "frontend"
These appear in notification messages and help responders take immediate action without digging through dashboards.
4. Test Alert Logic with Realistic Scenarios
Never assume your alert works. Simulate failures regularly:
- Restart a service and confirm the alert fires.
- Inject artificial latency or errors into a test endpoint.
- Verify alert recovery — does it send a “resolved” notification?
Use Grafana’s Alert History tab to review past alert states and confirm behavior matches expectations.
5. Centralize Alert Management with Alertmanager (Optional)
For large-scale deployments, consider integrating Grafana with Prometheus Alertmanager. While Grafana’s built-in alerting is sufficient for many use cases, Alertmanager provides advanced features:
- Alert deduplication and grouping
- Time-based routing and inhibition
- More granular notification policies
To use Alertmanager:
- Deploy Alertmanager alongside Prometheus.
- In Grafana, add Alertmanager as a data source (or configure it as an external Alertmanager) so its alerts are visible and manageable from Grafana.
- Define alerting rules in Prometheus configuration files (YAML), not in Grafana panels.
This approach is recommended for teams managing hundreds of alerts across multiple Grafana instances.
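A minimal Alertmanager routing sketch, assuming a recent Alertmanager version and placeholder credentials: related alerts are grouped into one notification, alerts labeled severity=critical are escalated to PagerDuty, and everything else goes to Slack.
route:
  receiver: slack-default
  # Collapse related alerts into a single notification
  group_by: ['alertname', 'instance']
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-critical
receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: '#alerts'
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: REPLACE_WITH_INTEGRATION_KEY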
6. Monitor Alerting System Health
Your alerting system must be reliable. If Grafana itself fails, you won’t receive alerts. To prevent this:
- Monitor Grafana's own metrics (e.g., grafana_api_request_total, grafana_alerting_evaluations_total); a sketch of scraping these from a separate Prometheus follows this list.
- Set an alert for "Grafana is down" using an external uptime monitor (e.g., UptimeRobot, Pingdom).
- Ensure Grafana is deployed with high availability — run multiple replicas behind a load balancer.
- Back up alert rules and dashboards using Grafana’s provisioning system or Git.
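One way to watch the watcher is to scrape Grafana's built-in /metrics endpoint from a separate Prometheus instance and alert when the target stops responding. A minimal sketch, assuming the hostname grafana:3000 and a job name chosen for illustration:
# prometheus.yml: scrape Grafana's /metrics endpoint
scrape_configs:
  - job_name: 'grafana'
    static_configs:
      - targets: ['grafana:3000']

# rules/meta.yml: fire if the Grafana target stops responding for 2 minutes
groups:
  - name: meta-monitoring
    rules:
      - alert: GrafanaDown
        expr: up{job="grafana"} == 0
        for: 2m
        labels:
          severity: critical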
7. Document and Review Alerts Regularly
Alerts decay over time. A rule that made sense last year may no longer be relevant. Establish a quarterly alert review process:
- Identify alerts that never fired.
- Remove or archive stale rules.
- Update runbooks and ownership labels.
- Validate thresholds against current performance baselines.
Treat alerting as code — store alert rules in version control (e.g., Git) and manage them through CI/CD pipelines.
Tools and Resources
Core Tools
- Grafana — The central platform for visualization and alerting. Download at grafana.com/download.
- Prometheus — The most popular metrics collection and alerting engine. Ideal for integration with Grafana. prometheus.io/download
- Alertmanager — Advanced alert routing and suppression for Prometheus. GitHub
- Loki — Log aggregation system that integrates with Grafana for log-based alerting. grafana.com/oss/loki
- Node Exporter — Exports machine-level metrics (CPU, memory, disk, network). Essential for infrastructure monitoring. GitHub
Notification Integrations
- Slack — Use incoming webhooks for real-time team alerts.
- PagerDuty — Enterprise-grade incident management with escalation policies.
- Microsoft Teams — Use webhook connectors for Teams channel alerts.
- Email — SMTP integration via Gmail, Outlook, or self-hosted mail servers.
- Twilio — Send SMS alerts using API keys and phone number templates.
- Opsgenie — Robust alert routing with on-call scheduling.
Template and Example Libraries
- Grafana Dashboards — Import pre-built dashboards from grafana.com/dashboards (search for “alerting” or “monitoring”).
- Prometheus Alert Rules — Use community alerting rules from Prometheus GitHub.
- Grafana Provisioning — Automate alert creation using YAML config files. Docs: grafana.com/docs/provisioning
Learning Resources
- Grafana Documentation: Alerting — grafana.com/docs/alerting
- YouTube: Grafana Alerting Tutorials — Search for “Grafana alerting setup” for video walkthroughs.
- Books — “Monitoring with Prometheus” by Brian Brazil (O’Reilly) covers alerting in depth.
- Community — Join the Grafana Slack community or Reddit’s r/Grafana for peer support.
Real Examples
Example 1: High HTTP Error Rate Alert
Scenario: A web application is experiencing a surge in 5xx errors, indicating backend failures.
Query:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
Condition: Evaluate every 1 minute, for 5 minutes.
Annotations:
- description: “HTTP error rate exceeded 5% for 5 consecutive minutes.”
- runbook: “https://wiki.example.com/runbook/http-5xx”
Labels:
- severity: “critical”
- team: “backend”
Notification Channel: Slack (critical-alerts) and PagerDuty.
Outcome: When the alert fires, the backend team receives a detailed message with a direct link to troubleshooting steps. The alert resolves automatically once the error rate drops below 5% for 5 minutes.
Example 2: Disk Usage Alert on Kubernetes Nodes
Scenario: Kubernetes nodes are running out of disk space, causing pod evictions.
Query:
100 - (node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"})
Condition: Value > 85, evaluated every 2 minutes, for 10 minutes.
Annotations:
- description: “Disk usage on {{ $labels.instance }} has exceeded 85%.”
- runbook: “https://wiki.example.com/runbook/disk-full-k8s”
Labels:
- severity: “high”
- team: “infrastructure”
- node: “{{ $labels.instance }}”
Notification Channel: Slack (infra-alerts) and email to DevOps team.
Outcome: The alert triggers only after sustained high usage, avoiding false positives from temporary file writes. The message includes the exact node name, allowing rapid remediation.
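For teams keeping rules in version control, this example could be written as a Prometheus-style rule. A sketch that mirrors the values above (rule and group names are illustrative):
groups:
  - name: node-disk
    rules:
      - alert: NodeDiskUsageHigh
        # Root filesystem usage above 85% sustained for 10 minutes
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"}) > 85
        for: 10m
        labels:
          severity: high
          team: infrastructure
        annotations:
          description: "Disk usage on {{ $labels.instance }} has exceeded 85%."
          runbook: "https://wiki.example.com/runbook/disk-full-k8s"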
Example 3: Application Latency Spike
Scenario: End-user experience is degrading due to increased API response times.
Query:
avg_over_time(http_request_duration_seconds{job="api-service"}[5m]) > 1.5
Condition: Evaluate every 1 minute, for 3 minutes.
Annotations:
- description: “Average API latency exceeded 1.5s for 3 minutes.”
- runbook: “https://wiki.example.com/runbook/api-latency”
Labels:
- severity: “high”
- team: “api”
Notification Channel: Slack (api-alerts) and Microsoft Teams.
Outcome: The frontend team is alerted before users report slowdowns. The alert includes a link to a dashboard showing latency trends across regions, enabling faster diagnosis.
Example 4: Log-Based Alert Using Loki
Scenario: A microservice is logging repeated “connection refused” errors.
Query:
count_over_time({job="auth-service"} |= "connection refused" [5m]) > 10
Condition: Evaluate every 1 minute, for 2 minutes.
Annotations:
- description: “Auth service has logged 10+ ‘connection refused’ errors in the last 5 minutes.”
- runbook: “https://wiki.example.com/runbook/auth-connection-refused”
Labels:
- severity: “critical”
- team: “auth”
Notification Channel: Slack (critical-alerts) and PagerDuty.
Outcome: This log-based alert detects failures that may not be exposed in metrics — such as downstream service unavailability — and triggers immediate investigation.
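If you run the Loki ruler, log-based rules like this one can live alongside your metric rules using the same rule-group YAML format. A sketch with an illustrative group name:
groups:
  - name: auth-service-logs
    rules:
      - alert: AuthConnectionRefused
        # LogQL: count matching log lines over the last 5 minutes
        expr: count_over_time({job="auth-service"} |= "connection refused" [5m]) > 10
        for: 2m
        labels:
          severity: critical
          team: auth
        annotations:
          description: "Auth service has logged 10+ 'connection refused' errors in the last 5 minutes."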
FAQs
Can I send alerts without using Prometheus?
Yes. Grafana supports alerting on data from many sources, including InfluxDB, MySQL, PostgreSQL, CloudWatch, and more. As long as the data source supports querying time-series data and returns numeric values, you can create alerts. However, Prometheus remains the most reliable and feature-complete option for alerting due to its native integration and powerful query language (PromQL).
Why isn’t my alert firing even though the metric exceeds the threshold?
Common causes include:
- The “For” duration hasn’t elapsed — wait for the full window.
- The data source is not returning data — check the query in Explore mode.
- The alert rule is disabled — verify it’s toggled on in the Alerting → Alert rules section.
- Notification channel is misconfigured — test the channel independently.
Can Grafana send alerts via SMS?
Yes, but not natively. Use a webhook integration with a service like Twilio, Vonage, or Plivo. Configure a custom webhook in Grafana that sends a POST request to the SMS provider’s API with the alert details.
How do I silence alerts during deployments?
Use Grafana’s Silences feature. Define a silence that matches your alert’s labels (e.g., alertname=“High CPU”, environment=“staging”) and set a start/end time matching your deployment window. Silences override notifications without deleting alerts.
Do alerts work when Grafana is offline?
No. If Grafana is down, it cannot evaluate queries or send notifications. For high availability, deploy Grafana in a clustered setup with a load balancer and persistent storage. For mission-critical systems, consider using Prometheus Alertmanager with redundant Grafana instances.
Can I schedule alerts to only trigger during business hours?
Grafana does not natively schedule alert evaluation by time of day. However, you can simulate this by adding a time filter to the query itself. For example, in PromQL (hour() returns the UTC hour, and on() is needed so the time check matches series regardless of their labels):
rate(http_requests_total[5m]) > 100 and on() hour() >= 9 and on() hour() <= 17
Alternatively, use external tools like cron jobs, mute timings in newer Grafana versions, or an alert manager that supports time-based routing, as sketched below.
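A sketch of time-based routing in Alertmanager, assuming version 0.24 or later and placeholder receiver names; notifications for the matched route are only delivered during the defined interval:
time_intervals:
  - name: business-hours
    time_intervals:
      # 09:00-17:00 UTC, Monday through Friday
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '17:00'

route:
  receiver: default
  routes:
    - receiver: business-hours-slack
      active_time_intervals:
        - business-hours

receivers:
  - name: default
  - name: business-hours-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: '#alerts'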
How do I prevent duplicate alerts for the same issue?
Use alert grouping and deduplication. If using Prometheus Alertmanager, configure grouping by labels like “alertname” and “instance”. In Grafana’s built-in alerting, ensure you use consistent labels and avoid creating multiple identical alerts for the same metric.
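If you route through Alertmanager, deduplication and grouping are controlled on the route; a short sketch with illustrative timer values:
route:
  receiver: slack-default
  # Alerts sharing these labels are folded into one notification
  group_by: ['alertname', 'instance']
  group_wait: 30s       # delay before the first notification for a new group
  group_interval: 5m    # delay before notifying about new alerts added to the group
  repeat_interval: 4h   # resend interval while alerts keep firing

receivers:
  # Attach slack_configs or another integration here
  - name: slack-default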
Is it possible to auto-resolve alerts?
Yes. Grafana automatically sends a “resolved” notification when the condition returns to OK. Ensure your notification channel supports resolved alerts (Slack, email, and PagerDuty do). You can also customize the resolved message using templates like {{ .Status }} and {{ .EndsAt }}.
Conclusion
Sending alerts with Grafana is not just a technical task — it’s a strategic practice that transforms monitoring from reactive observation to proactive resilience. By following the steps outlined in this guide, you’ve learned how to connect data sources, define intelligent alert rules, configure reliable notification channels, and apply industry best practices to reduce noise and improve response efficiency.
Effective alerting doesn’t mean sending more alerts — it means sending the right alerts to the right people at the right time. With properly tuned thresholds, meaningful annotations, and well-documented runbooks, your team can respond to incidents with confidence and speed.
As your infrastructure grows, continue refining your alerting strategy. Regularly review, test, and evolve your rules. Integrate alerting into your CI/CD pipeline. Treat alerts as code. And always prioritize clarity over volume.
With Grafana as your central nervous system, you’re no longer just watching metrics — you’re safeguarding your services, your users, and your business. Start small, iterate often, and build an alerting system that works as hard as you do.