Prometheus Rules
While metrics make their way into Prometheus, Prometheus continuously evaluates the collected time series against a predefined set of rules (defined in rule files) and then performs actions, such as generating alerts, based on the result.
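How often this evaluation happens is controlled in the main Prometheus configuration. The fragment below is a minimal sketch; the 15s values are assumptions chosen for illustration and are not taken from the setup described here.

```yaml
global:
  scrape_interval: 15s      # how often targets are scraped (assumed value)
  evaluation_interval: 15s  # how often the rules in rule_files are evaluated (assumed value)
```

If evaluation_interval is left out, Prometheus falls back to its default of one minute.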
Prometheus Configuration
rule_files:
  - rule_file_path1
  - rule_file_path2
Prometheus rule file configuration
A rule file contains groups, and each group can have multiple rules. Every rule has its own definition: an expression to evaluate, an alert name to identify the alert, optional labels, and annotations such as a summary and a description of the alert.
Rule file configuration
groups:
  - name: nodeDownAlert
    rules:
      - alert: NodeDownAlert
        expr: probe_success{job="t1"} == 0
        for: 1m
        labels:
          ruleForJob: t1
        annotations:
          summary: Node Down Alert
          description: Node {{ $labels.instance }} is down
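One way to check that the rule above behaves as intended is a promtool unit test, run with `promtool test rules <test_file>`. The following is a sketch under assumptions: the rule is taken to live in rule_file_path1, and the instance value node-1 is a hypothetical target name. The input series is healthy at the first sample and then stays at 0, so the alert should be pending at 1m and firing once the `for: 1m` duration has elapsed.

```yaml
rule_files:
  - rule_file_path1   # assumed to contain the nodeDownAlert group

evaluation_interval: 1m

tests:
  - interval: 1m      # spacing between the synthetic samples below
    input_series:
      # hypothetical target: up at 0m, down from 1m onwards
      - series: 'probe_success{job="t1", instance="node-1"}'
        values: '1 0 0 0'
    alert_rule_test:
      - eval_time: 3m
        alertname: NodeDownAlert
        exp_alerts:
          - exp_labels:
              job: t1
              instance: node-1
              ruleForJob: t1
            exp_annotations:
              summary: Node Down Alert
              description: Node node-1 is down
```

The expected labels include the label added by the rule (ruleForJob) as well as the labels of the series that triggered it.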
Prometheus Alerts
The rules in the rule files are evaluated, and when a rule's condition is satisfied an alert is generated. These alerts still have to make their way to communication channels, and some control over how they are delivered is also required. Alertmanager is used for that purpose.
Prometheus configuration
# Alert
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
Prometheus alert manager configuration
The Alertmanager configuration contains routes that check the labels of each alert for a match and, based on that match, send the alert to the corresponding communication channel.
Alertmanager configuration
route:
  receiver: slack # default receiver
  group_by: ["alertname"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m # wait 1 minute before re-sending a notification for an alert that is still firing
  routes:
    - match:
        ruleForJob: t1
      receiver: slack
receivers:
  - name: slack
    slack_configs:
      - api_url: $SLACK_WEBHOOK_URL # placeholder for the Slack webhook URL
        channel: "#new-channel"
        send_resolved: true # also send a notification when the target is back up
        text: >
          [{{ .Status }}] {{ .CommonLabels.alertname }} alert
          Target: {{ .CommonLabels.instance }}
          Summary: {{ .CommonAnnotations.summary }}
          Description: {{ .CommonAnnotations.description }}
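To illustrate how label matching picks a channel, here is a sketch of the same routing tree with a second route. The label value t2, the receiver name ops-webhook, and the webhook URL are hypothetical and are not part of the setup described above.

```yaml
route:
  receiver: slack            # default receiver for alerts that match no route
  group_by: ["alertname"]
  routes:
    - match:
        ruleForJob: t1       # label set by the NodeDownAlert rule
      receiver: slack
    - match:
        ruleForJob: t2       # hypothetical label value for a second job
      receiver: ops-webhook  # hypothetical receiver name
receivers:
  - name: slack
    slack_configs:
      - api_url: $SLACK_WEBHOOK_URL   # placeholder for the Slack webhook URL
        channel: "#new-channel"
  - name: ops-webhook
    webhook_configs:
      - url: "http://example.internal:5001/alerts"  # hypothetical endpoint
```

Routes are checked in order and the first matching route handles the alert, so an alert carrying ruleForJob: t2 would go to the webhook while everything else falls back to Slack.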