Prometheus Rules

While metrics are making their way into Prometheus, Prometheus continuously evaluates the collected time series over the configured time window against a predefined set of rules (defined in rule files), and then performs actions based on the results, such as generating alerts.

Prometheus Configuration
rule_files:
  - rule_file_path1
  - rule_file_path2
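
How often the loaded rules are evaluated is controlled by evaluation_interval in the global section of the same Prometheus configuration. The fragment below is a minimal sketch: the interval values and the rules/*.yml path are assumptions chosen for illustration, not values taken from this setup.

```yaml
global:
  scrape_interval: 15s      # assumed value: how often targets are scraped
  evaluation_interval: 15s  # assumed value: how often the rules are evaluated

rule_files:
  - "rules/*.yml"           # hypothetical path; rule_files entries may be globs
```

A shorter evaluation_interval detects a failing condition sooner at the cost of more frequent rule evaluation.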

Prometheus rule file configuration

A rule file is organized into groups, and each group can have multiple rules. Each rule has its own definition: an expression to evaluate, an alert name to identify the alert, labels, and annotations such as a summary and a description of the alert.

Rule file configuration
groups:
  - name: nodeDownAlert
    rules:
      - alert: NodeDownAlert
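        expr: probe_success{job="t1"} == 0 # matches when the probe for job t1 reports failure
        for: 1m # the condition must hold for 1 minute before the alert fires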
        labels:
          ruleForJob: t1
        annotations:
          summary: Node Down Alert
          description: Node {{ $labels.instance }} is down
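
One way to check that the rule above behaves as intended is a promtool unit test. The following is a sketch under assumptions: the rule file name node_down.rules.yml and the instance label node1:9115 are hypothetical, and the input series is synthetic data written only for the test.

```yaml
rule_files:
  - node_down.rules.yml   # hypothetical file name holding the rule group above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic samples at 0m, 1m, 2m, 3m: the probe succeeds once, then fails.
      - series: 'probe_success{job="t1", instance="node1:9115"}'
        values: '1 0 0 0'
    alert_rule_test:
      # By 3m the condition has held longer than the 1m "for" duration.
      - eval_time: 3m
        alertname: NodeDownAlert
        exp_alerts:
          - exp_labels:
              job: t1
              instance: "node1:9115"
              ruleForJob: t1
            exp_annotations:
              summary: Node Down Alert
              description: Node node1:9115 is down
```

Assuming the test is saved next to the rule file, it can be run with promtool test rules followed by the test file name.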
 


Prometheus Alerts

The rules in the rule files are evaluated, and when a rule's condition is satisfied an alert is generated. These alerts still have to make their way to communication channels, and some control over how they are delivered is also required. Alertmanager is used for that purpose.

Prometheus configuration
# Alert
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

Prometheus Alertmanager configuration

The Alertmanager configuration contains routes that check the labels of each alert for a match and, based on that match, direct the alert to the corresponding communication channel.

Alertmanager configuration
route:
  receiver: slack # default receiver
  group_by: ["alertname"]
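  group_wait: 10s # wait before sending the first notification for a new group
  group_interval: 10s # wait before notifying about new alerts added to an existing group
  repeat_interval: 1m # wait before re-sending a notification for alerts that are still firing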
 
  routes:
    - match:
        ruleForJob: t1
      receiver: slack
 
receivers:
  - name: slack
    slack_configs:
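      - api_url: $SLACK_WEBHOOK_URL # placeholder: Alertmanager does not expand environment variables, so substitute the real URL before loading
        channel: "#new-channel"
        send_resolved: true # also send a notification when the alert is resolved
        text: | # literal block keeps the line breaks in the message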
          [{{ .Status }}] {{ .CommonLabels.alertname }} alert
          Target: {{ .CommonLabels.instance }}
          Summary: {{ .CommonAnnotations.summary }}
          Description: {{ .CommonAnnotations.description }}
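
The configuration above sends every alert to the same receiver, so the effect of label matching is easier to see with more than one channel. The fragment below is a sketch with assumed names: the receivers slack-t1 and webhook-t2, the ruleForJob value t2, the Slack channel names, and both URLs are hypothetical. It uses the matchers field, which newer Alertmanager versions prefer over match.

```yaml
route:
  receiver: slack-default        # used when no child route matches
  group_by: ["alertname"]
  routes:
    - matchers:
        - 'ruleForJob="t1"'      # alerts labelled ruleForJob: t1
      receiver: slack-t1
    - matchers:
        - 'ruleForJob="t2"'      # hypothetical second job
      receiver: webhook-t2

receivers:
  - name: slack-default
    slack_configs:
      - api_url: "https://hooks.slack.example/default"  # placeholder URL
        channel: "#alerts"                               # hypothetical channel
  - name: slack-t1
    slack_configs:
      - api_url: "https://hooks.slack.example/t1"       # placeholder URL
        channel: "#t1-alerts"                            # hypothetical channel
  - name: webhook-t2
    webhook_configs:
      - url: "http://example.internal:5001/alerts"      # placeholder endpoint
```

Child routes are checked in order and the first match wins by default, so an alert without a ruleForJob label of t1 or t2 falls back to slack-default.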