Skip to main content

Recording Rules

Updated Nov 20, 2022 ·

Overview

Recording rules allow you to precompute frequently used queries and store the results as new time series, which then improves query performance. They are useful for reducing query complexity and optimizing system performance.

  • Store the results of a complex query as a new time series.
  • Can reduce computational load during frequent queries.
  • Help improve performance and speed up dashboarding and alerting.

With Recording Rules, you don't have to evaluate the expressions on the fly since they are stored in the database.

Rules Configuration

Recording rules are stored in a rules.yml file, which is then referenced in the main prometheus.yml file.

groups:
- name: <group name 1>
rules:
- interval: <evaluation interval>
record: <rule name 1>
expr: <promql expression 1>
labels:
<label name>: <label value>
- record: <rule name 2>
expr: <promql expression 2>
labels:

- name: <group name 2>
rules:
# Additional rules for group 2, if any

Each file defines one or more rule groups under the groups key. If the evaluation interval is not defined, it will default to the value set in the Prometheus configuration.

  • record specifies the rule name, which is defined in the expr.
  • Use labels to add or remove labels before storing the results.
  • Rules are evaluated in the order in which they are declared.

Calling the Rules File

To reference the rules file in the Prometheus config file:

global:
scrape_interval: 15s
evaluation_interval: 15s
rules_files:
- rules.yml
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 15s
scrape_timeout: 5s
sample_limit: 1000
static_configs:
- targets: ['localhost:9090']

Alternatively, you can use globbing to specify multiple rules files:

rules_files:
- /etc/prometheus/rules/*.yml

Remember, after making changes to a rule file, you need to restart the Prometheus service:

sudo systemctl restart prometheus 
sudo systemctl status prometheus

Record Rule Naming

A good naming scheme allows for easy identification and management of rules. The recommended format is:

level:metric:operations

Where:

  • Level
    • Indicates the aggregation level of the metric based on its labels.
    • Always include the job label, along with any other relevant target labels.
  • Metric
    • Refers to the metric or time series name being measured.
  • Operations
    • Describes the functions and aggregators applied to the metric.
    • Examples: sum, avg, max.

Example Naming:

  • This represents the rate of a metric for the api-server job.

    job:api-server:rate 
  • This could represent the maximum value of a metric collected by the node-exporter instance.

    instance:node-exporter:max

Organizing Recording Rules by Job

In addition to a good naming convention, all the rules for a specific job should be contained in the same group to ensure organization and clarity.

In the sample rules.yml file below, all rules for the api-server job are placed within the same group, and similarly for the node-exporter job. This helps keep related rules organized.

groups:
- name: api-server
rules:
- interval: 1m
record: job:api-server:http_requests_rate_5m
expr: rate(http_requests_total[5m])
labels:
job: 'api-server'
- interval: 1m
record: job:api-server:http_requests_rate_1m
expr: rate(http_requests_total[1m])
labels:
job: 'api-server'
- interval: 1m
record: job:api-server:avg_response_time
expr: avg(http_request_duration_seconds) by (job)
labels:
job: 'api-server'

- name: node-exporter
rules:
- interval: 1m
record: instance:node-exporter:avg_memory_usage
expr: avg(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) by (instance)
labels:
job: 'node-exporter'
- interval: 1m
record: instance:node-exporter:cpu_usage
expr: avg(rate(node_cpu_seconds_total[5m])) by (instance)
labels:
job: 'node-exporter'

Example: Using Recording Rules

In the example below, we are using a recording rule to store the result of a rate calculation for HTTP requests over a 5-minute period.

groups:

- name: http_requests
rules:
- interval: 1m
record: http_requests_rate_5m
expr: rate(http_requests_total[5m])
labels:
job: 'api-server'

- interval: 1m
record: http_requests_rate_1m
expr: rate(http_requests_total[1m])
labels:
job: 'api-server'

- name: memory_usage
rules:
- interval: 1m
record: avg_memory_usage
expr: avg(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) by (instance)
labels:
job: 'node-exporter'

Explanation:

  • http_requests_rate_5m:

    • Calculates the rate of HTTP requests over a 5-minute window.
    • Stores the result as http_requests_rate_5m.
  • http_requests_rate_1m:

    • Similar to the previous rule but for a 1-minute rate.
  • avg_memory_usage:

    • Calculates the average memory usage across instances.
    • Shows the ratio of available memory to total memory.

These recording rules help optimize queries by precomputing, storing the results, and making it easier to retrieve them later.

Example: Recording Rules in Action

In this example, we're using recording rules that calculate network rates and average latency.

  1. Create the /etc/prometheus/api-rules.yaml rules file.

    groups:
    - name: node
    interval: 15s
    rules:
    - record: node_network_receive_bytes_rate
    expr: rate(node_network_receive_bytes_total{job="nodes"}[2m])
    - record: node_network_receive_bytes_rate_avg
    expr: avg by(instance) (node_network_receive_bytes_rate)
    - record: node_filesystem_free_percent
    expr: 100 * node_filesystem_free_bytes{job="nodes"} / node_filesystem_size_bytes{job="nodes"}
  2. Reference them in the /etc/prometheus/prometheus.yml file.

    rule_files:
    - "node-rules.yaml"
  3. Save the file and restart Prometheus service:

    systemctl restart prometheus  
    systemctl status prometheus
  4. To verify, login to the Prometheus console and go to Status > Rules.

  5. Create a second rule that will track the average latency for past 2 minutes:

    cat > /etc/prometheus/api-rules.yaml 

    Add the rules:

    groups:
    - name: api
    interval: 15s
    rules:
    - record: avg_latency_2m
    expr: rate(http_request_total_sum{job="api"}[2m]) / rate(http_request_total_count{job="api"}[2m])
  6. Specify them in the Prometheus configuration file.

    rule_files:
    - "node-rules.yaml"
    - "api-rules.yaml"
  7. Restart Prometheus service once again:

    systemctl restart prometheus  
    systemctl status prometheus
  8. Go back to the Prometheus console and verify that all the rule are showing OK.