Current revision updated by nfarley8 on
Originally created by nfarley8 on

Responsible Party

Nick Farley (21-08-2024)

Quick Info

  • Server: istrian.cc.gatech.edu
  • User: prometheus (sudo -u prometheus /bin/bash)
  • Working Directory: /home/prometheus/monitoring
  • Ports: 9090 (open on fw.noc and in the local firewalld)
  • GitHub Repository

Hot Updates

2024-11-13 - Moved monitoring infra over to istrian and updated the documentation accordingly.
2024-09-05 - Updated prometheus to use basic HTTP authentication in order to appease the vulnerability scanner. GitHub.

Background

dashboard.cc is a Grafana installation that we use to quickly glance at what’s going on with our infrastructure. The web team has a corner of Grafana dedicated to monitoring some basic website performance metrics.

Architecture

This diagram shows the basic architecture of website monitoring. Blackbox probes the webservers, Prometheus scrapes the resulting metrics from Blackbox, and Grafana queries Prometheus to display our data.

Prometheus and Blackbox run as a docker compose stack on istrian.cc.gatech.edu. Configuration files are loaded from bind mounts, and Prometheus data is stored in a named volume (prom-data). This may change in the future, as my preference is to use bind mounts over volumes.

The docker compose stack is owned by the prometheus user, so you’ll need to switch to that user before making any changes or running docker commands:

sudo -u prometheus /bin/bash

The docker compose stack and configuration files are located at /home/prometheus/monitoring.

The Compose File

The compose file defines the containers that run the monitoring infrastructure. Please refer to the compose file reference for syntax details.
The code below may be out of date; refer to the GitHub repository for the latest version.

services:
  prometheus:
    container_name: prometheus
    image: prom/prometheus
    restart: unless-stopped
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml"
      - "./prometheus/config/targets.json:/etc/prometheus/targets.json"
      - "prom-data:/prometheus"
  blackbox-exporter:
    container_name: blackbox
    privileged: true
    image: prom/blackbox-exporter
    restart: unless-stopped
    command: "--config.file=/config/blackbox.yml"
    ports:
      - "127.0.0.1:9115:9115"
    volumes:
      - "./blackbox/blackbox.yml:/config/blackbox.yml"
volumes:
  prom-data:

Some notes about what’s going on here:

  • Both containers have restart: unless-stopped set. This means Docker will restart a container whenever it exits (e.g. after a crash caused by a badly formatted configuration file), unless the container was stopped explicitly.
  • The "127.0.0.1:9115:9115" port binding publishes Blackbox only on the host’s loopback interface, so the port cannot be reached from other machines. Prometheus reaches Blackbox over the default network created by the compose stack.
  • The configuration files for both containers are explicitly mounted to the correct location in the containers.
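
To confirm that the running containers match these notes, something like the following sketch can be run as the prometheus user. It assumes the container names prometheus and blackbox from the compose file above; the function name show_stack_settings is made up for illustration.

```shell
# Print the restart policy and published ports for each monitoring container.
# Assumes the container names from the compose file above (prometheus, blackbox).
show_stack_settings() {
  for name in prometheus blackbox; do
    # RestartPolicy.Name should print "unless-stopped" for both containers.
    docker inspect --format '{{.Name}} restart={{.HostConfig.RestartPolicy.Name}}' "$name"
    # "docker port" lists the host bindings; blackbox should only show 127.0.0.1:9115.
    docker port "$name"
  done
}

# Usage: show_stack_settings
```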

Adding new monitors

IMPORTANT: Prometheus and Blackbox use YAML for their configuration files. An invalid YAML file will prevent the affected container from starting.

If you’re just looking to monitor a website or application, read on. If you want to monitor something else, skip over to non-http monitors.

Adding a new website is fairly straightforward. We already have a default HTTP module named http_2xx that you can use. In prometheus/config/prometheus.yml, copy the YAML below and customize it using the steps that follow:

scrape_configs:
  - job_name: "example-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com/
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox:9115

  1. Add a new entry under the scrape_configs key.
  2. Make sure the job_name key is unique. This will help you group your data in Grafana.
  3. Add any websites you want to check under the targets key.

Refer to the existing entries in prometheus/config/prometheus.yml if you need help, or take a look at the blackbox exporter repo.

Run docker compose restart as the prometheus user from /home/prometheus/monitoring to reload your configuration.
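
Because an invalid file stops a container from starting, it can help to check both configuration files before restarting. The following is a minimal sketch, not part of the current setup: it assumes the stock prom/prometheus image (which ships promtool at /bin/promtool) and the stock prom/blackbox-exporter image (which accepts --config.check), and the function name check_monitoring_configs is hypothetical. Run it from /home/prometheus/monitoring.

```shell
# Validate both configuration files using throwaway containers.
# Assumption: run from /home/prometheus/monitoring with the stock images.
check_monitoring_configs() {
  # Lint prometheus.yml with the promtool binary bundled in the image.
  docker run --rm --entrypoint /bin/promtool \
    -v "$PWD/prometheus/config:/etc/prometheus:ro" \
    prom/prometheus check config /etc/prometheus/prometheus.yml || return 1
  # Ask blackbox_exporter to parse its config and exit without serving.
  docker run --rm \
    -v "$PWD/blackbox/blackbox.yml:/config/blackbox.yml:ro" \
    prom/blackbox-exporter --config.file=/config/blackbox.yml --config.check || return 1
}

# Restart only if both checks pass:
# check_monitoring_configs && docker compose restart
```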

I Don’t Wanna Monitor a Website

Blackbox can monitor multiple protocols: HTTP, TCP, DNS, ICMP, and gRPC. Adding a new protocol requires setting up a new module in the blackbox configuration file at blackbox/blackbox.yml. Here’s what the default http_2xx probe looks like:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      follow_redirects: true
      valid_status_codes:
        - 200
        - 201
        - 301
        - 302
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4" # defaults to "ip6"
      ip_protocol_fallback: false # no fallback to "ip6"

The name of your module (the first level under modules) is how you’ll call it from prometheus.yml. In the example above, our module is named http_2xx. It uses the http prober, and all the configuration options for that prober live under the http key.
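
A module can also be exercised by hand, which is roughly the request Prometheus ends up making after the relabel_configs rewrite each target: the module name and the target are passed to Blackbox as query parameters on /probe. The sketch below assumes it is run on istrian itself, since port 9115 is only published on the loopback interface; the function name blackbox_probe is made up for illustration.

```shell
# Ask Blackbox to probe a target with a given module and show the result.
# Assumption: run on the host, where 127.0.0.1:9115 is published.
blackbox_probe() {
  # -G turns the --data-urlencode pairs into a URL-encoded query string.
  # probe_success is 1 when the probe passed and 0 when it failed.
  curl -sG 'http://127.0.0.1:9115/probe' \
    --data-urlencode "module=$1" \
    --data-urlencode "target=$2" | grep '^probe_success'
}

# Usage: blackbox_probe http_2xx https://example.com/
```

The same function should work for any module defined in blackbox/blackbox.yml, as long as the target is in the form that prober expects.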

Adding a new module, for instance, an ICMP ping, would look like this:

modules:
  onePingOnly:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: ip4

And to add your check to prometheus:

scrape_configs:
  - job_name: "example-icmp"
    metrics_path: /probe
    params:
      module: [onePingOnly]
    static_configs:
      - targets:
          - example.com # ICMP targets are hostnames or IP addresses, not URLs
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox:9115

Run docker compose restart as the prometheus user from /home/prometheus/monitoring to reload your configuration.

Troubleshooting Tips

  • Not sure if prometheus is running:
    • Check the web dashboard at http://istrian.cc.gatech.edu:9090
    • If that page doesn’t load, check the status of the docker containers:
      cd /home/prometheus/monitoring && docker compose ps
    • Check docker container logs:
      cd /home/prometheus/monitoring && docker compose logs
  • Not sure if Blackbox is running:
    • Check the targets page in the Prometheus web dashboard. Entries are grouped by job_name from the prometheus config
  • Need to add a new user, or reset a user password:
    • Generate a new password hash with: htpasswd -nBC 10 "" | tr -d ':\n'
    • Update (or add) the relevant line in prometheus/config/web.yml
    • Bring the docker compose stack down and then back up: docker compose down && docker compose up -d (docker compose restart is not sufficient)
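
After changing a password, one way to check that the new credentials work is to request Prometheus’s health endpoint with basic auth. This is only a sketch: PROM_USER is a hypothetical variable holding the username, the function name is made up, and curl will prompt for the password because none is given on the command line.

```shell
# Print the HTTP status code returned by the Prometheus health endpoint.
# PROM_USER is a hypothetical variable; curl prompts for the password.
prometheus_health_status() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -u "$PROM_USER" \
    http://istrian.cc.gatech.edu:9090/-/healthy
}

# Usage: PROM_USER=someuser prometheus_health_status
# 200 means the credentials were accepted; 401 means they were rejected.
```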

TODO

  • Document running this locally as well
  • Manage code through GitHub
  • Automatically deploy new commits after passing yaml validation