Responsible Party
Nick Farley (21-08-2024)
Quick Info
- Server: istrian.cc.gatech.edu
- User: prometheus (sudo -u prometheus /bin/bash)
- Working Directory: /home/prometheus/monitoring
- Ports: 9090 (open on fw.noc and locally (firewalld))
- GitHub Repository
Hot Updates
2024-11-13 - Moved monitoring infra over to istrian and updated the documentation accordingly.
2024-09-05 - Updated prometheus to use basic HTTP authentication in order to appease the vulnerability scanner. GitHub.
Background
dashboard.cc is a Grafana installation that we use to quickly glance at what’s going on with our infrastructure. The web team has a corner of Grafana dedicated to monitoring some basic website performance metrics.
Architecture
This diagram shows the basic architecture of how website monitoring works. Blackbox queries the webservers, Prometheus scrapes those metrics from Blackbox, and Grafana queries Prometheus to display our data.
Prometheus and Blackbox are running as a docker compose stack on istrian.cc.gatech.edu. Configuration files are loaded from bind volume mounts, and data is stored in a named volume. This may change in the future, as my preference is to use bind mounts over volumes.
The docker compose stack is owned by the prometheus user, so you’ll need to switch to that user before making any changes or running docker commands:
    sudo -u prometheus /bin/bash
The docker compose stack and configuration files are located at /home/prometheus/monitoring.
The Compose File
Please refer to the compose file reference. This is the file that defines the containers that run the monitoring infrastructure.
The code below may be out of date; refer to the GitHub repository for the latest version.
    services:
      prometheus:
        container_name: prometheus
        image: prom/prometheus
        restart: unless-stopped
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
        ports:
          - "9090:9090"
        volumes:
          - "./prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml"
          - "./prometheus/config/targets.json:/etc/prometheus/targets.json"
          - "prom-data:/prometheus"
      blackbox-exporter:
        container_name: blackbox
        privileged: true
        image: prom/blackbox-exporter
        restart: unless-stopped
        command: "--config.file=/config/blackbox.yml"
        ports:
          - "127.0.0.1:9115:9115"
        volumes:
          - "./blackbox/blackbox.yml:/config/blackbox.yml"
    volumes:
      prom-data:
Some notes about what’s going on here:
- Both containers have restart: unless-stopped set. This means Docker restarts a container whenever it exits (e.g. after a crash), unless it was stopped manually. A badly formatted configuration file will therefore show up as a container that keeps restarting.
- The "127.0.0.1:9115:9115" port bind on blackbox publishes that port on the host’s loopback interface only, so it cannot be reached from other machines. Prometheus reaches Blackbox over the default network created by the compose stack.
- The configuration files for both containers are explicitly mounted to the correct location in the containers.
Adding new monitors
IMPORTANT: Prometheus and Blackbox use YAML for their configuration files. Invalid YAML files will prevent the containers from starting.
If you’re just looking to monitor a website or application, read on. If you want to monitor something else, skip over to the non-HTTP monitors section (I Don’t Wanna Monitor a Website).
Adding a new website is fairly straightforward. We already have a default HTTP module named http_2xx that you can use. In prometheus/config/prometheus.yml, copy the YAML below and customize it using the steps that follow:
    scrape_configs:
      - job_name: "example-http"
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - https://example.com/
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox:9115
- Add a new entry under the scrape_configs key.
- Make sure the job_name key is unique. This will help you group your data in Grafana.
- Add any websites you want to check under the targets key.
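The uniqueness requirement on job_name can be checked mechanically before restarting. The following is a minimal sketch and not part of the repository: duplicate_job_names is a hypothetical helper that scans the config text with a regular expression rather than a real YAML parser, so it assumes each job_name sits on its own line, as in the example above.

```python
import re
from collections import Counter

# Matches lines such as:   - job_name: "example-http"
# Assumption: one job_name per line, no trailing comment on that line.
JOB_NAME_RE = re.compile(r"^\s*-?\s*job_name:\s*(.+?)\s*$", re.MULTILINE)


def duplicate_job_names(config_text):
    """Return a sorted list of job_name values that appear more than once."""
    names = [raw.strip("\"'") for raw in JOB_NAME_RE.findall(config_text)]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def check_file(path="prometheus/config/prometheus.yml"):
    """Read the config (path relative to the working directory) and report duplicates."""
    with open(path, encoding="utf-8") as handle:
        return duplicate_job_names(handle.read())
```

Calling check_file() from /home/prometheus/monitoring should return an empty list when every job_name is unique. It does not validate the YAML itself.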
Refer to prometheus/config/prometheus.yml if you need help, or take a look at the blackbox exporter repo.
Run docker compose restart as the prometheus user from /home/prometheus/monitoring to reload your configuration.
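To try a target against Blackbox directly, you can request the same /probe endpoint that the relabel_configs above point Prometheus at. This is a minimal sketch, assuming it runs on istrian itself, where the compose file publishes Blackbox on 127.0.0.1:9115. The function names are hypothetical; probe_success is the metric Blackbox reports for each probe.

```python
import urllib.parse
import urllib.request

# Assumption: run on the Docker host, where the compose file binds 127.0.0.1:9115.
BLACKBOX_BASE = "http://127.0.0.1:9115"


def build_probe_url(target, module="http_2xx", base=BLACKBOX_BASE):
    """Build the /probe URL with the module and target as query parameters."""
    query = urllib.parse.urlencode({"module": module, "target": target})
    return f"{base}/probe?{query}"


def probe_succeeded(metrics_text):
    """Return True if the metrics output contains 'probe_success 1'."""
    for line in metrics_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False


def probe(target, module="http_2xx"):
    """Ask Blackbox to probe the target once and report whether it succeeded."""
    url = build_probe_url(target, module)
    with urllib.request.urlopen(url, timeout=15) as response:
        return probe_succeeded(response.read().decode("utf-8"))
```

For example, probe("https://example.com/") returns True when Blackbox considers the site up under the http_2xx module. Pass a different module name to test a non-HTTP module.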
I Don’t Wanna Monitor a Website
Blackbox can monitor multiple protocols: HTTP, TCP, DNS, ICMP, and gRPC. Adding a new protocol requires setting up a new module in the blackbox configuration file at blackbox/blackbox.yml. Here’s what the default http_2xx module looks like:
    modules:
      http_2xx:
        prober: http
        timeout: 5s
        http:
          follow_redirects: true
          valid_status_codes:
            - 200
            - 201
            - 301
            - 302
          tls_config:
            insecure_skip_verify: false
          preferred_ip_protocol: "ip4" # defaults to "ip6"
          ip_protocol_fallback: false # no fallback to "ip6"
The name of your module (the first level under modules) is how you’ll call it from prometheus.yml. In the example above, the module is named http_2xx and uses the http prober; all the configuration options for that prober go under the http key.
Adding a new module, for instance, an ICMP ping, would look like this:
    modules:
      onePingOnly:
        prober: icmp
        timeout: 5s
        icmp:
          preferred_ip_protocol: ip4
And to add your check to prometheus:
    scrape_configs:
      - job_name: "example-icmp"
        metrics_path: /probe
        params:
          module: [onePingOnly]
        static_configs:
          - targets:
              # ICMP targets are hostnames or IP addresses, not URLs
              - example.com
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox:9115
Run docker compose restart as the prometheus user from /home/prometheus/monitoring to reload your configuration.
Troubleshooting Tips
- Not sure if prometheus is running:
  - Check the web dashboard at http://istrian.cc.gatech.edu:9090
  - If that page doesn’t load, check the status of the docker containers:
        cd /home/prometheus/monitoring && docker compose ps
  - Check docker container logs:
        cd /home/prometheus/monitoring && docker compose logs
- Not sure if Blackbox is running:
  - Check the targets page in the Prometheus web dashboard. Entries are grouped by job_name from the prometheus config.
- Need to add a new user, or reset a user password:
  - Generate a new password hash with:
        htpasswd -nBC 10 "" | tr -d ':\n'
  - Update (or add) the relevant line in prometheus/config/web.yml
  - Bring the docker compose stack down and then back up (docker compose restart is not sufficient):
        docker compose down && docker compose up -d
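The "is prometheus running" check can also be scripted against Prometheus's /-/healthy endpoint instead of loading the dashboard in a browser. This is a minimal sketch, assuming the endpoint sits behind the same basic HTTP authentication as the rest of the web interface. The function names are hypothetical and the credentials are whatever user you have in web.yml.

```python
import base64
import urllib.request

PROMETHEUS_BASE = "http://istrian.cc.gatech.edu:9090"


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def healthy_request(user, password, base=PROMETHEUS_BASE):
    """Prepare (but do not send) an authenticated request to /-/healthy."""
    request = urllib.request.Request(f"{base}/-/healthy")
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


def is_healthy(user, password, base=PROMETHEUS_BASE):
    """Return True if Prometheus answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(healthy_request(user, password, base), timeout=10) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors as well as HTTP errors such as 401.
        return False
```

A False result only says the check failed; it does not distinguish a stopped container from wrong credentials, so follow up with docker compose ps and docker compose logs.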
TODO
- Document running this locally as well
- Manage code through GitHub
- Automatically deploy new commits after passing YAML validation