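Two mainstream options
| Stack | Best for | Notes |
|---|---|---|
| Prometheus + Grafana | Cloud-native, containerized, and modern projects | De facto industry standard; Prometheus is a CNCF Graduated project |
| Zabbix | Traditional ops, mixed device fleets | Long-established; email alerting and a rich template library |
| Netdata | Detailed single-host monitoring | Works out of the box; polished visualizations |
This article uses Prometheus + Grafana, consistent with ops-corp/19.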
Architecture
Monitored hosts (running node_exporter)  ← scrape ←  Prometheus (central server)
                                                          ↓
                                                  Grafana (visualization)
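1. Install node_exporter (on every monitored host)
📌 The version number in the download link below will go stale. Before you start, check the node_exporter releases page for the current latest version and set the VER variable to it.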
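If you would rather not look the version up by hand, the latest tag can be read from the GitHub releases API. The sketch below assumes the API response carries a `tag_name` field such as `v1.8.2`; `parse_latest_version` is a hypothetical helper name, not part of any tool.

```shell
# parse_latest_version: read a GitHub "latest release" JSON document on stdin
# and print the tag without its leading "v" (e.g. v1.8.2 -> 1.8.2).
parse_latest_version() {
  sed -n 's/.*"tag_name"[[:space:]]*:[[:space:]]*"v\{0,1\}\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (needs network access; unauthenticated API calls are rate-limited):
#   VER=$(curl -fsSL https://api.github.com/repos/prometheus/node_exporter/releases/latest \
#         | parse_latest_version)
```

It is still worth glancing at the release notes before upgrading, since a new major version can change flags or defaults.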
# Download (set VER to the latest version on the releases page, e.g. 1.8.2)
VER=1.8.2
cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VER}/node_exporter-${VER}.linux-amd64.tar.gz
tar -xzf node_exporter-${VER}.linux-amd64.tar.gz
sudo mv node_exporter-${VER}.linux-amd64/node_exporter /usr/local/bin/
# Dedicated system user with no login shell
sudo useradd -rs /usr/sbin/nologin node_exporter
# systemd unit
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
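Prometheus project releases publish a `sha256sums.txt` file next to the tarballs. As an optional integrity check to run right after the download step above (before moving the binary into place), something like the following can be used. `verify_download` is a hypothetical helper, and it assumes GNU `sha256sum` is available.

```shell
# verify_download FILE SUMS: succeed only if FILE's SHA-256 matches the entry
# for its base name in the checksum list SUMS ("<hash>  <name>" per line).
verify_download() {
  file="$1"; sums="$2"
  name=$(basename "$file")
  # Pick the checksum line whose file name column equals our tarball.
  line=$(awk -v n="$name" '$2 == n' "$sums")
  [ -n "$line" ] || { echo "no checksum entry for $name" >&2; return 1; }
  # sha256sum -c resolves the name relative to the current directory.
  ( cd "$(dirname "$file")" && printf '%s\n' "$line" | sha256sum -c - )
}

# Usage, in /tmp after setting VER:
#   curl -fLO https://github.com/prometheus/node_exporter/releases/download/v${VER}/sha256sums.txt
#   verify_download /tmp/node_exporter-${VER}.linux-amd64.tar.gz /tmp/sha256sums.txt
```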
Verify:
curl -s http://localhost:9100/metrics | head
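Eyeballing the first ten lines only shows that something answered. A slightly stricter check, sketched here, looks for a specific metric that the dashboards and alert rules later in this article depend on. `has_metric` is a hypothetical helper.

```shell
# has_metric NAME: read Prometheus text-format metrics on stdin and succeed
# if at least one sample line for NAME is present. Comment lines (# HELP,
# # TYPE) do not count because they do not start with the metric name.
has_metric() {
  grep -q -E "^$1(\{| )"
}

# Usage:
#   curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total && echo ok
```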
Allow port 9100 through the firewall, but only from the monitoring server:
sudo ufw allow from <monitoring-server-IP> to any port 9100 proto tcp
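2. Install Prometheus (on the monitoring server)
📌 Same as before: take the latest version number from the Prometheus releases page and set VER to it.
# Set VER to the latest version on the releases page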
VER=2.55.1
cd /tmp
curl -LO https://github.com/prometheus/prometheus/releases/download/v${VER}/prometheus-${VER}.linux-amd64.tar.gz
tar -xzf prometheus-${VER}.linux-amd64.tar.gz
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo cp prometheus-${VER}.linux-amd64/{prometheus,promtool} /usr/local/bin/
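# Console templates ship with 2.x tarballs. If these directories are missing from
# your tarball, skip this line and drop the two --web.console.* flags from the unit.
sudo cp -r prometheus-${VER}.linux-amd64/{consoles,console_libraries} /etc/prometheus/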
sudo useradd -rs /usr/sbin/nologin prometheus
sudo chown -R prometheus: /etc/prometheus /var/lib/prometheus
/etc/prometheus/prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nodes'
    static_configs:
      - targets:
          - '192.168.1.100:9100'
          - '192.168.1.101:9100'
          - '192.168.1.102:9100'
  # You can also scrape an application's own /metrics endpoint.
  # Replace localhost:3000 with your app's address; 3000 is also Grafana's default port.
  - job_name: 'myapp'
    static_configs:
      - targets: ['localhost:3000']
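Typing the target list by hand gets error-prone as hosts are added. A minimal sketch that turns a plain list of hosts into the list items for the `nodes` job: `render_targets` and `hosts.txt` are hypothetical, and the fixed 10-space indent is an assumption about how the entries are nested under `targets:` in your file.

```shell
# render_targets [PORT]: read one host or IP per line on stdin and print
# YAML list items of the form "- 'host:port'". Blank lines and lines
# starting with # are skipped. PORT defaults to 9100 (node_exporter).
render_targets() {
  port="${1:-9100}"
  awk -v port="$port" '
    /^[ \t]*#/ { next }   # comment line
    NF == 0    { next }   # blank line
    { printf "          - \047%s:%s\047\n", $1, port }
  '
}

# Usage:
#   render_targets < hosts.txt
```

The output is meant to be pasted (or templated) into prometheus.yml; it does not edit the file for you.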
systemd unit /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
Restart=on-failure
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
Open http://<monitoring-server-IP>:9090 to see Prometheus's built-in UI.
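Besides the web UI, Prometheus serves a `/-/healthy` endpoint and the HTTP API endpoint `/api/v1/targets`. A rough sketch for counting scrape targets by health from the command line: it assumes the compact JSON that Prometheus returns (`"health":"up"` with no spaces), and `count_health` is a hypothetical helper.

```shell
# count_health STATE: read /api/v1/targets JSON on stdin and print how many
# targets report the given health state (up, down, or unknown).
count_health() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Usage on the monitoring server:
#   curl -fsS http://localhost:9090/-/healthy
#   curl -fsS http://localhost:9090/api/v1/targets | count_health down
```

A non-zero count for `down` usually points at a firewall rule or a stopped node_exporter on that host.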
3. Install Grafana
sudo apt install -y software-properties-common apt-transport-https
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install grafana
sudo systemctl enable --now grafana-server
Open http://<monitoring-server-IP>:3000. The default login is admin / admin, and you will be prompted to change the password on first login.
4. Connect Grafana to Prometheus
In Grafana:
- Left menu → Configuration → Data sources → Add data source (recent Grafana versions list this under Connections → Data sources)
- Select Prometheus
- Set the URL to http://localhost:9090
- Save & Test
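The same data source can also be declared in a provisioning file instead of clicking through the UI, which helps when the server has to be rebuilt. A sketch, assuming the default package layout where Grafana reads `/etc/grafana/provisioning/datasources/`; `write_datasource` is a hypothetical helper and the file name `prometheus.yaml` is arbitrary.

```shell
# write_datasource [DIR]: write a Grafana provisioning file that declares the
# local Prometheus as the default data source. DIR defaults to the package's
# provisioning directory (an assumption about your install).
write_datasource() {
  dir="${1:-/etc/grafana/provisioning/datasources}"
  cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Usage (as root), then restart Grafana so it picks the file up:
#   write_datasource && systemctl restart grafana-server
```

Use either the UI steps or the provisioning file, not both, to avoid two data sources with the same name.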
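5. Import a node_exporter dashboard
The community maintains many ready-made dashboards:
- Left menu → Dashboards → New → Import
- Enter ID 1860 (Node Exporter Full, the most widely used)
- Select the Prometheus data source you just added → Import
CPU, memory, disk, network, and load graphs for every host appear immediately.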
6. Add alerts (together with Alertmanager)
See ops-corp/22-alerting for details. Without an Alertmanager configured, these rules only show up on the Prometheus Alerts page and send no notifications. A minimal example:
/etc/prometheus/rules.yml:
groups:
  - name: node_alerts
    rules:
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage has been above 80% for 5 minutes"
      - alert: DiskFull
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk almost full on {{ $labels.instance }}"
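To see what the HighCPU expression computes: `rate(node_cpu_seconds_total{mode="idle"}[5m])` is the fraction of each second a core spent idle, `avg by(instance)` averages that over the cores of one host, and the rest converts it into a busy percentage. A worked example with made-up idle fractions for a 4-core host; `busy_pct` is a hypothetical helper.

```shell
# busy_pct IDLE...: average the per-core idle fractions (0..1) given as
# arguments and print 100 - avg * 100, the value the alert compares to 80.
busy_pct() {
  printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { printf "%.1f\n", 100 - (sum / n) * 100 }'
}

# Usage (average idle fraction is 0.125, so the host is 87.5% busy):
#   busy_pct 0.125 0.125 0.25 0
```

With these numbers the expression is above 80, so the alert would fire once the condition has held for the full `for: 5m` window.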
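Reference it from prometheus.yml (the path is relative to the directory of prometheus.yml):
rule_files:
  - "rules.yml"
Reload the configuration. The unit file above defines no ExecReload, so systemctl reload would be rejected; send SIGHUP to the service instead:
sudo systemctl kill -s HUP prometheus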
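A broken rules file is easiest to catch before the reload. `promtool check config` validates prometheus.yml together with the rule files it references. The following is a sketch of a validate-then-reload wrapper; `safe_reload` and the `PROMTOOL` / `RELOAD_CMD` override variables are hypothetical conveniences for dry runs, not part of Prometheus.

```shell
# safe_reload [CONFIG]: validate the configuration with promtool and only
# then ask Prometheus to reload it by sending SIGHUP.
# PROMTOOL and RELOAD_CMD can be overridden, e.g. for a dry run.
safe_reload() {
  config="${1:-/etc/prometheus/prometheus.yml}"
  "${PROMTOOL:-promtool}" check config "$config" || {
    echo "config invalid, not reloading" >&2
    return 1
  }
  # Intentionally unquoted so a multi-word command is split into arguments.
  ${RELOAD_CMD:-sudo systemctl kill -s HUP prometheus}
}

# Usage:
#   safe_reload
#   RELOAD_CMD="echo would reload" safe_reload   # dry run
```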
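Monitoring the application itself
To monitor at the application level, the application has to expose a /metrics endpoint in the Prometheus exposition format.
Node: install prom-client;
Python: prometheus_client;
Go: prometheus/client_golang.
Details are in ops-corp/19-prometheus-grafana.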
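Those libraries generate the response for you. For orientation only, this is roughly what a `/metrics` body looks like in the Prometheus text exposition format; the metric name `myapp_requests_total`, its label, and the `render_counter` helper are made up for illustration.

```shell
# render_counter NAME HELP LABELS VALUE: print one counter in the Prometheus
# text exposition format (HELP line, TYPE line, then the sample itself).
render_counter() {
  printf '# HELP %s %s\n' "$1" "$2"
  printf '# TYPE %s counter\n' "$1"
  printf '%s{%s} %s\n' "$1" "$3" "$4"
}

# Usage:
#   render_counter myapp_requests_total "Total HTTP requests." 'method="GET"' 1027
```

In a real service, prefer the client library for your language: it takes care of escaping, concurrency, and the other metric types (gauge, histogram, summary).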
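Alternatives
If you would rather not self-host:
- Cloudflare monitoring (free)
- Better Stack / UptimeRobot (lightweight uptime checks)
- Grafana Cloud Free (the free tier is enough for personal projects)
- Datadog / New Relic (commercial; expensive but comprehensive)
Next: Getting started with Docker containers.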