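1. Choosing a Monitoring Approach for GM (SM2) Certificates
GM (Guomi) SSL certificates use the same X.509 structure and lifecycle management as standard certificates, so expiry monitoring built for standard X.509 certificates carries over, provided the tool's crypto library can parse SM2 public keys and SM2-with-SM3 signatures. The four common approaches compare as follows:

| Approach | Use case | Key metrics | Deployment effort |
|---|---|---|---|
| x509-certificate-exporter | Local certificate files, K8s Secrets, kubeconfig | `x509_cert_not_after`, `x509_cert_expired`, `x509_cert_expires_in_seconds` (optional, enabled with `--expose-relative-metrics`) | Low (Helm chart) |
| Blackbox Exporter | Active probing of externally exposed HTTPS endpoints | `probe_ssl_earliest_cert_expiry` | Medium |
| cert-manager built-in metrics | Certificates managed by cert-manager in K8s | `certmanager_certificate_expiration_timestamp_seconds` | Low (exposed automatically) |
| Custom exporter/script | Special environments or fully custom requirements | Custom metrics | High |

Note: metric names keep the `x509` and `ssl` prefixes for GM certificates; the metrics and the way they are collected are the same as for standard certificates.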
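For the "custom exporter/script" row, the following is a minimal sketch of what such a script could look like, not a reference implementation. It assumes GNU `date`, an OpenSSL build that can parse the certificate (1.1.1 or later for SM2), and the node_exporter textfile collector as the delivery path. The metric name `custom_cert_not_after_seconds` and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: export certificate expiry as a Prometheus text-format metric.
# Assumptions: GNU date; an OpenSSL build able to parse the certificate;
# the metric name and file paths are illustrative only.

# Convert OpenSSL's "notAfter=Jan  1 00:00:00 2030 GMT" line to Unix epoch seconds.
enddate_to_epoch() {
  local line=$1
  date -u -d "${line#notAfter=}" +%s
}

# Render one sample in the Prometheus text exposition format.
format_metric() {
  local path=$1 epoch=$2
  printf 'custom_cert_not_after_seconds{filepath="%s"} %s\n' "$path" "$epoch"
}

# Emit one metric line per certificate file given as an argument;
# files that OpenSSL cannot parse are skipped.
export_certs() {
  local cert enddate
  for cert in "$@"; do
    enddate=$(openssl x509 -in "$cert" -noout -enddate 2>/dev/null) || continue
    format_metric "$cert" "$(enddate_to_epoch "$enddate")"
  done
}

# Example with fixed input, so it runs without any certificate on disk.
format_metric /etc/ssl/certs/server.crt 1893456000
```

In practice the output of `export_certs /path/to/*.crt` would be redirected to a `.prom` file in the directory given to node_exporter's `--collector.textfile.directory`, for example from a cron job.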
2. End-to-End Monitoring Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│                             Collection layer                             │
│  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │
│  │ x509-cert-exporter │  │ Blackbox Exporter  │  │cert-manager metrics│  │
│  │ (local/K8s files)  │  │  (HTTPS probing)   │  │  (built into K8s)  │  │
│  └─────────┬──────────┘  └─────────┬──────────┘  └─────────┬──────────┘  │
│            │                       │                       │             │
│            └───────────────────────┼───────────────────────┘             │
│                                    │                                     │
│                              ┌─────▼──────┐                              │
│                              │ Prometheus │ ◄── scrape_interval: 60-300s │
│                              └─────┬──────┘                              │
└────────────────────────────────────┬─────────────────────────────────────┘
                                     │
                              ┌──────▼───────┐
                              │ Alertmanager │ ◄── alert routing and grouping
                              └──────┬───────┘
                                     │
            ┌────────────────────────┼────────────────────────┐
            ▼                        ▼                        ▼
      ┌───────────┐            ┌───────────┐            ┌───────────┐
      │   Email   │            │   WeCom   │            │  Webhook  │
      └───────────┘            └───────────┘            └───────────┘
At its core, certificate monitoring is about managing a time window: alerts must fire during the critical period before a certificate expires.
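The time-window idea can be sketched as a small function that maps remaining validity onto a single alert tier. This is only an illustration: it assumes the 30/14/7-day boundaries and the severity names used by the alerting rules in this guide, and the function name `alert_tier` is hypothetical.

```bash
#!/usr/bin/env bash
# Map remaining validity (in seconds) onto a single alert tier.
# Tier boundaries (30/14/7 days) follow the thresholds used in this guide.
alert_tier() {
  local secs=$1 day=86400
  if   (( secs <= 0 ));        then echo emergency   # already expired
  elif (( secs < 7 * day ));   then echo critical
  elif (( secs < 14 * day ));  then echo high
  elif (( secs < 30 * day ));  then echo warning
  else                              echo ok
  fi
}

# Example: a certificate whose notAfter lies 10 days in the future.
now=$(date +%s)
not_after=$(( now + 10 * 86400 ))
alert_tier $(( not_after - now ))
```

Working in seconds rather than whole days avoids rounding a certificate that expired a few hours ago to "0 days left".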
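3. Approach 1: x509-certificate-exporter (Recommended)
This is the most full-featured of the four approaches: it monitors PEM files, K8s Secrets, and kubeconfig files.
3.1 Installation
Option 1: Helm (K8s environments)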
bash
helm repo add enix https://charts.enix.io
helm repo update
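# hostPathsExporter.daemonSets is a map keyed by DaemonSet name ("nodes" is an arbitrary name);
# directories belong under watchDirectories, single files under watchFiles.
# Confirm the key names with: helm show values enix/x509-certificate-exporter
helm install x509-exporter enix/x509-certificate-exporter \
  --namespace monitoring \
  --create-namespace \
  --set secretsExporter.enabled=true \
  --set 'hostPathsExporter.daemonSets.nodes.watchDirectories[0]=/etc/ssl/certs' \
  --set 'hostPathsExporter.daemonSets.nodes.watchDirectories[1]=/etc/pki/tls/certs'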
Option 2: Binary (non-K8s environments)
bash
# Download and install
curl -LO https://github.com/enix/x509-certificate-exporter/releases/latest/download/x509-certificate-exporter-linux-amd64.tar.gz
tar xzf x509-certificate-exporter-linux-amd64.tar.gz
sudo mv x509-certificate-exporter /usr/local/bin/
# Start the exporter and watch the GM certificate files
x509-certificate-exporter \
  --watch-file=/etc/ssl/certs/server.crt \
  --watch-file=/etc/letsencrypt/live/*/fullchain.pem \
  --listen-address=:9793
3.2 Prometheus scrape configuration
yaml
# prometheus.yml
scrape_configs:
  - job_name: 'x509-certificate-exporter'
    static_configs:
      - targets: ['localhost:9793']
    scrape_interval: 60s
    scrape_timeout: 30s
Scrape interval: 60 to 300 seconds is ample for expiry monitoring. Certificates change rarely, so scraping more often only wastes resources.
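4. Approach 2: Blackbox Exporter (Probing External Services)
Use this approach to monitor the GM certificates served by externally exposed HTTPS endpoints. The probe reports whichever certificate the endpoint presents to a standard TLS client, so an endpoint that offers its GM certificate only over a GM-specific TLS handshake may need a GM-capable probe instead.
4.1 Installation and configuration
yaml
# docker-compose.yml
version: '3.7'
services:
  blackbox_exporter:
    container_name: blackbox_exporter
    image: prom/blackbox-exporter:master
    volumes:
      - ./config.yml:/etc/blackbox_exporter/config.yml
    ports:
      - "9115:9115"
yaml
# blackbox_exporter/config.yml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      tls_config:
        insecure_skip_verify: true  # for self-signed GM certificates
Note: `insecure_skip_verify: true` skips certificate chain validation, so the probe succeeds against self-signed GM certificates while still reporting their expiry time.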
4.2 Prometheus scrape configuration
yaml
# prometheus.yml
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://gm.example.com/  # site serving a GM certificate
          - https://api.gm.cn/
    relabel_configs:
      # Pass the original target as the ?target= query parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the Blackbox Exporter
      - target_label: __address__
        replacement: localhost:9115
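To make the relabeling concrete, the sketch below builds the URL that Prometheus ends up requesting for one target, which is handy for reproducing a probe by hand. The helper name `probe_url` is hypothetical, and the target is left unencoded for readability.

```bash
#!/usr/bin/env bash
# Build the probe URL that results from the relabeling rules:
#   exporter -> the value written into __address__
#   module   -> the "module" scrape parameter
#   target   -> the original target, moved into the "target" parameter
probe_url() {
  local exporter=$1 module=$2 target=$3
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

probe_url localhost:9115 http_2xx https://gm.example.com/
```

Fetching that URL with `curl -s` and filtering for `probe_ssl_earliest_cert_expiry` shows the value Prometheus would scrape for the target.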
5. Prometheus Alerting Rules
5.1 Alerting rules for x509-certificate-exporter
yaml
# prometheus-rules/certificate-alerts.yml
groups:
  - name: certificate-expiry
    interval: 30s
    rules:
      # Warning: certificate expires in less than 28 days.
      # The expression stays in seconds so that humanizeDuration renders $value correctly.
      - alert: CertificateRenewal
        expr: (x509_cert_not_after - time()) < 28 * 86400
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "GM certificate '{{ $labels.subject_CN }}' needs renewal"
          description: "Certificate '{{ $labels.subject_CN }}' expires in {{ $value | humanizeDuration }}"
      # Critical: certificate expires in less than 14 days
      - alert: CertificateExpiration
        expr: (x509_cert_not_after - time()) < 14 * 86400
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "GM certificate '{{ $labels.subject_CN }}' is about to expire"
          description: "Certificate {{ if $labels.secret_name }}in K8s Secret '{{ $labels.secret_namespace }}/{{ $labels.secret_name }}'{{ else }}at '{{ $labels.filepath }}'{{ end }} has only {{ $value | humanizeDuration }} left"
      # Emergency: certificate has expired
      - alert: CertificateExpired
        expr: x509_cert_expired == 1
        for: 5m
        labels:
          severity: emergency
        annotations:
          summary: "GM certificate '{{ $labels.subject_CN }}' has expired"
          description: "The certificate has expired and clients will reject the service; act immediately"
5.2 Alerting rules for Blackbox Exporter
yaml
groups:
  - name: blackbox-tls
    rules:
      - alert: SSLCertificateExpiringWarning
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "GM certificate for {{ $labels.instance }} expires within 30 days"
      - alert: SSLCertificateExpiringCritical
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 7
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "GM certificate for {{ $labels.instance }} expires within 7 days"
      - alert: SSLCertificateExpired
        expr: (probe_ssl_earliest_cert_expiry - time()) <= 0
        for: 5m
        labels:
          severity: emergency
        annotations:
          summary: "GM certificate for {{ $labels.instance }} has expired"
5.3 Multi-threshold tiered alerts (combined)
yaml
# Load this group instead of the groups in 5.1 and 5.2, not alongside them,
# otherwise the same certificate raises duplicate alerts.
# Remaining lifetime is derived from x509_cert_not_after, which is always exported;
# x509_cert_expires_in_seconds exists only with --expose-relative-metrics.
groups:
  - name: certificate-multi-threshold
    rules:
      - alert: CertificateExpiring30Days
        expr: |
          ((x509_cert_not_after - time()) < 2592000 and (x509_cert_not_after - time()) > 0) or
          ((probe_ssl_earliest_cert_expiry - time()) < 2592000 and (probe_ssl_earliest_cert_expiry - time()) > 0)
        for: 1h
        labels: { severity: "warning" }
        annotations: { summary: "GM certificate expires within 30 days" }
      - alert: CertificateExpiring14Days
        expr: |
          ((x509_cert_not_after - time()) < 1209600 and (x509_cert_not_after - time()) > 0) or
          ((probe_ssl_earliest_cert_expiry - time()) < 1209600 and (probe_ssl_earliest_cert_expiry - time()) > 0)
        for: 1h
        labels: { severity: "high" }
        annotations: { summary: "GM certificate expires within 14 days; renew now" }
      - alert: CertificateExpiring7Days
        expr: |
          ((x509_cert_not_after - time()) < 604800 and (x509_cert_not_after - time()) > 0) or
          ((probe_ssl_earliest_cert_expiry - time()) < 604800 and (probe_ssl_earliest_cert_expiry - time()) > 0)
        for: 30m
        labels: { severity: "critical" }
        annotations: { summary: "GM certificate expires within 7 days; high risk of service interruption" }
      - alert: CertificateExpired
        expr: x509_cert_expired == 1 or (probe_ssl_earliest_cert_expiry - time()) <= 0
        for: 5m
        labels: { severity: "emergency" }
        annotations: { summary: "GM certificate has expired; service unavailable" }
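Because the thresholds nest, a certificate inside a shorter window also satisfies every longer one. The sketch below lists which of the rule expressions above hold for a given remaining lifetime. It ignores the `for` durations, and the function name `firing_alerts` is hypothetical.

```bash
#!/usr/bin/env bash
# List the alert names from the combined rule group whose expression is true
# for a certificate with the given remaining lifetime in seconds.
firing_alerts() {
  local secs=$1
  local out=()
  if (( secs <= 0 )); then
    out+=(CertificateExpired)
  else
    (( secs < 2592000 )) && out+=(CertificateExpiring30Days)   # 30 * 86400
    (( secs < 1209600 )) && out+=(CertificateExpiring14Days)   # 14 * 86400
    (( secs < 604800 ))  && out+=(CertificateExpiring7Days)    #  7 * 86400
  fi
  echo "${out[*]}"
}

# A certificate with 5 days left satisfies all three "expiring" rules at once.
firing_alerts $(( 5 * 86400 ))
```

If the overlapping notifications turn out to be noisy, Alertmanager's `inhibit_rules` is one way to silence the lower tiers while a higher one is firing.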
6. Alertmanager Routing Configuration
6.1 Basic configuration
yaml
# alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alert@example.com'
route:
  group_by: ['alertname', 'severity', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'
  routes:
    - match:
        severity: emergency
      receiver: 'emergency-pager'
      repeat_interval: 30m
      continue: true
    - match:
        severity: critical
      receiver: 'critical-email'
      repeat_interval: 2h
    - match:
        severity: high
      receiver: 'slack-alerts'
      repeat_interval: 4h
    - match:
        severity: warning
      receiver: 'warning-email'
      repeat_interval: 12h
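6.2 Receiver configuration (WeCom, Slack, email)
yaml
receivers:
  # Alertmanager posts its own JSON schema, which the WeCom group-robot API does not accept as-is.
  # Point this URL at a relay that converts the payload, or use Alertmanager's wechat_configs.
  - name: 'emergency-pager'
    webhook_configs:
      - url: 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY'
        send_resolved: true
  - name: 'slack-alerts'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#cert-alerts'
        title: '🔔 GM certificate alert'
        text: |
          *Alert:* {{ .GroupLabels.alertname }}
          *Severity:* {{ .CommonLabels.severity }}
          *Certificate:* {{ .CommonLabels.instance }}{{ .CommonLabels.subject_CN }}
          *Details:* {{ .CommonAnnotations.description }}
        color: 'danger'
  # Every receiver named in the 6.1 routes must be defined, or Alertmanager rejects the file.
  # The addresses below are placeholders.
  - name: 'default-receiver'
    email_configs:
      - to: 'ops@example.com'
  - name: 'critical-email'
    email_configs:
      - to: 'oncall@example.com'
  - name: 'warning-email'
    email_configs:
      - to: 'ops@example.com'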
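The WeCom group-robot endpoint expects its own message schema rather than the JSON that Alertmanager's `webhook_configs` sends, so a relay usually sits between the two. As a sketch of the body such a relay would need to produce, the helper below builds a WeCom text message. The function names are hypothetical, and the escaping covers only backslashes, double quotes, and newlines.

```bash
#!/usr/bin/env bash
# Escape backslashes, double quotes and newlines so the text can be embedded
# in a JSON string. Other control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}

# Build the JSON body of a WeCom group-robot text message.
wecom_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$(json_escape "$1")"
}

wecom_payload 'GM certificate "gm.example.com" expires within 7 days'
```

To check a robot key by hand, the payload can be posted with `curl -s -H 'Content-Type: application/json' -d "$(wecom_payload 'test')"` followed by the webhook URL.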
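7. Grafana Dashboards
7.1 Import the dedicated x509-certificate-exporter dashboard
1. Open Grafana → Dashboards → Import
2. Enter dashboard ID 13922
3. Select the Prometheus data source and click Import
7.2 Core custom queries

| What to monitor | PromQL query |
|---|---|
| Remaining days | `(x509_cert_not_after - time()) / 86400` |
| Number of certificates expiring within 30 days | `count((x509_cert_not_after - time()) < 2592000 and (x509_cert_not_after - time()) > 0)` |
| Expired certificates | `x509_cert_expired == 1` |
| Remaining days for probed endpoints (pair with the first query for a multi-source view) | `(probe_ssl_earliest_cert_expiry - time()) / 86400` |

7.3 Suggested thresholds
Suggested panel color thresholds: red below 7 days, orange below 14 days, yellow below 30 days, green at 30 days or more.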
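8. Common Problems and Solutions
Q1: What if a GM certificate fails to parse?
First check that the exporter supports the container format (for example PEM). GM certificates are usually stored as standard PEM, which x509-certificate-exporter reads without special configuration. If parsing still fails, the exporter's crypto library may not recognize the SM2 curve or the SM2-with-SM3 signature algorithm, so confirm with a test certificate before relying on the metric.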
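When diagnosing a parse failure, it helps to confirm first that the file really is an SM2-signed certificate. The sketch below assumes an OpenSSL build with SM2 support (1.1.1 or later) and that the algorithm is printed either under a name containing "SM2" or as the OID 1.2.156.10197.1.501. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Succeed if a signature-algorithm string looks like SM2 with SM3,
# matching either a textual name or the OID 1.2.156.10197.1.501.
is_gm_sigalg() {
  case "$1" in
    *SM2*|*sm2*|*1.2.156.10197.1.501*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the first "Signature Algorithm" value found in a certificate file.
cert_sigalg() {
  openssl x509 -in "$1" -noout -text |
    sed -n 's/^ *Signature Algorithm: *//p' | head -n 1
}

# Example with fixed strings, so it runs without a certificate on disk.
for alg in 'SM2-with-SM3' 'sha256WithRSAEncryption'; do
  if is_gm_sigalg "$alg"; then echo "$alg: GM"; else echo "$alg: not GM"; fi
done
```

For a real file, `is_gm_sigalg "$(cert_sigalg /etc/ssl/certs/server.crt)"` combines the two helpers.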
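Q2: Do self-signed GM certificates need special configuration?
Add `insecure_skip_verify: true` under `tls_config` in the Blackbox Exporter module to skip certificate validation.
Q3: What if alerts fire too often?
Tune Alertmanager's `group_interval` and `repeat_interval` to control how often notifications are sent, and use the `for` clause in alerting rules so that brief fluctuations do not fire alerts.
Q4: How do I verify that monitoring is configured correctly?
bash
# Check the exporter's metrics
curl -s localhost:9793/metrics | grep x509_cert_not_after
# Query the metric through Prometheus
curl 'http://localhost:9090/api/v1/query?query=x509_cert_not_after'
# List active alerts in Alertmanager
amtool alert --alertmanager.url=http://localhost:9093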
9. Quick Start: All-in-One Docker Compose Stack
yaml
# docker-compose.yml
# Inside this stack, Prometheus must address the other containers by service name
# (for example x509-exporter:9793 and blackbox-exporter:9115), not localhost.
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules:/etc/prometheus/rules
  alertmanager:
    image: prom/alertmanager:latest
    ports: ["9093:9093"]
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # change before exposing Grafana
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    ports: ["9115:9115"]
    volumes:
      - ./blackbox.yml:/etc/blackbox_exporter/config.yml
  x509-exporter:
    image: enix/x509-certificate-exporter:latest
    ports: ["9793:9793"]
    volumes:
      - /etc/ssl/certs:/certs:ro
    command:
      - --watch-dir=/certs
      - --listen-address=:9793
Start the stack with `docker compose up -d`.
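Summary
This guide covered four approaches to monitoring GM certificates. x509-certificate-exporter is the recommended starting point; combined with Prometheus alerting rules and multi-channel notification through Alertmanager, it gives a complete automated monitoring setup for GM certificates. Key points:
1. GM certificates use the X.509 structure, so existing monitoring tools can be reused
2. Set tiered thresholds at 30, 14, and 7 days so that alerts escalate as expiry approaches
3. Grafana dashboard ID 13922 provides a ready-made visualization
4. In production, a scrape interval of 60 to 300 seconds avoids wasting resources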