25. Monitoring Kubernetes Cluster Nodes with Prometheus

痛定思痛 · 2022-10-11 13:43


I. node-exporter

node_exporter collects a wide range of runtime metrics from server nodes, such as conntrack, cpu, diskstats, filesystem, loadavg, meminfo, netstat, and more.
For details see: https://github.com/prometheus/node_exporter

1. Deploying node-exporter as a DaemonSet

Pull the image: docker pull prom/node-exporter:v1.1.2
vi node-exporter-dm.yaml

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-mon
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true      # share the host PID namespace
      hostIPC: true      # share the host IPC namespace
      hostNetwork: true  # share the host network namespace
      containers:
      - name: node-exporter
        image: harbor.hzwod.com/k8s/prom/node-exporter:v1.1.2
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 150m
        # securityContext:
        #   privileged: true
        args:
        - --path.rootfs
        - /host
        volumeMounts:
        - name: rootfs
          mountPath: /host
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: rootfs
        hostPath:
          path: /
```
- `hostPID: true`, `hostIPC: true`, and `hostNetwork: true` make the node-exporter container share the host's PID, IPC, and network namespaces, so it can see host-level resources.
- Note: because the pod shares the host network namespace, `containerPort: 9100` is exposed directly as port 9100 on the host; that port is the metrics endpoint.
- The host's `/` is mounted at `/host` in the container, and `--path.rootfs=/host` tells node-exporter where to find the host's files, e.g. `/proc/stat` for CPU info and `/proc/meminfo` for memory info.
- The `tolerations` entry lets the pod run on master nodes, because we want the masters monitored too; handle any other node taints the same way.

kubectl apply -f node-exporter-dm.yaml fails when securityContext.privileged: true is enabled.
Running kube-apiserver -h turns up the relevant flag:
Add the startup argument --allow-privileged=true to kube-apiserver to allow containers to request privileged mode,
or simply remove the securityContext.privileged: true setting above (TODO: the impact of leaving it off is not yet clear).

Check the metrics:
curl http://172.10.10.100:9100/metrics
You should see plenty of metric data.
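The response body is in the Prometheus text exposition format: one sample per line, with an optional label set in braces. As a rough illustration (the sample line below is typical of node-exporter output, not captured from this cluster), a minimal Python sketch that parses one such line:

```python
import re

# Parse a single Prometheus exposition-format sample line into
# (metric name, labels dict, value). Quoted commas inside label
# values are not handled -- this is a simplification.
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                     r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_sample(line):
    m = LINE_RE.match(line.strip())
    if not m:
        raise ValueError("not a sample line: %r" % line)
    labels = {}
    if m.group('labels'):
        for pair in m.group('labels').split(','):
            k, v = pair.split('=', 1)
            labels[k] = v.strip('"')
    return m.group('name'), labels, float(m.group('value'))

name, labels, value = parse_sample(
    'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67')
# name   -> 'node_cpu_seconds_total'
# labels -> {'cpu': '0', 'mode': 'idle'}
```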

At this point every node exposes a metrics endpoint, and we could configure a Prometheus target for each node by hand. But then every new node would require another config change. Is there a simpler way to discover nodes automatically? That is what Prometheus service discovery is for.

2. Service discovery

Running inside Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five service discovery modes: Node, Service, Pod, Endpoints, and Ingress.

a. Node discovery

Add to the Prometheus config:

```yaml
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
```
- `kubernetes_sd_configs` is Prometheus's Kubernetes API service discovery configuration.
- `role` can be node, service, pod, endpoints, or ingress; each role exposes a different set of meta labels.
  For more information see the official documentation: kubernetes_sd_config

Besides kubernetes_sd_config, Prometheus offers many other options; see the Prometheus configuration reference.

After reloading Prometheus, check the Targets page: auto-discovery works, but every endpoint returns HTTP 400.

b. Adjusting the discovered endpoints with relabel_configs

After discovery, Prometheus targets port 10250 on every node, and the scrapes fail. Why?
Port 10250 is the kubelet's secure (HTTPS, authenticated) API port, so plain HTTP scrapes are rejected; the kubelet version used here (v1.17.16) also serves a read-only HTTP interface on port 10255.
What we actually want for this job is port 9100, where our node-exporter listens (and even if you use the kubelet's built-in metrics, you would switch to port 10255, as we do below when configuring cAdvisor).

In this cluster the kubelet serves the read-only port 10255 on startup; you can check it with curl http://[nodeIP]:10255/metrics

We can use relabel_configs to rewrite the port (or other metadata) of the discovered endpoints.
Modify the kubernetes-nodes job in prometheus.yaml:

```yaml
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: replace               # replace action
    source_labels: [__address__]  # list of labels, joined and matched by regex
    target_label: __address__     # label to write the result to
    regex: '(.*):10250'           # regex matched against the joined source_labels value
    replacement: '${1}:9100'      # new value for the target label
```
- `action: replace` rewrites a label.
- `__address__` is the special label holding the target's `<host>:<port>` scrape address.
- `replacement: '${1}:9100'` — `${1}` references the first capture group of the regex.
  For more information see relabel_configs.

The official documentation describes `__address__` as follows:

> The `__address__` label is set to the `<host>:<port>` address of the target. After relabeling, the `instance` label is set to the value of `__address__` by default if it was not set during relabeling. The `__scheme__` and `__metrics_path__` labels are set to the scheme and metrics path of the target respectively. The `__param_<name>` label is set to the value of the first passed URL parameter called `<name>`.
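To make the replace semantics concrete, here is a minimal Python sketch of how such a rule rewrites `__address__` (an illustration only, not Prometheus's actual implementation; it assumes the default `;` separator and `${n}` group references):

```python
import re

def relabel_replace(labels, source_labels, regex, target_label,
                    replacement, separator=';'):
    """Apply a Prometheus-style 'replace' relabel action to a label dict."""
    joined = separator.join(labels.get(l, '') for l in source_labels)
    m = re.fullmatch(regex, joined)  # Prometheus anchors the regex
    if m is None:
        return labels  # no match: the target label is left untouched
    out = dict(labels)
    # Expand ${1}, ${2}, ... capture-group references in the replacement
    out[target_label] = re.sub(r'\$\{(\d+)\}',
                               lambda g: m.group(int(g.group(1))),
                               replacement)
    return out

target = {'__address__': '172.10.10.100:10250'}
target = relabel_replace(target, ['__address__'], r'(.*):10250',
                         '__address__', '${1}:9100')
# target['__address__'] -> '172.10.10.100:9100'
```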

Also add a labelmap rule to copy the Kubernetes node labels into Prometheus labels, which makes filtering the monitoring data easier later:

```yaml
- action: labelmap
  regex: __meta_kubernetes_node_label_(.*)
```
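The labelmap action matches label *names* (not values) against the regex and copies each matching label to a new name built from the capture group. A minimal sketch, assuming the default replacement `$1`:

```python
import re

def relabel_labelmap(labels, regex, replacement=r'\1'):
    """Copy labels whose names match regex to new names from the match."""
    out = dict(labels)
    for name, value in labels.items():
        m = re.fullmatch(regex, name)
        if m:
            out[m.expand(replacement)] = value
    return out

discovered = {'__meta_kubernetes_node_label_kubernetes_io_hostname': 'node1'}
mapped = relabel_labelmap(discovered, r'__meta_kubernetes_node_label_(.*)')
# mapped now also contains {'kubernetes_io_hostname': 'node1'}
```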

Update prometheus.yaml, reload, and check Prometheus again.

c. The full prometheus.yaml

Here is the complete Prometheus ConfigMap (prometheus.yml is stored as a ConfigMap, i.e. in etcd):
prometheus-cm.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-mon
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'coredns'
      static_configs:
      - targets: ['kube-dns.kube-system:9153']
    - job_name: 'traefik'
      static_configs:
      - targets: ['traefiktcp.default:8180']
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: replace               # replace action
        source_labels: [__address__]  # labels joined and matched by regex
        target_label: __address__     # label to write the result to
        regex: '(.*):10250'           # regex matched against the joined value
        replacement: '${1}:9100'      # new value for the target label
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.*)
```

3. Displaying node metrics with Grafana

Grafana is already installed and the Prometheus data source is configured; now import a dashboard template to display the node-exporter metrics.
Download the template: https://grafana.com/api/dashboards/8919/revisions/24/download

II. kube-state-metrics + cAdvisor

1. Scraping cAdvisor with Prometheus

cAdvisor is built into the kubelet, so it can be scraped directly.

```yaml
- job_name: 'k8s-cadvisor'
  metrics_path: /metrics/cadvisor
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__
    action: replace
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  metric_relabel_configs:
  - source_labels: [instance]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace
  - source_labels: [pod_name]
    separator: ;
    regex: (.+)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [container_name]
    separator: ;
    regex: (.+)
    target_label: container
    replacement: $1
    action: replace
```
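Unlike relabel_configs, which rewrite target metadata before the scrape, metric_relabel_configs rewrite the labels of each scraped sample. The three rules above just copy the older cAdvisor label names (`instance`, `pod_name`, `container_name`) to newer ones; a minimal sketch of that copying (an illustration, not Prometheus's implementation):

```python
def copy_label(sample_labels, src, dst):
    """Copy label `src` to `dst` when present and non-empty
    (mirrors the replace rules with regex (.+) above)."""
    value = sample_labels.get(src, '')
    if value:  # regex (.+) requires at least one character
        sample_labels[dst] = value
    return sample_labels

# Hypothetical cAdvisor sample labels for demonstration
sample = {'instance': 'node1:10255', 'pod_name': 'web-0',
          'container_name': 'web'}
for src, dst in [('instance', 'node'), ('pod_name', 'pod'),
                 ('container_name', 'container')]:
    copy_label(sample, src, dst)
# sample now also carries node / pod / container labels
```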


2. Deploying kube-state-metrics

https://github.com/kubernetes/kube-state-metrics/tree/master/examples/standard

This section deploys kube-state-metrics, version v1.9.8, in the kube-mon namespace.

- Pull the image:
  docker pull quay.mirrors.ustc.edu.cn/coreos/kube-state-metrics:v1.9.8
cluster-role-binding.yaml (the ServiceAccount manifest follows after `---`):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-mon
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
```

cluster-role.yaml:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
```
deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 1.9.8
    spec:
      containers:
      - image: harbor.hzwod.com/k8s/kube-state-metrics:v1.9.8
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
```
service.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scraped: "true"
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
```

Run kubectl apply -f . to create these resources and start the kube-state-metrics container and service.

3. Scraping kube-state-metrics with Prometheus

Add the following job to prometheus.yaml:

```yaml
- job_name: kube-state-metrics
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-mon
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    regex: kube-state-metrics
    replacement: $1
    action: keep
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: k8s_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: k8s_sname
```
- `role: endpoints` discovers targets from the endpoints of services.
- The `keep` action monitors only services carrying the label `app.kubernetes.io/name: kube-state-metrics`.
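The keep action drops every discovered target whose joined source-label values do not fully match the regex; only matching targets survive. A minimal sketch of that filtering (an illustration, not Prometheus's implementation):

```python
import re

def relabel_keep(targets, source_labels, regex, separator=';'):
    """Keep only targets whose joined source-label values fully match regex."""
    kept = []
    for labels in targets:
        joined = separator.join(labels.get(l, '') for l in source_labels)
        if re.fullmatch(regex, joined):  # Prometheus anchors the regex
            kept.append(labels)
    return kept

# Hypothetical discovered targets for demonstration
targets = [
    {'__meta_kubernetes_service_label_app_kubernetes_io_name': 'kube-state-metrics'},
    {'__meta_kubernetes_service_label_app_kubernetes_io_name': 'coredns'},
]
kept = relabel_keep(
    targets,
    ['__meta_kubernetes_service_label_app_kubernetes_io_name'],
    'kube-state-metrics')
# kept contains only the kube-state-metrics target
```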

Update the config, reload Prometheus, and check the result.

4. Displaying the metrics with a Grafana dashboard

This dashboard uses data from both cAdvisor and kube-state-metrics, which is why we configured Prometheus to scrape both of them above.

- Download the template:
  https://grafana.com/grafana/dashboards/13105
