There are several ways to monitor a Kubernetes cluster: some free, some paid, and some specific to a vendor's cluster implementation. If you're looking for the easy button, consider purchasing a solution; there are plenty out there. For this article, however, we'll be deploying something slightly more involved but free, using Prometheus as our monitoring solution.
Let's start with a few concepts:
- Prometheus pulls data through a process called a scrape. Scraping is a handy approach because you don't need agents pushing data from everywhere, but it can limit your scalability
- Prometheus uses metric endpoints, which are configured using jobs. We'll be installing a couple of endpoints to look at specific pieces of the infrastructure, but if you want your applications included, they'll need to expose Prometheus metrics and have specific annotations set up in their deployment files (see the example after this list)
- There's lots of documentation about Prometheus' ability to self-monitor and automatically pick up new endpoints; this is awesome
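For example, if you want one of your own workloads picked up by the pod-level scrape job we'll configure later, its pod template needs the prometheus.io annotations. A minimal sketch; the port and path values here are placeholders and should match whatever your application actually exposes:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"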
There are many different endpoints you can choose from and things can become confusing very quickly, so hopefully this guide will give you a base implementation to expand on as you see fit for your environment. To do this, we'll be setting up three base components:
- prometheus - the collector and repository of metric data, obviously
- kube-state-metrics - deep Kubernetes metrics such as pod and node utilization. This is kind of like a more detailed version of metrics-server, which is itself a replacement for Heapster. It's confusing, I know.
- node_exporter - collects detailed node metrics. There will be one endpoint per node in the cluster
Namespace And Role Setup
I like the idea of keeping my monitoring pieces in their own namespace, so we'll be creating a monitoring namespace and some roles for the different components to use. I've included the namespace creation in the Prometheus cluster role setup; if you're doing things differently, make sure to take that into account. The files are long (sorry) and better suited to GitHub, but I wanted to make this as easy to follow as possible:
$ cat clusterRole-prometheus.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
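As a quick sanity check once this file has been applied, you can impersonate the new service account and confirm the permissions actually landed; this is optional, but handy later if scrapes start failing with authorization errors:

$ kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus
$ kubectl auth can-i get nodes/proxy --as=system:serviceaccount:monitoring:prometheus

Both should answer yes if the cluster role binding is in place.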
And our cluster role and service account setup for kube-state-metrics.
$ cat clusterRole-kube-state.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources:
  - storageclasses
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring
Prometheus Configuration
The next thing we'll need is a ConfigMap for Prometheus itself. I've elected to collect data every minute; there are examples out there of people collecting every 5 seconds, so pick what makes sense to you. The scrape jobs are largely specific to the type of endpoint in use. If Prometheus runs within the same cluster, the certificates and bearer tokens will automatically be mounted into the appropriate containers, but if not, you'll need to reference them directly, which is out of scope for this article.
$ cat prometheus-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 1m
      evaluation_interval: 1m
      scrape_timeout: 10s
    rule_files:
      - /etc/prometheus/prometheus.rules
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
        - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
      - job_name: 'kubernetes-nodes-cadvisor'
        kubernetes_sd_configs:
        - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
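If you have the promtool binary handy (it ships alongside Prometheus), it can be worth validating the embedded configuration before deploying it; a rough sketch, assuming you've copied the contents of the prometheus.yml key into a local file of the same name:

$ promtool check config prometheus.yml

Also note that because the file is mounted via subPath in the deployment below, later edits to the ConfigMap won't appear inside a running pod; you'll need to recreate the Prometheus pod to pick up changes.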
Prometheus Deployment
Once the configuration map has been set up, it's time to deploy Prometheus itself. This configuration uses a Deployment; you could easily convert it to a StatefulSet if you like.
Please note that the persistent volume claim created here belongs to the monitoring namespace, so if you clean up that namespace, for example to reload the environment, you will wipe the old data. Also keep in mind the storage class in use: I'm using vsphere-ssd from my original blog post, which might not be suitable for your environment. I'm also using a NodePort service as it's always available and doesn't require additional network components, but the inbound port will change every time you deploy, so that might not be ideal long term (see the note after the manifest).
$ cat prometheus-server.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
  labels:
    app: prometheus
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "vsphere-ssd"
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      securityContext:
        fsGroup: 65534
      containers:
      - name: prometheus
        image: prom/prometheus:v2.10.0
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
        - name: data
          mountPath: /prometheus
        ports:
        - containerPort: 9090
      volumes:
      - name: prometheus-config-volume
        configMap:
          name: prometheus-server-conf
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data
      serviceAccountName: prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
spec:
  selector:
    app: prometheus-server
  ports:
  - name: promui
    protocol: TCP
    port: 9090
    targetPort: 9090
  type: NodePort
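As mentioned above, the randomly assigned NodePort changes every time you recreate the service. If that bothers you, you can pin it by adding a nodePort field to the port definition; a small sketch, where 30090 is just an example value and must fall within your cluster's NodePort range (30000-32767 by default):

  ports:
  - name: promui
    protocol: TCP
    port: 9090
    targetPort: 9090
    nodePort: 30090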
Endpoint Deployment
Our last two configuration files set up a deployment and internal service for kube-state-metrics, and a daemon set and internal service for node-exporter. Because node-exporter is a daemon set, every node automatically gets a copy when it's added to the cluster. Self-managed monitoring!
$ cat prometheus-kube-state.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.6.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitoring
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
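Once kube-state-metrics is up, a quick way to confirm it's producing data is to port-forward to its service and pull a few metrics; a sketch, with the local port chosen arbitrarily:

$ kubectl port-forward -n monitoring svc/kube-state-metrics 8080:8080 &
$ curl -s http://localhost:8080/metrics | head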
$ cat prometheus-node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.18.1
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v0.18.1
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        k8s-app: node-exporter
        version: v0.18.1
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: prometheus-node-exporter
        image: prom/node-exporter:v0.18.1
        imagePullPolicy: "IfNotPresent"
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - name: metrics
          containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
      hostNetwork: true
      hostPID: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  annotations:
    prometheus.io/scrape: "true"
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "NodeExporter"
spec:
  clusterIP: None
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  selector:
    k8s-app: node-exporter
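Because the daemon set uses hostNetwork and hostPort 9100, each node exposes its metrics directly, so you can spot-check any node from a machine that can reach it; a sketch, substitute one of your own node names:

$ curl -s http://k8s-n2.itlab.domain.com:9100/metrics | grep node_load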
Deployment and Cleanup Scripts
There are a lot of YAML files here, and things get annoying when you want to deploy, test, clean up, and repeat, so here are two simple scripts that will deploy all of these files and then clean everything up again if you need them:
$ cat deploy.sh
kubectl create -f clusterRole-prometheus.yaml
kubectl create -f prometheus-config-map.yaml
kubectl create -f prometheus-server.yaml
kubectl create -f prometheus-node-exporter.yaml
kubectl create -f clusterRole-kube-state.yaml
kubectl create -f prometheus-kube-state.yaml
The bulk of the cleanup happens when you delete the namespace, but remember, this will also delete the persistent volume hosting the Prometheus data.
$ cat cleanup.sh
kubectl delete namespace monitoring
kubectl delete clusterrolebinding prometheus
kubectl delete clusterrole prometheus
kubectl delete clusterrolebinding kube-state-metrics
kubectl delete clusterrole kube-state-metrics
Accessing Prometheus
You can check the pods and get your access point using these commands. In my cluster I've got two worker nodes, so I've got two node-exporters.
$ kubectl get pods -n monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
kube-state-metrics-699fdf75f8-cqq5t   1/1     Running   0          18h
node-exporter-88297                   1/1     Running   0          18h
node-exporter-wb2lk                   1/1     Running   0          18h
prometheus-server-6f9d9d86d4-m8x4f    1/1     Running   0          18h
And the all-important services are here.
$ kubectl get svc -n monitoring
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
kube-state-metrics   ClusterIP   10.110.207.159   <none>        8080/TCP,8081/TCP   18h
node-exporter        ClusterIP   None             <none>        9100/TCP            18h
prometheus-service   NodePort    10.101.114.89    <none>        9090:30044/TCP      18h
You can see our NodePort service, which means I can hit either node on port 30044 and be redirected to my pod on port 9090. Let's try that in a browser: http://k8s-n2.itlab.domain.com:30044/graph, where you should be presented with the Prometheus dashboard.
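If you'd rather not read the port out of the service listing by hand, you can pull it directly with jsonpath; a small convenience, not a requirement:

$ kubectl get svc prometheus-service -n monitoring -o jsonpath='{.spec.ports[0].nodePort}'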
If you click on Status > Targets, you should see everything Prometheus is scraping, and if you hover over the Labels you will see a lot of information, including job="job_name". This can be particularly useful to tie back to your config map, especially when some things show as down.
If you have been following my logging article, there are annotations in there for Prometheus which are now helpful, as any metrics Fluent Bit provides are automatically added. Targets should look something like this, with all endpoints in an up state.
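You can also confirm data is flowing without the UI by hitting the Prometheus HTTP API with the same NodePort URL; the up metric reports 1 for every healthy target. A quick sketch:

$ curl -s 'http://k8s-n2.itlab.domain.com:30044/api/v1/query?query=up'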
Grafana Integration
The last step in this guide is where the real work begins. If you don't have a Grafana instance up and running, I'd suggest setting one up on a separate Linux box. There are some excellent guides with essentially one command to run: https://grafana.com/docs/installation/rpm/.
After that's done, add a data source of type Prometheus with the web URL you used to access the dashboard above, in my case http://k8s-n2.itlab.domain.com:30044, and you can start creating dashboards. There are also some prebuilt dashboards publicly hosted on Grafana's website, https://grafana.com/dashboards, which you can add by simply placing the dashboard ID into the Grafana import as documented by Grafana: https://grafana.com/docs/reference/export_import/.
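If you prefer to configure the data source as code rather than through the UI, Grafana also supports data source provisioning; a minimal sketch, assuming the file is dropped into /etc/grafana/provisioning/datasources/ on the Grafana host and the URL matches your NodePort:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://k8s-n2.itlab.domain.com:30044
    isDefault: true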
Notes
This is just scratching the surface of monitoring, but hopefully it gives you enough of a framework to build from. Alertmanager is a key piece yet to be discussed, as is long-term storage, and of course dashboards, lots and lots of dashboards. Some articles I found particularly helpful: