Problem Statement
I want the ability to drill into my cluster: starting with the cluster, then the nodes, then the pods on each node, and finally information about those pods. In this case I wanted to end up with a table of the pods on a node and their details.
We'll focus on the persistent volume information for now, as this is where things get complicated. For all the other metrics you can key off the pod name: every query has it in the result, and Grafana will find the common names and match them up. With a persistent volume, we need to make multiple queries, following a path, to get the final capacity numbers. There are a couple of ways to do this, but I followed the query logic below. Wrapping each query in max(...) by (...) tells Prometheus to aggregate the series and return only the labels listed in by (...), rather than every label.
Query #1 - max(kube_pod_info{node="$node"}) by (pod)
-- This will return a list of pods for a given node
Query #2 - max(kube_pod_spec_volumes_persistentvolumeclaims_info) by (persistentvolumeclaim, pod, volume)
-- Will return a list of persistent volume claims for each pod
Query #3 - max(kube_persistentvolumeclaim_resource_requests_storage_bytes) by (persistentvolumeclaim)
-- Will return a list of persistent volume claims and their size as the value
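To make the max(...) by (...) behavior concrete, here's a minimal Python sketch that mimics it on in-memory data. It's a simplification, not how Prometheus implements aggregation, and the sample kube_pod_info series (a made-up subset of labels with hypothetical pod and node names) are invented for illustration.

```python
from collections import defaultdict

# A series is (labels, value); labels map label names to label values.
Series = tuple[dict[str, str], float]


def max_by(series: list[Series], by: list[str]) -> list[Series]:
    """Simplified model of PromQL `max(<series>) by (<by>)`.

    Series are grouped on the values of the `by` labels. Each group becomes
    one output series that keeps only those labels and the group's max value.
    """
    groups: dict[tuple[str, ...], float] = defaultdict(lambda: float("-inf"))
    for labels, value in series:
        key = tuple(labels.get(name, "") for name in by)
        groups[key] = max(groups[key], value)
    # Empty label values are treated as absent, so they are left out.
    return [
        ({name: v for name, v in zip(by, key) if v}, value)
        for key, value in groups.items()
    ]


if __name__ == "__main__":
    # Hypothetical kube_pod_info series with a made-up subset of labels.
    kube_pod_info = [
        ({"pod": "etcd-dns-0", "node": "node-1", "namespace": "dns"}, 1.0),
        ({"pod": "other-pod-0", "node": "node-1", "namespace": "default"}, 1.0),
        ({"pod": "third-pod-0", "node": "node-2", "namespace": "default"}, 1.0),
    ]
    # Stand-in for the {node="$node"} selector with $node set to node-1.
    on_node = [s for s in kube_pod_info if s[0].get("node") == "node-1"]
    print(max_by(on_node, ["pod"]))
    # [({'pod': 'etcd-dns-0'}, 1.0), ({'pod': 'other-pod-0'}, 1.0)]
```

Every label except pod is dropped, which is why query #1 is just a list of pod names.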
I end up with a query path of node -> pod -> persistent volume claim -> capacity. The problem is that if you run all three of these queries individually, Grafana won't know how to assemble your table, because no single label links all three. If you match on pod name, the PVC capacity has no match. If you match on persistent volume claim, the list of pods has no match.
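A quick way to see the missing linkage is to compare the labels each query keeps. This Python sketch takes the label names from the by (...) clauses above and intersects them; the dictionary keys are made-up names for the three queries.

```python
# Labels kept by each query's "by (...)" clause; the dict keys are made-up names.
QUERY_LABELS: dict[str, set[str]] = {
    "query1_pods_on_node": {"pod"},
    "query2_pvc_info": {"persistentvolumeclaim", "pod", "volume"},
    "query3_pvc_capacity": {"persistentvolumeclaim"},
}


def shared_labels(*queries: str) -> set[str]:
    """Return the label names that every named query result carries."""
    return set.intersection(*(QUERY_LABELS[q] for q in queries))


if __name__ == "__main__":
    print(shared_labels("query1_pods_on_node", "query3_pvc_capacity"))  # set()
    print(shared_labels("query1_pods_on_node", "query2_pvc_info"))  # {'pod'}
    print(shared_labels("query2_pvc_info", "query3_pvc_capacity"))  # {'persistentvolumeclaim'}
    print(shared_labels(*QUERY_LABELS))  # set()
```

Query #2 is the only one that shares a label with both of the others, so it's the piece that has to bridge pods and capacity.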
Solution
Query #1 is fine: it pulls a full list of pods. What I need is a combination of query #2 and query #3, where the labels of query #2 are merged into the result of query #3. Without that combination there's no way to match the capacity all the way back up to the node, and you get a very ugly table. The closest explanation I found was on Stack Overflow, but it still took a while to translate it to my requirements, so I'll walk through it using the result of each query.
- the value_metric - max(kube_persistentvolumeclaim_resource_requests_storage_bytes) by (persistentvolumeclaim)
-- returns {persistentvolumeclaim="datadir-etcd-dns-0"} 1073741824 (1 GiB)
- the info_metric - max(kube_pod_spec_volumes_persistentvolumeclaims_info) by (persistentvolumeclaim, pod, volume)
-- returns {persistentvolumeclaim="datadir-etcd-dns-0",pod="etcd-dns-0",volume="datadir"} 1
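The general pattern for copying labels from an info metric onto a value metric is:
<value_metric> * on (<match_label>) group_left(<info_labels>) <info_metric>
on (<match_label>) joins the two results on the label they share (persistentvolumeclaim here), and group_left(<info_labels>) copies the listed labels (pod and volume) from the info metric onto the result. Because the info metric's value is always 1, the multiplication leaves the capacity unchanged.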
And we end up with the following query:
max(kube_persistentvolumeclaim_resource_requests_storage_bytes) by (persistentvolumeclaim) * on (persistentvolumeclaim) group_left(pod,volume) max(kube_pod_spec_volumes_persistentvolumeclaims_info{pod=~"$pod"}) by (persistentvolumeclaim, pod, volume)
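To check what on (...) group_left(...) does with the sample results above, here's a simplified Python model of the join. It's a sketch for intuition only: it ignores timestamps and metric names, and it is not Prometheus's implementation. The function name mul_on_group_left is made up.

```python
# A series is (labels, value); labels map label names to label values.
Series = tuple[dict[str, str], float]


def mul_on_group_left(
    left: list[Series],
    right: list[Series],
    on: list[str],
    group_left: list[str],
) -> list[Series]:
    """Simplified model of `left * on (<on>) group_left(<group_left>) right`.

    Each left series is matched with the one right series that has the same
    values for the `on` labels. The `group_left` labels are copied from that
    right series, and the two values are multiplied.
    """
    # Index the "one" side by its match key; PromQL errors on duplicates here.
    index: dict[tuple[str, ...], Series] = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate right-hand series for match key {key}")
        index[key] = (labels, value)

    result: list[Series] = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # left series without a partner are dropped
        right_labels, right_value = index[key]
        out = dict(labels)
        for name in group_left:
            if right_labels.get(name):
                out[name] = right_labels[name]
            else:
                out.pop(name, None)
        result.append((out, value * right_value))
    return result


if __name__ == "__main__":
    # Sample results from the value_metric and info_metric queries above.
    value_metric = [({"persistentvolumeclaim": "datadir-etcd-dns-0"}, 1073741824.0)]
    info_metric = [
        (
            {
                "persistentvolumeclaim": "datadir-etcd-dns-0",
                "pod": "etcd-dns-0",
                "volume": "datadir",
            },
            1.0,
        )
    ]
    print(mul_on_group_left(value_metric, info_metric, ["persistentvolumeclaim"], ["pod", "volume"]))
    # [({'persistentvolumeclaim': 'datadir-etcd-dns-0', 'pod': 'etcd-dns-0', 'volume': 'datadir'}, 1073741824.0)]
```

The duplicate check mirrors a real constraint: if a PVC were mounted by more than one pod, query #2 would likely return several series for the same persistentvolumeclaim, and Prometheus rejects that kind of match rather than guessing which pod to copy.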
Now, in Grafana's query inspector, each result is an object with three labels (pod, volume, and persistentvolumeclaim), and its value is a timestamp paired with the capacity in bytes. Because this result now carries a pod label, Grafana can pair it with the other queries that contain a pod name.
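Why the shared pod label is enough can be sketched in Python: merge rows from several query results on their pod label, roughly the way the table lines up results that share a name. This approximates the idea and is not Grafana's actual join code; the pod name other-pod-0 and the capacity_bytes column are hypothetical.

```python
def merge_by_pod(*results: list[dict]) -> dict[str, dict]:
    """Merge rows from several query results into one row per pod name.

    Rows that have no "pod" label have nothing to join on and are skipped.
    """
    table: dict[str, dict] = {}
    for rows in results:
        for row in rows:
            pod = row.get("pod")
            if pod is None:
                continue
            table.setdefault(pod, {}).update(row)
    return table


if __name__ == "__main__":
    # Query #1: pods on the node ("other-pod-0" is a hypothetical pod).
    pods = [{"pod": "etcd-dns-0"}, {"pod": "other-pod-0"}]
    # Query #3 alone: capacity keyed only by PVC, so it never joins.
    capacity_only = [{"persistentvolumeclaim": "datadir-etcd-dns-0", "capacity_bytes": 1073741824}]
    # Combined query: the same capacity, now carrying pod and volume labels.
    capacity_joined = [
        {
            "persistentvolumeclaim": "datadir-etcd-dns-0",
            "pod": "etcd-dns-0",
            "volume": "datadir",
            "capacity_bytes": 1073741824,
        }
    ]
    print(merge_by_pod(pods, capacity_only)["etcd-dns-0"])
    # {'pod': 'etcd-dns-0'}
    print(merge_by_pod(pods, capacity_joined)["etcd-dns-0"]["capacity_bytes"])
    # 1073741824
```

Pods without a PVC still get a row, just with no capacity, which is the behavior I want in the drill-down table.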