Using Steampipe to SQL Query Kubernetes

David Gamba, @gambaeng
2024-09-03
version 0.1, 2024-09-03 #kubernetes #sql #steampipe

Steampipe is a tool that allows you to use SQL to query multiple API endpoints.

Over time I have built several ad-hoc and not-so-ad-hoc tools that do something similar, but unlike Steampipe they had a very limited scope and required me to rebuild the DB (SQLite) every time I wanted to re-hydrate the data.

This post covers the following:

  • A trip down some of my ad-hoc solutions.

  • Installing Steampipe and the Kubernetes plugin.

  • Some useful functions.

  • Some useful queries.

Ad-hoc Solution Basics

Let's say I need to get a list of pods and see which nodes they are running on. With kubectl there is no direct way to do this; you need to run two queries and then join the results.

I have seen people do it with jq or some other tool, but I normally default to SQLite (that is, until I found Steampipe).

So I would do something like this:

Define the fields I want from the pods and the nodes, and surface them as top-level TSV columns:

$ kubectl get pods -A -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.namespace}{"\t"}{.spec.nodeName}{"\n"}{end}' > pods.tsv

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kops\.k8s\.io/instancegroup}{"\t"}{.spec.taints}{"\t"}{.metadata.labels.beta\.kubernetes\.io/instance-type}{"\t"}{.metadata.labels.topology\.kubernetes\.io/zone}{"\t"}{.metadata.creationTimestamp}{"\n"}{end}' > nodes.tsv

Then create a simple schema based on those fields:

db.schema
CREATE TABLE IF NOT EXISTS "pods" (
	"Name" TEXT,
	"Namespace" TEXT,
	"NodeName" TEXT
);

CREATE TABLE IF NOT EXISTS "nodes" (
	"Name" TEXT,
	"InstanceGroup" TEXT,
	"Taints" TEXT,
	"InstanceType" TEXT,
	"Zone" TEXT,
	"CreationTimestamp" TEXT
);

.mode tabs
.import pods.tsv pods
.import nodes.tsv nodes
.mode table

As you can see, the schema file also imports the data into the tables.

Create the DB:

$ sqlite3 k8s.db < db.schema

Then run your query:

$ sqlite3 k8s.db

sqlite> SELECT pods.*, nodes.instancetype FROM pods LEFT JOIN nodes ON pods.nodename = nodes.name WHERE namespace = 'default';
+-------------+-----------+---------------------------------------------+--------------+
|    Name     | Namespace |                  NodeName                   | InstanceType |
+-------------+-----------+---------------------------------------------+--------------+
| app-a-kpns4 | default   | ip-xxx-yy-42-42.us-west-2.compute.internal  | m5.8xlarge   |
| app-a-bz7rv | default   | ip-xxx-yy-39-148.us-west-2.compute.internal | m5.8xlarge   |
| app-a-jr7rd | default   | ip-xxx-yy-45-31.us-west-2.compute.internal  | m5.8xlarge   |
+-------------+-----------+---------------------------------------------+--------------+

That is a very simple example, but you can already see the effort required to maintain and prepare the data. Every time you need another field or another label, you have to update the schema and re-import the data.

Steampipe Installation

Installation is described here: https://steampipe.io/downloads

And the Kubernetes plugin installation is described here: https://hub.steampipe.io/plugins/turbot/kubernetes
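
For example, on macOS both installs are one-liners (Homebrew for Steampipe; the plugin install command is on the hub page linked above, and the downloads page covers other platforms):

$ brew install turbot/tap/steampipe
$ steampipe plugin install kubernetes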

Finally, set up your config file ~/.steampipe/config/kubernetes.spc:

# Default connection that points to my current context
connection "k" {
  plugin         = "kubernetes"
  custom_resource_tables = ["*"]
  source_types   = ["deployed"]
}
# Specific connections to a single context
connection "k_dev" {
  plugin         = "kubernetes"
  config_path    = "~/.kube/config"
  config_context = "dev.example.com"
  custom_resource_tables = ["*"]
  source_types   = ["deployed"]
}
connection "k_prod" {
  plugin         = "kubernetes"
  config_path    = "~/.kube/config"
  config_context = "prod.example.com"
  custom_resource_tables = ["*"]
  source_types   = ["deployed"]
}
Important

The default connection k is loaded when the Steampipe service is started. If you change your context, you need to restart the Steampipe service.
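
For example:

$ steampipe service restart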

After setting up your config, you can run Steampipe from the CLI or start the service and connect to it via a PostgreSQL client.
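
For example, to start the service and connect with a plain PostgreSQL client (the service prints the full connection string when it starts; 9193 is the default port):

$ steampipe service start
$ psql -h 127.0.0.1 -p 9193 -d steampipe -U steampipe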

Now you can run the same query as before, without any prep:

Run: steampipe query

Then enter your query:

SELECT
	pod.name,
	pod.namespace,
	node.name AS nodename,
	node.labels ->> 'beta.kubernetes.io/instance-type' AS instancetype
FROM
	k.kubernetes_pod AS pod
	LEFT JOIN k.kubernetes_node AS node ON pod.node_name = node.name
WHERE
	pod.namespace = 'default';

And just like that, you get the same result as before. The only difference is that instead of surfacing the labels as top-level columns ahead of time, I access them as a JSON object.
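
If the JSON operators are new to you, the two used throughout this post are -> (returns JSON) and ->> (returns text). For example, against the same node labels:

SELECT
	name,
	labels -> 'topology.kubernetes.io/zone' AS zone_json, -- jsonb value
	labels ->> 'topology.kubernetes.io/zone' AS zone_text -- text value
FROM
	k.kubernetes_node;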

If you need the cache to refresh, run:

SELECT FROM steampipe_internal.meta_cache('clear');
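
Alternatively, inside the steampipe query shell, the .cache meta-command should do the same:

.cache clear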

And as mentioned before, if you are changing contexts and are using the default connection, you need to restart the service.

CPU and Memory Functions

Currently (2024-09-03) Steampipe does not have built-in functions to normalize CPU and Memory values so that one can do math on them. I created a GitHub Issue to try to figure out how to add them.

I would like, for example, to see the total CPU and Memory used by a deployment, or the percentage of utilization a node has.

The CPU and Memory requests and limits are defined as a Quantity and can use the following unit suffixes:

  • base 10 units: m | "" | k | M | G | T | P | E

  • base 2 units: Ki | Mi | Gi | Ti | Pi | Ei

Where m fractions get rounded up.

CPU units are easier to normalize because, at least in the clusters I work with, we only see m (millicores) and "" (cores) units, so I chose to normalize to m.

CREATE OR REPLACE FUNCTION cpu_m (cpu text)
	RETURNS bigint
	LANGUAGE plpgsql
	AS $$
DECLARE
    v bigint;
BEGIN
	IF cpu like '%m' THEN
		-- https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
		-- m fractions get rounded up
		SELECT
			ceiling(trim(trailing 'm' from cpu)::numeric) INTO v;
	ELSE
		SELECT
			cpu::numeric * 1000 INTO v;
	END IF;
	RETURN v;
END;
$$;

And to validate it works, run:

SELECT
	cpu_m ('31750m') = 31750,
	cpu_m ('32') = 32000,
	cpu_m ('1.5') = 1500,
	cpu_m ('1.5m') = 2;
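
With CPU normalized to millicores, math across pods becomes possible. As a quick sketch, reusing the k connection and the containers column from the pod table, the total CPU requested per namespace:

SELECT
	pod.namespace,
	-- containers without requests yield NULL, which sum() ignores
	sum(cpu_m (pcrequests ->> 'cpu')) AS requested_cpu_m
FROM
	k.kubernetes_pod AS pod
	LEFT OUTER JOIN LATERAL jsonb_path_query(pod.containers, '$[*].resources.requests') pcrequests ON TRUE
GROUP BY
	pod.namespace
ORDER BY
	requested_cpu_m DESC NULLS LAST;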

For memory, the units are a bit more complex, but I chose to normalize to bytes since requesting a fraction of a byte makes no sense. However, after testing it in my clusters, I did find instances where developers had requested memory in m units. This is probably a mistake or just bad practice IMHO, so in the future I might add a Kyverno policy to prevent it.

CREATE OR REPLACE FUNCTION memory_bytes (memory text)
	RETURNS bigint
	LANGUAGE plpgsql
	AS $$
DECLARE
	v bigint;
BEGIN
	-- https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
	-- base 10: m | "" | k | M | G | T | P | E
	-- base 2: Ki | Mi | Gi | Ti | Pi | Ei
	-- m fractions get rounded up
	CASE --
	WHEN memory LIKE '%m' THEN
		SELECT
			ceiling(trim(TRAILING 'm' FROM memory)::numeric / 1000) INTO v;
	WHEN memory LIKE '%k' THEN
		SELECT
			trim(TRAILING 'k' FROM memory)::numeric * 1000 INTO v;
	WHEN memory LIKE '%M' THEN
		SELECT
			trim(TRAILING 'M' FROM memory)::numeric * 1000_000 INTO v;
	WHEN memory LIKE '%G' THEN
		SELECT
			trim(TRAILING 'G' FROM memory)::numeric * 1000_000_000 INTO v;
	WHEN memory LIKE '%T' THEN
		SELECT
			trim(TRAILING 'T' FROM memory)::numeric * 1000_000_000_000 INTO v;
	WHEN memory LIKE '%P' THEN
		SELECT
			trim(TRAILING 'P' FROM memory)::numeric * 1000_000_000_000_000 INTO v;
	WHEN memory LIKE '%E' THEN
		SELECT
			trim(TRAILING 'E' FROM memory)::numeric * 1000_000_000_000_000_000 INTO v;
	WHEN memory LIKE '%Ki' THEN
		SELECT
			trim(TRAILING 'Ki' FROM memory)::numeric * 1024 INTO v;
	WHEN memory LIKE '%Mi' THEN
		SELECT
			trim(TRAILING 'Mi' FROM memory)::numeric * 1024 * 1024 INTO v;
	WHEN memory LIKE '%Gi' THEN
		SELECT
			trim(TRAILING 'Gi' FROM memory)::numeric * 1024 * 1024 * 1024 INTO v;
	WHEN memory LIKE '%Ti' THEN
		SELECT
			trim(TRAILING 'Ti' FROM memory)::numeric * 1024 * 1024 * 1024 * 1024 INTO v;
	WHEN memory LIKE '%Pi' THEN
		SELECT
			trim(TRAILING 'Pi' FROM memory)::numeric * 1024 * 1024 * 1024 * 1024 * 1024 INTO v;
	WHEN memory LIKE '%Ei' THEN
		SELECT
			trim(TRAILING 'Ei' FROM memory)::numeric * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 INTO v;
	ELSE
		SELECT
			memory::bigint INTO v;
	END CASE;
	RETURN v;
END;
$$;

And to validate it works, run:

SELECT
	memory_bytes ('3m') = 1 as subm,
	memory_bytes ('3000m') = 3 as m,
	memory_bytes ('31750') = 31750 AS empty,
	memory_bytes ('32k') = 32000 AS k,
	memory_bytes ('32Ki') = 32768 AS Ki,
	memory_bytes ('10Mi') = 10485760 AS Mi,
	memory_bytes ('1.5Gi') = 1610612736 AS Gi,
	memory_bytes ('1.5Mi') = 1572864 as fraction_Mi;

Once the math is done, I want to be able to see the memory in something other than bytes, so I created an approximation function to display the data:

CREATE OR REPLACE FUNCTION memory_aprox_quantity (bytes numeric)
	RETURNS text
	LANGUAGE plpgsql
	AS $$
DECLARE
	v text;
BEGIN
	-- https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
	-- base 10: m | "" | k | M | G | T | P | E
	-- base 2: Ki | Mi | Gi | Ti | Pi | Ei
	CASE --
	WHEN bytes >= (1024::numeric * 1024 * 1024 * 1024 * 1024 * 1024) THEN
		SELECT
			concat(trim(TRAILING '.' FROM to_char(bytes / (1024::numeric * 1024 * 1024 * 1024 * 1024 * 1024), 'FM9999999999999999999.9')), 'Ei') INTO v;
	WHEN bytes >= (1024::numeric * 1024 * 1024 * 1024 * 1024) THEN
		SELECT
			concat(trim(TRAILING '.' FROM to_char(bytes / (1024::numeric * 1024 * 1024 * 1024 * 1024), 'FM9999999999999999999.9')), 'Pi') INTO v;
	WHEN bytes >= (1024::numeric * 1024 * 1024 * 1024) THEN
		SELECT
			concat(trim(TRAILING '.' FROM to_char(bytes / (1024::numeric * 1024 * 1024 * 1024), 'FM9999999999999999999.9')), 'Ti') INTO v;
	WHEN bytes >= (1024 * 1024 * 1024) THEN
		SELECT
			concat(trim(TRAILING '.' FROM to_char(bytes / (1024 * 1024 * 1024), 'FM9999999999999999999.9')), 'Gi') INTO v;
	WHEN bytes >= (1024 * 1024) THEN
		SELECT
			concat(trim(TRAILING '.' FROM to_char(bytes / (1024 * 1024), 'FM9999999999999999999.9')), 'Mi') INTO v;
	WHEN bytes >= (1024) THEN
		SELECT
			concat(trim(TRAILING '.' FROM to_char(bytes / 1024, 'FM9999999999999999999.9')), 'Ki') INTO v;
	ELSE
		SELECT
			bytes INTO v;
	END CASE;
	RETURN v;
END;
$$;

This one is not exact, as I only show one decimal place.
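
And to validate it works (expected strings derived by hand from the thresholds above), run:

SELECT
	memory_aprox_quantity (512) = '512' AS plain_bytes,
	memory_aprox_quantity (1024) = '1Ki' AS Ki,
	memory_aprox_quantity (1536) = '1.5Ki' AS fraction_Ki,
	memory_aprox_quantity (10485760) = '10Mi' AS Mi,
	memory_aprox_quantity (1610612736) = '1.5Gi' AS Gi;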

The Queries In Action

Get all deployments, with their pods (through a ReplicaSet) and the nodes they are running on. Count how many of those are still running on m5.8xlarge nodes and how many m5.8xlarge nodes are running them.

To predict the outcome of rolling this deployment, inspect the anti-affinity rules in place to see whether the pods are spread across hosts and zones or whether they can be scheduled on the same node.

SELECT
	d.name AS deployment_name,
	d.namespace,
	d.replicas,
	count(pod.name) AS pod_count,
	cpu_m (pcrequests.* ->> 'cpu') AS pod_cpu,
	memory_aprox_quantity (memory_bytes (pcrequests.* ->> 'memory')) AS pod_memory,
	count(node.name) AS node_count,
	jsonb_agg(pod.name) AS pod_name,
	json_agg(DISTINCT (coalesce(requiredPodAntiAffinity, '[]') || coalesce(preferredPodAntiAffinity, '[]'))) AS podAntiAffinity,
	jsonb_agg(node.name) AS node_name
FROM
	k.kubernetes_deployment AS d
	LEFT OUTER JOIN LATERAL jsonb_path_query(d.template, '$.spec.containers[*].resources.requests') pcrequests ON TRUE
	JOIN k.kubernetes_replicaset AS rs ON rs.namespace = d.namespace
		AND rs.owner_references -> 0 ->> 'uid' = d.uid
	JOIN k.kubernetes_pod AS pod ON pod.namespace = rs.namespace
		AND rs.uid = pod.owner_references -> 0 ->> 'uid'
	LEFT OUTER JOIN k.kubernetes_node AS node ON node.name = pod.node_name
	LEFT OUTER JOIN LATERAL jsonb_path_query(d.template, '$.spec.affinity.podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].topologyKey') requiredPodAntiAffinity ON TRUE
	LEFT OUTER JOIN LATERAL jsonb_path_query(d.template, '$.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.topologyKey') preferredPodAntiAffinity ON TRUE
WHERE
	node.labels ->> 'beta.kubernetes.io/instance-type' = 'm5.8xlarge'
GROUP BY
	d.name,
	d.namespace,
	d.replicas,
	rs.name,
	pod_cpu,
	pod_memory
ORDER BY
	d.replicas DESC,
	node_count DESC
LIMIT 1;
+-----------------+-------------+----------+-----------+---------+------------+------------+----------+------------------------------+-----------+
| deployment_name | namespace   | replicas | pod_count | pod_cpu | pod_memory | node_count | pod_name | podantiaffinity              | node_name |
+-----------------+-------------+----------+-----------+---------+------------+------------+----------+------------------------------+-----------+
| coredns         | kube-system | 89       | 78        | 100     | 70Mi       | 78         | ...      | [["kubernetes.io/hostname"]] | ...       |
+-----------------+-------------+----------+-----------+---------+------------+------------+----------+------------------------------+-----------+

Same as above, but for StatefulSets, which own their pods directly (no ReplicaSet in between).

SELECT
	sts.name AS sts_name,
	sts.namespace,
	sts.replicas,
	count(pod.name) AS pod_count,
	cpu_m (pcrequests.* ->> 'cpu') AS pod_cpu,
	memory_aprox_quantity (memory_bytes (pcrequests.* ->> 'memory')) AS pod_memory,
	count(node.name) AS node_count,
	jsonb_agg(pod.name) AS pod_name,
	json_agg(DISTINCT (coalesce(requiredPodAntiAffinity, '[]') || coalesce(preferredPodAntiAffinity, '[]'))) AS podAntiAffinity,
	jsonb_agg(node.name) AS node_name
FROM
	k.kubernetes_stateful_set AS sts
	LEFT OUTER JOIN LATERAL jsonb_path_query(sts.template, '$.spec.containers[*].resources.requests') pcrequests ON TRUE
	JOIN k.kubernetes_pod AS pod ON pod.namespace = sts.namespace
		AND sts.uid = pod.owner_references -> 0 ->> 'uid'
	LEFT OUTER JOIN k.kubernetes_node AS node ON node.name = pod.node_name
	LEFT OUTER JOIN LATERAL jsonb_path_query(sts.template, '$.spec.affinity.podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].topologyKey') requiredPodAntiAffinity ON TRUE
	LEFT OUTER JOIN LATERAL jsonb_path_query(sts.template, '$.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.topologyKey') preferredPodAntiAffinity ON TRUE
WHERE
	node.labels ->> 'beta.kubernetes.io/instance-type' = 'm5.8xlarge'
GROUP BY
	sts.name,
	sts.namespace,
	sts.replicas,
	pod_cpu,
	pod_memory
ORDER BY
	sts.replicas DESC,
	node_count DESC
LIMIT 1;

SELECT all nodes and list the pods they contain, as well as the percentage of allocatable CPU and Memory their pods request, WHERE the instance type is m5.8xlarge. Optionally limit the pods to a single namespace.

SELECT
	node.name,
	node.labels ->> 'beta.kubernetes.io/instance-type' AS instancetype,
	node.labels ->> 'kops.k8s.io/instancegroup' AS instancegroup,
	count(pod.name) AS pod_count,
	-- json_agg(pod.name) AS pods,
	cpu_m (node.allocatable ->> 'cpu') AS node_cpu,
	memory_aprox_quantity (memory_bytes (node.allocatable ->> 'memory')) AS node_memory,
	sum(cpu_m (pcrequests.* ->> 'cpu')) AS pod_cpu,
	to_char(sum(cpu_m (pcrequests.* ->> 'cpu')) * 100 / cpu_m (node.allocatable ->> 'cpu'), 'FM999.99') AS "pod_cpu_%",
	memory_aprox_quantity (sum(memory_bytes (pcrequests.* ->> 'memory'))) AS pod_memory,
	to_char(sum(memory_bytes (pcrequests.* ->> 'memory')) * 100 / memory_bytes (node.allocatable ->> 'memory'), 'FM999.99') AS "pod_memory_%"
FROM
	k.kubernetes_node AS node
	LEFT OUTER JOIN k.kubernetes_pod AS pod ON node.name = pod.node_name
	CROSS JOIN LATERAL jsonb_path_query(pod.containers, '$[*].resources.requests') pcrequests
WHERE
	--pod.namespace = 'default'
	--AND
	node.labels ->> 'beta.kubernetes.io/instance-type' = 'm5.8xlarge'
GROUP BY
	node.name,
	node.labels ->> 'beta.kubernetes.io/instance-type',
	node.labels ->> 'kops.k8s.io/instancegroup',
	node.allocatable ->> 'cpu',
	node.allocatable ->> 'memory'
ORDER BY
	instancetype,
	pod_count DESC
LIMIT 1;
+---------------------------------------------+--------------+------------------+-----------+----------+-------------+---------+-----------+------------+--------------+
| name                                        | instancetype | instancegroup    | pod_count | node_cpu | node_memory | pod_cpu | pod_cpu_% | pod_memory | pod_memory_% |
+---------------------------------------------+--------------+------------------+-----------+----------+-------------+---------+-----------+------------+--------------+
| ip-xx-yy-39-148.us-west-2.compute.internal  | m5.8xlarge   | nodes-us-west-2c | 25        | 31750    | 119Gi       | 31060   | 97.83     | 75.3Gi     | 63.21        |
+---------------------------------------------+--------------+------------------+-----------+----------+-------------+---------+-----------+------------+--------------+

Conclusion

Once you get into the groove of Steampipe, you start asking more and more questions in a pretty intuitive way. SQL has been my tool of choice to query AWS and Kubernetes for a long time, and I finally found a tool that does most of the work for me.

Getting around the JSON objects is the hardest part, and something I really hope they improve in the future by exposing more well-known fields as top-level columns.