Postgres-Operator with Metrics

When running PostgreSQL in Kubernetes, operators quickly become a topic. Operators are a concept of application-specific Kubernetes controllers. Or to put it more simply: operators are programs that configure and manage other software.

I use the “Zalando postgres-operator”, a Kubernetes operator that manages PostgreSQL clusters, on my Kubernetes setup. One major headache is that this operator deploys your clusters, but doesn’t take care of monitoring them. So while the PostgreSQL cluster itself might be up and running and working as expected, the performance might be terrible, or other problems might be taking place within the database that simply aren’t visible.

For monitoring there is another operator, the prometheus-operator, a Kubernetes controller that manages Prometheus instances, their configuration and their integration with tools like Alertmanager. It provides a Kubernetes-native way to configure scraping by using selectors.

If all of this sounds like too much for you, relax and read a bit more about these concepts before diving in deeper.

Before you start

It’s assumed that you already have the prometheus-operator and a Grafana instance installed. A popular setup, which I also use, is the kube-prometheus-stack.

It’s also assumed that you are familiar with the concepts of selectors, pods, deployments, services and the basics of CRDs.

And that you have used helm before.

Installing the Zalando postgres-operator

To install the Zalando postgres-operator, we’ll utilise their helm chart, following the official installation guide.

In order to install the operator using helm, we want to provide a values.yaml file that applies some simple modifications to the default configuration of the operator itself:

# values.yaml
---
configGeneral:
  sidecars:
    - name: "exporter"
      image: "quay.io/prometheuscommunity/postgres-exporter:latest"
      ports:
        - name: exporter
          containerPort: 9187
          protocol: TCP
      resources:
        limits:
          cpu: 500m
          memory: 256M
        requests:
          cpu: 100m
          memory: 200M
      env:
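      # $(POD_NAME), $(POSTGRES_USER) and $(POSTGRES_PASSWORD) are environment
      # variables that the operator provides to sidecar containers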
      - name: "DATA_SOURCE_URI"
        value: "$(POD_NAME)/postgres?sslmode=require"
      - name: "DATA_SOURCE_USER"
        value: "$(POSTGRES_USER)"
      - name: "DATA_SOURCE_PASS"
        value: "$(POSTGRES_PASSWORD)"
      - name: "PG_EXPORTER_AUTO_DISCOVER_DATABASES"
        value: "true"

This values.yaml file instructs the helm chart to deploy a configuration for the operator that creates all PostgreSQL clusters with a sidecar container. This sidecar container runs the postgres-exporter from the Prometheus community and is configured to connect to the locally running instance and collect various metrics about the individual PostgreSQL instance.

To install the operator using these values, you can use the official install instructions with the addition of the --values parameter and the path to the values file you just created:

# add repo for postgres-operator
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator

# install the postgres-operator
helm install --namespace postgres-operator --create-namespace --values values.yaml postgres-operator postgres-operator-charts/postgres-operator

Note: You’ll require cluster-admin permissions for this, in order to install the CRDs.
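
Before moving on, it can’t hurt to verify that the operator is running and that its CRDs were installed. The names below correspond to the chart defaults used above; adjust them if you picked a different namespace or release name:

# check that the operator pod is running
kubectl get pods -n postgres-operator

# check that the postgresql CRD was registered
kubectl get crd postgresqls.acid.zalan.do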

Deploying a PostgreSQL cluster

With the operator installed, it’s time to create a PostgreSQL cluster. This is done using a Custom Resource (CR) called postgresql. In this example, we will use the minimal cluster from the operator repository, but you can make it as complex as you feel comfortable with. A reference can be found in the operator documentation.

# postgresql.yaml
---
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: example-minimal-cluster
spec:
  teamId: "example" # needs to be identical with the name prefix (example-)
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    example:  # database owner
    - superuser
    - createdb
  databases:
    exampledb: example  # dbname: owner
  postgresql:
    version: "14"

After storing this YAML as postgresql.yaml, you can apply it using kubectl:

kubectl create namespace example
kubectl apply -n example -f postgresql.yaml
kubectl wait --for=condition=Ready --timeout=300s pod/example-minimal-cluster-0 -n example
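
Once the pod is ready, you can check the cluster status and, if you want to connect manually, read the generated credentials from the secret the operator creates. The secret name below follows the operator’s usual <username>.<clustername>.credentials.postgresql.acid.zalan.do naming scheme; adjust it if your user or cluster name differs:

# show the cluster as seen by the operator (the status should eventually show "Running")
kubectl get postgresql -n example

# read the generated password of the "example" user
kubectl get secret example.example-minimal-cluster.credentials.postgresql.acid.zalan.do \
  -n example -o 'jsonpath={.data.password}' | base64 -d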

Setting up the Prometheus monitoring

As a last step, it’s time to collect the metrics from all PostgreSQL clusters deployed by the operator. This is done using the PodMonitor CR. Again, a simple YAML manifest file is needed; let’s call it podmonitor.yaml:

# podmonitor.yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: postgresql
spec:
  selector:
    matchLabels:
      application: spilo  # (1)
  namespaceSelector:
    any: true  # (2)
  podMetricsEndpoints:
    - port: exporter # (3)
      interval: 15s
      scrapeTimeout: 10s
    - targetPort: 8008 # (4)
      interval: 15s
      scrapeTimeout: 10s
  podTargetLabels: # (5)
    - spilo-role
    - cluster-name
    - team

To understand this CR a bit better, let’s walk through the various parts of the spec. This CR is deployed in the operator namespace and instructs the Prometheus operator to automatically scrape all PostgreSQL clusters created by the postgres-operator across the entire Kubernetes cluster, without requiring any additional per-cluster configuration.

  1. The selector for this PodMonitor targets all spilo applications. Spilo is the image that the postgres-operator uses; it contains PostgreSQL, Patroni and everything needed to cluster the setup. Since application: spilo is also the default label set by the operator, this should find all cluster instances.
  2. This namespaceSelector explicitly instructs the PodMonitor to search in all namespaces. Without a namespaceSelector, the PodMonitor would only look in its own namespace. You can also provide a list of namespaces if you prefer to be more selective.
  3. This port name refers to the exporter sidecar container that was explicitly configured in the postgres-operator configuration above and is now deployed with every PostgreSQL cluster.
  4. This port belongs to Patroni and provides additional metrics about the cluster status, such as the current leader/replica situation, which should help to debug potential replication problems.
  5. podTargetLabels instructs Prometheus to collect the Kubernetes pod labels and add them to the metrics collected from the scraped exporters. This is useful to identify your different clusters in dashboards and general queries.
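
With the spec in place, the PodMonitor can be applied to the operator namespace like any other manifest. Depending on how your Prometheus instance is configured, you might additionally need to add a label that matches its podMonitorSelector (kube-prometheus-stack, for example, usually only selects monitors carrying its release label unless configured otherwise):

kubectl apply -n postgres-operator -f podmonitor.yaml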

Wrapping up and further hints

With all this done, you have a monitored PostgreSQL cluster in your Kubernetes setup that is just waiting to be utilised for your next project. Maybe a Mastodon instance?

You can find some good dashboards on the official Grafana website that should give you better insight into your PostgreSQL setup. But even without them, you can already find all the metrics under the pg_ and patroni_ prefixes.
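
If you want to double-check that the exporter actually serves metrics before building dashboards, you can port-forward to one of the database pods and have a look yourself. The pod name and port below match the example cluster from this post:

# forward the exporter port of the first cluster pod to your machine
kubectl port-forward -n example pod/example-minimal-cluster-0 9187:9187

# in a second terminal: list the collected PostgreSQL metrics
curl -s http://localhost:9187/metrics | grep '^pg_'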