Multi-Cloud Kubernetes Monitoring with Thanos

by Sagar Patil • 1st July 2022

tl;dr

  • How to implement Thanos for Prometheus metrics using cloud-specific storage options.
  • Showcase multi-cloud monitoring on a single Grafana using Prometheus and Thanos.
  • Thanos offers a set of components that can be composed into a highly available Prometheus setup with long-term storage capabilities.
  • Store historical metric data in object storage cost-effectively while retaining fast query latencies.

Prerequisites

  1. Running GKE, EKS, and AKS clusters (Kubernetes 1.13 or above).
  2. Prometheus or Prometheus Operator Helm chart installed (v2.2.1+).
  3. Helm 3.x.
  4. All Thanos components installed in the same namespace as Prometheus on each Kubernetes cluster.
  5. Manifests for Thanos Querier and Store Deployment are available in the git repo mentioned below:
git clone -b release-0.12 https://github.com/thanos-io/kube-thanos.git
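
Before starting, it can help to confirm that the command-line tools used in this post are on the PATH. The snippet below is only a minimal sketch: missing_tools is a helper name made up for this example, and the tool list is an assumption based on the commands that appear later in the post.

```shell
# Print every tool from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of any CLI.
missing_tools() {
  for tool_name in "$@"; do
    command -v "$tool_name" >/dev/null 2>&1 || printf '%s\n' "$tool_name"
  done
}

# Tools used in this walkthrough; trim the list to the clouds you target.
missing_list="$(missing_tools kubectl helm git az)"
if [ -n "$missing_list" ]; then
  echo "Missing tools:"
  echo "$missing_list"
else
  echo "All tools found."
fi
```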

Installation

Google Kubernetes Engine (GKE)

Configure the GCS bucket and Service Account

Create a GCS bucket named prometheus-long-term-thanos

Now create a service account with the Storage Admin and Storage Object Admin roles, and download its key file as JSON credentials.
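
The same two steps can also be done from the command line. The sketch below assumes the gsutil and gcloud CLIs are installed and authenticated; PROJECT_ID, SA_NAME, and the key file name thanos-gcs-key.json are placeholders that do not come from the original setup. By default it only prints the commands (DRY_RUN=1), so nothing is created until you opt in.

```shell
# Placeholders: PROJECT_ID and SA_NAME are assumptions, replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-thanos}"
BUCKET="prometheus-long-term-thanos"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

setup_gcs() {
  # Create the bucket that will hold the Thanos blocks.
  run gsutil mb -p "$PROJECT_ID" "gs://${BUCKET}"
  # Create the service account used by the Thanos components.
  run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
  # Grant the two roles named in the text above.
  for gcs_role in roles/storage.admin roles/storage.objectAdmin; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member "serviceAccount:${SA_EMAIL}" --role "$gcs_role"
  done
  # Download a JSON key for use in the object storage config.
  run gcloud iam service-accounts keys create thanos-gcs-key.json \
    --iam-account "$SA_EMAIL"
}

setup_gcs
```

Once the printed commands look right, run the script with DRY_RUN=0 to execute them.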

Configure objectStorageConfig

Create a file thanos-storage-config.yaml with the following data. Thanos expects the service account key as a string, so the JSON goes under a YAML block scalar (|-):

type: GCS
config:
  bucket: "prometheus-long-term-thanos"
  service_account: |-
    {
      "type": "service_account",
      "project_id": "project",
      "private_key_id": "abcdefghijklmnopqrstuvwxyz12345678906666",
      "private_key": "-----BEGIN PRIVATE KEY-----\n<private_key>\n-----END PRIVATE KEY-----\n",
      "client_email": "project@thanos.iam.gserviceaccount.com",
      "client_id": "123456789012345678901",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/project%40thanos.iam.gserviceaccount.com"
    }

Create a K8s secret with the above config file

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml
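
A malformed config file only shows up later as a crashing sidecar, so a quick local check before creating the secret can save a round trip. This is a rough sketch rather than a real validator: check_objstore_config is a hypothetical helper that only confirms the two top-level keys (type and config) are present.

```shell
# Return 0 if the file has the two top-level keys every Thanos
# object storage config needs: "type" and "config".
# check_objstore_config is a hypothetical helper name.
check_objstore_config() {
  cfg_file="$1"
  [ -f "$cfg_file" ] || { echo "no such file: $cfg_file" >&2; return 1; }
  grep -Eq '^type:[[:space:]]*[^[:space:]]+' "$cfg_file" \
    || { echo "missing top-level type" >&2; return 1; }
  grep -Eq '^config:' "$cfg_file" \
    || { echo "missing top-level config" >&2; return 1; }
}

if check_objstore_config thanos-storage-config.yaml; then
  echo "thanos-storage-config.yaml looks plausible"
fi
```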

Deploy the Prometheus and Thanos components

Fetch the default configuration values of the prometheus-operator chart by running the command

helm show values stable/prometheus-operator > values_default.yaml

Add the following data to the prometheusSpec section, so that every metric carries a label identifying its cluster.

externalLabels:
  cluster: gke-1

In values_default.yaml, replace the empty thanos: {} entry in the prometheusSpec section with the Thanos Sidecar configuration

thanos:
  baseImage: quay.io/thanos/thanos
  version: v0.12.2
  objectStorageConfig:
    key: thanos.yaml
    name: thanos-objstore-config

Install Prometheus together with the Thanos Sidecar

helm install prometheus stable/prometheus-operator -f values_default.yaml -n monitoring

Check the status of the Prometheus pod and the Thanos Sidecar with the command

kubectl get pods -n monitoring -l app=prometheus

Deploy Thanos Querier

Clone the repo https://github.com/thanos-io/kube-thanos.git

Add the metric store endpoints to thanos-query-deployment.yaml (kube-thanos/manifests/thanos-query-deployment.yaml), in the args of the query container under spec.template.spec.containers

--store=thanos-store.monitoring.svc.cluster.local:10901
--store=prometheus-operated.monitoring.svc.cluster.local:10901

Change the namespace to monitoring in each of the files named in the command below. Then apply the Thanos Query deployment, service, and serviceMonitor manifests to create the Kubernetes resources:

kubectl apply -f thanos-query-deployment.yaml -f thanos-query-service.yaml -f thanos-query-serviceMonitor.yaml
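
The namespace change mentioned above has to happen before the apply, and it can be scripted rather than done by hand. The sketch below assumes the manifests declare their namespace on a line of the form "namespace: <name>"; set_namespace is a hypothetical helper. It does not touch list entries such as namespaceSelector.matchNames, so check the serviceMonitor files manually if they contain one.

```shell
# Rewrite every "namespace: <anything>" line to the given namespace.
# Works through a temporary copy so it behaves the same with GNU and BSD sed.
set_namespace() {
  target_ns="$1"; shift
  for manifest in "$@"; do
    ns_tmp="$(mktemp)"
    sed -E "s/^([[:space:]]*namespace:).*/\1 ${target_ns}/" "$manifest" > "$ns_tmp" \
      && cat "$ns_tmp" > "$manifest"
    rm -f "$ns_tmp"
  done
}

# Run from kube-thanos/manifests; files that do not exist are skipped.
for query_file in thanos-query-deployment.yaml thanos-query-service.yaml \
                  thanos-query-serviceMonitor.yaml; do
  if [ -f "$query_file" ]; then
    set_namespace monitoring "$query_file"
  fi
done
```

The same helper can be reused for the Store and Compactor manifests in the following sections.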

Deploy Thanos Store

Make the following changes in the Thanos Store configuration files

Change the spec.template.spec.containers.env in thanos-store-statefulSet.yaml (kube-thanos/manifests/thanos-store-statefulSet.yaml) to:

env:
  - name: OBJSTORE_CONFIG
    valueFrom:
      secretKeyRef:
        key: thanos.yaml
        name: thanos-objstore-config

Change the namespace to monitoring in each of the files named in the command below, then apply the Store statefulSet, service, and serviceMonitor manifests

kubectl apply -f thanos-store-statefulSet.yaml -f thanos-store-service.yaml -f thanos-store-serviceMonitor.yaml

Deploy Thanos Compactor (Optional)

Make the following changes in the Thanos Compactor configuration files

Change the spec.template.spec.containers.env in thanos-compact-statefulSet.yaml (kube-thanos/examples/all/manifests/thanos-compact-statefulSet.yaml) to:

env:
  - name: OBJSTORE_CONFIG
    valueFrom:
      secretKeyRef:
        key: thanos.yaml
        name: thanos-objstore-config

Change the namespace to monitoring in each of the files named in the command below, then apply the Compactor statefulSet, service, and serviceMonitor manifests

kubectl apply -f thanos-compact-statefulSet.yaml -f thanos-compact-service.yaml -f thanos-compact-serviceMonitor.yaml

Check the status of all Thanos components

kubectl get all -n monitoring
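
Instead of reading through the full listing, we can wait for each workload to finish rolling out. This is a sketch under the assumption that the resources keep the names used in the kube-thanos manifests (thanos-query, thanos-store, thanos-compact); wait_for_thanos is a hypothetical helper and the 180-second timeout is an arbitrary choice.

```shell
# Wait until each listed workload reports a finished rollout.
# $1 is the namespace, the remaining arguments are kind/name pairs.
wait_for_thanos() {
  rollout_ns="$1"; shift
  for workload in "$@"; do
    kubectl rollout status "$workload" -n "$rollout_ns" --timeout=180s || return 1
  done
}

# Example call (needs cluster access); drop thanos-compact if you skipped it:
#   wait_for_thanos monitoring deployment/thanos-query \
#     statefulset/thanos-store statefulset/thanos-compact
```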

Configure Thanos as Grafana data source

Add the Thanos Querier service as a data source in the Grafana UI, then start viewing metric data. This can be done by going to Grafana -> Configuration -> Data Sources -> Add data source.
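
As an alternative to clicking through the UI, Grafana can load data sources from files in its provisioning/datasources directory. The sketch below writes such a file. It assumes the Querier service is named thanos-query, lives in the monitoring namespace, and serves its Prometheus-compatible HTTP API on port 9090; verify the port against your thanos-query service before using it. The file name thanos-datasource.yaml is arbitrary.

```shell
# URL of the Thanos Querier HTTP endpoint (an assumption, see the note above).
QUERY_URL="${QUERY_URL:-http://thanos-query.monitoring.svc.cluster.local:9090}"

# Thanos Querier speaks the Prometheus query API, so the type is "prometheus".
cat > thanos-datasource.yaml <<EOF
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus
    access: proxy
    url: ${QUERY_URL}
EOF

cat thanos-datasource.yaml
```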

The metrics should now be visible in Grafana:

[Screenshot: Multi-Cloud Kubernetes Monitoring]

Installation

Amazon Elastic Kubernetes Service (EKS)

Configure S3 bucket and IAM user with S3 access

Create an S3 bucket named prometheus-thanos

Now create an IAM user with the following S3 permissions, and copy its access key and secret key:

  1. s3:ListBucket
  2. s3:GetObject
  3. s3:DeleteObject
  4. s3:PutObject
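
The four permissions above can be expressed as an IAM policy document. The following is a minimal sketch that writes one for the prometheus-thanos bucket; the file name thanos-s3-policy.json is an assumption. ListBucket applies to the bucket itself, while the object actions apply to the keys inside it, which is why the policy has two statements.

```shell
BUCKET="prometheus-thanos"

# Bucket-level action targets the bucket ARN;
# object-level actions target the objects under it.
cat > thanos-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:DeleteObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::${BUCKET}/*"]
    }
  ]
}
EOF

echo "Wrote thanos-s3-policy.json for bucket ${BUCKET}"
```

The document can then be attached to the IAM user, for example with aws iam put-user-policy and --policy-document file://thanos-s3-policy.json.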

Configure objectStorageConfig

Create a file thanos-storage-config.yaml with the following data

type: s3
config:
  bucket: prometheus-thanos  #S3 bucket name
  endpoint: s3.ap-south-1.amazonaws.com #S3 Regional endpoint
  access_key: <access_key>
  secret_key: <secret_key>

Create a K8s secret with the above config file

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml

Now repeat all the steps from the GKE setup to deploy the Prometheus and Thanos components. There is no need to configure Grafana. The only difference is the cluster label in values_default.yaml:

externalLabels:
  cluster: eks-1

Installation

Azure Kubernetes Service (AKS)

Create an Azure Storage account and download its keys

Create an Azure Storage account named thanosdemo

az storage account create --name thanosdemo --resource-group <aks-rsg> --location southindia --sku Standard_LRS

Create a storage container in the above storage account

az storage container create --account-name thanosdemo --name thanos

Get the storage account keys

az storage account keys list -g <aks-rsg> -n thanosdemo
[
  {
    "creationTime": "2021-08-06T07:24:25.606004+00:00",
    "keyName": "key1",
    "permissions": "FULL",
    "value": "<key>"
  },
  {
    "creationTime": "2021-08-06T07:24:25.606004+00:00",
    "keyName": "key2",
    "permissions": "FULL",
    "value": "<key>"
  }
]
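
To avoid copying the key out of the JSON by hand, the az CLI can return just the value of the first key. This is a sketch assuming az is installed and logged in; fetch_storage_key is a hypothetical helper, and RESOURCE_GROUP must be set to your own resource group.

```shell
# Print only the value of the first key (key1) of a storage account.
# $1 = resource group, $2 = storage account name.
fetch_storage_key() {
  az storage account keys list -g "$1" -n "$2" --query '[0].value' -o tsv
}

# Only call az when it is installed and a resource group was provided.
if command -v az >/dev/null 2>&1 && [ -n "${RESOURCE_GROUP:-}" ]; then
  STORAGE_KEY="$(fetch_storage_key "$RESOURCE_GROUP" thanosdemo)" \
    || echo "az call failed; are you logged in?" >&2
else
  echo "Set RESOURCE_GROUP and install az to fetch the key."
fi
```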

Configure objectStorageConfig

Create a file thanos-storage-config.yaml with the following data

type: AZURE
config:
  storage_account: "thanosdemo"
  storage_account_key: "<key>"
  container: "thanos"

Create a K8s secret with the above config file

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml

Now repeat all the steps from the GKE setup to deploy the Prometheus and Thanos components. There is no need to configure Grafana. The only difference is the cluster label in values_default.yaml:

externalLabels:
  cluster: aks-1

Multi-Cluster Monitoring

Grafana in GKE to monitor all the clusters

Expose the thanos-query service of the AKS and EKS clusters on port 10901 so that it is reachable from the GKE cluster. We can patch the thanos-query service in each of those clusters using the following command

kubectl patch svc thanos-query -n monitoring -p '{"spec": {"type": "LoadBalancer"}}'

The thanos-query service is now reachable from the GKE cluster.

We now update the thanos-query deployment in the GKE cluster to query metrics from thanos-query (or the store endpoint) in the other two clusters. This can be done by adding the query endpoints as arguments

- --store=<AWS_LB_DNS>:10901
- --store=<AKS_LB_IP>:10901
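
The two placeholders above are the external addresses of the patched services. The sketch below shows one way to look them up and format the arguments. It assumes each cluster has its own kubectl context; lb_address and store_args are hypothetical helpers. AWS load balancers report a hostname and Azure ones an IP, so both fields are requested, and the lookup only returns something useful once the load balancer has been provisioned.

```shell
# Print the external address of a LoadBalancer service.
# $1 = kubectl context, $2 = service name, $3 = namespace.
lb_address() {
  kubectl --context "$1" get svc "$2" -n "$3" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}'
}

# Turn a list of addresses into --store arguments on the gRPC port 10901.
store_args() {
  for store_addr in "$@"; do
    printf '%s\n' "- --store=${store_addr}:10901"
  done
}

# Example with placeholder addresses; with cluster access you could use
#   store_args "$(lb_address <eks-context> thanos-query monitoring)"
store_args "<AWS_LB_DNS>" "<AKS_LB_IP>"
```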

Now, if we look at the data on the default Grafana dashboard, we can view metrics carrying the cluster label for each of the three clusters: gke-1, eks-1, and aks-1.

We can also expose the thanos-query service on the GKE cluster and observe data being populated from all the other clusters:

[Screenshot: Multi-Cloud Kubernetes Monitoring]

The Grafana dashboard below shows all nodes from all three clusters:

[Screenshot: Multi-Cloud Kubernetes Monitoring]

Conclusion

Using Thanos to monitor a multi-cloud Kubernetes setup

  • Custom configuration

    Thanos is a complex system with many moving parts, and we did not dive deep into the custom configuration it involves.

  • Central platform

    We explained how Thanos can be used as a central platform for monitoring our multi-cloud K8s infrastructure.