Multi-Cloud Kubernetes Monitoring with Thanos
by Sagar Patil • 1st July 2022
tl;dr
- How to implement Thanos for Prometheus metrics using cloud-specific storage options.
- Showcase multi-cloud monitoring on a single Grafana using Prometheus and Thanos.
- Thanos offers a set of components that can be composed into a highly available Prometheus setup with long-term storage capabilities.
- Store historical metric data in object storage cost-effectively while retaining fast query latencies.
Prerequisites
- Running GKE, EKS, and AKS clusters (Kubernetes 1.13 or above)
- Prometheus or Prometheus Operator Helm Chart installed (v2.2.1+).
- Helm 3.x
- All Thanos components should be installed in the same namespace as Prometheus on each Kubernetes cluster.
- Manifests for Thanos Querier and Store Deployment are available in the git repo mentioned below:
git clone -b release-0.12 https://github.com/thanos-io/kube-thanos.git
Installation
Google Kubernetes Engine (GKE)
Configure the GCS bucket and Service Account
Create a GCS bucket named prometheus-long-term-thanos
Now create a service account with the Storage Admin and Storage Object Admin roles, and download its key file as JSON credentials.
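The same setup can be sketched with the gcloud and gsutil CLIs; the service account name thanos-storage and the project ID project below are assumptions, substitute your own:

```shell
# Create the bucket (the name must match the storage config below).
gsutil mb gs://prometheus-long-term-thanos

# Create a service account; the name "thanos-storage" is an assumption.
gcloud iam service-accounts create thanos-storage

# Grant the storage roles (replace "project" with your project ID).
gcloud projects add-iam-policy-binding project \
  --member="serviceAccount:thanos-storage@project.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
gcloud projects add-iam-policy-binding project \
  --member="serviceAccount:thanos-storage@project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Download the key file as JSON credentials.
gcloud iam service-accounts keys create key.json \
  --iam-account="thanos-storage@project.iam.gserviceaccount.com"
```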
Configure objectStorageConfig
Create a file thanos-storage-config.yaml with the following data
type: GCS
config:
  bucket: "prometheus-long-term-thanos"
  service_account: |-
    {
      "type": "service_account",
      "project_id": "project",
      "private_key_id": "abcdefghijklmnopqrstuvwxyz12345678906666",
      "private_key": "-----BEGIN PRIVATE KEY----- ... -----END PRIVATE KEY-----",
      "client_email": "project@thanos.iam.gserviceaccount.com",
      "client_id": "123456789012345678901",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/thanos%40gitpods.iam.gserviceaccount.com"
    }
Create a K8s secret with the above config file
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml
Deploy the Prometheus and Thanos components
Fetch the default configuration values of the prometheus-operator chart:
helm show values stable/prometheus-operator > values_default.yaml
Add the following data to the prometheusSpec section so that every metric carries a label identifying its source cluster:
externalLabels:
  cluster: gke-1
Replace the empty thanos: {} entry in values_default.yaml with the Thanos sidecar configuration:
thanos:
  baseImage: quay.io/thanos/thanos
  version: v0.12.2
  objectStorageConfig:
    key: thanos.yaml
    name: thanos-objstore-config
Install Prometheus with the Thanos sidecar
helm install prometheus stable/prometheus-operator -f values_default.yaml -n monitoring
Check the status of the Prometheus pod and Thanos sidecar with the command
kubectl get pods -n monitoring -l app=prometheus
Deploy Thanos Querier
Clone the repo https://github.com/thanos-io/kube-thanos.git
Add the metric store endpoints to thanos-query-deployment.yaml (kube-thanos/manifests/thanos-query-deployment.yaml) under the container args, after the query argument:
--store=thanos-store.monitoring.svc.cluster.local:10901
--store=prometheus-operated.monitoring.svc.cluster.local:10901
Change the namespace to monitoring in each of the files below, then apply the Thanos Query deployment, service, and serviceMonitor manifests:
kubectl apply -f thanos-query-deployment.yaml -f thanos-query-service.yaml -f thanos-query-serviceMonitor.yaml
Deploy Thanos Store
Make the following changes in the Thanos Store configuration files
Change the spec.template.spec.containers.env in thanos-store-statefulSet.yaml (kube-thanos/manifests/thanos-store-statefulSet.yaml) to:
env:
  - name: OBJSTORE_CONFIG
    valueFrom:
      secretKeyRef:
        key: thanos.yaml
        name: thanos-objstore-config
Change the namespace to monitoring in each of the files below, then apply the Store statefulSet, service, and serviceMonitor manifests
kubectl apply -f thanos-store-statefulSet.yaml -f thanos-store-service.yaml -f thanos-store-serviceMonitor.yaml
Deploy Thanos Compactor (Optional)
Make the following changes in the Thanos Compactor configuration files
Change the spec.template.spec.containers.env in thanos-compact-statefulSet.yaml (kube-thanos/examples/all/manifests/thanos-compact-statefulSet.yaml) to:
env:
  - name: OBJSTORE_CONFIG
    valueFrom:
      secretKeyRef:
        key: thanos.yaml
        name: thanos-objstore-config
Change the namespace to monitoring in each of the files below, then apply the Compactor statefulSet, service, and serviceMonitor manifests
kubectl apply -f thanos-compact-statefulSet.yaml -f thanos-compact-service.yaml -f thanos-compact-serviceMonitor.yaml
Check the status of all Thanos components
kubectl get all -n monitoring
Configure Thanos as Grafana data source
Add the Thanos Querier service as a data source in the Grafana UI, then start viewing metric data. This can be done by going to Grafana -> Configuration -> Data Sources -> Add data source
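Instead of clicking through the UI, the data source can also be provisioned declaratively. A minimal sketch, assuming the kube-thanos defaults (service name thanos-query, HTTP port 9090) and Grafana's datasource provisioning mechanism:

```yaml
# Hypothetical file at /etc/grafana/provisioning/datasources/thanos.yaml
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus   # Thanos Query exposes the Prometheus HTTP API
    access: proxy
    url: http://thanos-query.monitoring.svc.cluster.local:9090
    isDefault: true
```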
The metrics should now be visible in Grafana.
Installation
Amazon Elastic Kubernetes Service (EKS)
Configure S3 bucket and IAM user with S3 access
Create an S3 bucket named prometheus-thanos
Now create an IAM user with the following S3 permissions, and note its access key and secret key:
- s3:ListBucket
- s3:GetObject
- s3:DeleteObject
- s3:PutObject
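These permissions can be granted as an inline IAM policy; a sketch, noting that s3:ListBucket applies to the bucket ARN itself while the object actions apply to its contents:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::prometheus-thanos"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::prometheus-thanos/*"]
    }
  ]
}
```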
Configure objectStorageConfig
Create a file thanos-storage-config.yaml with the following data
type: S3
config:
  bucket: prometheus-thanos # S3 bucket name
  endpoint: s3.ap-south-1.amazonaws.com # regional S3 endpoint
  access_key: <access_key>
  secret_key: <secret_key>
Create a K8s secret with the above config file
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml
Now repeat all the steps from the GKE setup to deploy the Prometheus and Thanos components. Grafana does not need to be configured again; just make sure to change the external label in values_default.yaml:
externalLabels:
  cluster: eks-1
Installation
Azure Kubernetes Service (AKS)
Create an Azure Storage account and download its keys
Create an Azure Storage account named thanosdemo
az storage account create --name thanosdemo --resource-group <aks-rsg> --location southindia --sku Standard_LRS
Create a storage container in the above storage account
az storage container create --account-name thanosdemo --name thanos
Get the storage account keys
az storage account keys list -g <aks-rsg> -n thanosdemo
[
  {
    "creationTime": "2021-08-06T07:24:25.606004+00:00",
    "keyName": "key1",
    "permissions": "FULL",
    "value": "<key>"
  },
  {
    "creationTime": "2021-08-06T07:24:25.606004+00:00",
    "keyName": "key2",
    "permissions": "FULL",
    "value": "<key>"
  }
]
Configure objectStorageConfig
Create a file thanos-storage-config.yaml with the following data
type: AZURE
config:
  storage_account: "thanosdemo"
  storage_account_key: "<key>"
  container: "thanos"
Create a K8s secret with the above config file
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml
Now repeat all the steps from the GKE setup to deploy the Prometheus and Thanos components. Grafana does not need to be configured again; just make sure to change the external label in values_default.yaml:
externalLabels:
  cluster: aks-1
Multi-Cluster Monitoring
Grafana in GKE to monitor all the clusters
Expose Thanos Query on port 10901 of the AKS and EKS clusters so that it is accessible from the GKE cluster. We can patch the thanos-query service in each cluster using the following command
kubectl patch svc thanos-query -n monitoring -p '{"spec": {"type": "LoadBalancer"}}'
Now the thanos-query service will be available from the GKE cluster.
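To verify, check in each cluster that the service now reports an external address (it may take a minute for the load balancer to be provisioned):

```shell
# The EXTERNAL-IP column should show the load-balancer DNS name (EKS)
# or IP address (AKS) instead of <pending>.
kubectl get svc thanos-query -n monitoring
```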
We now update the thanos-query deployment in the GKE cluster to query metrics from the thanos-query (or store) endpoints of the other two clusters. This is done by adding them as additional --store arguments
- --store=<AWS_LB_DNS>:10901
- --store=<AKS_LB_IP>:10901
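After the change, the container args of the GKE thanos-query deployment would look roughly like this (the --grpc-address and --http-address flags shown are the kube-thanos defaults; <AWS_LB_DNS> and <AKS_LB_IP> are the load-balancer addresses exposed above):

```yaml
args:
  - query
  - --grpc-address=0.0.0.0:10901
  - --http-address=0.0.0.0:9090
  # In-cluster endpoints deployed earlier:
  - --store=thanos-store.monitoring.svc.cluster.local:10901
  - --store=prometheus-operated.monitoring.svc.cluster.local:10901
  # Remote endpoints exposed by the EKS and AKS clusters:
  - --store=<AWS_LB_DNS>:10901
  - --store=<AKS_LB_IP>:10901
```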
On the default Grafana dashboard, we can now view metrics carrying a cluster label for each of the three clusters: gke-1, eks-1, and aks-1.
We can also expose the Thanos Query service on the GKE cluster and see data populated from all the other clusters, with a single Grafana dashboard covering the nodes of all three clusters.
Conclusion
Thanos to monitor a multi-cloud Kubernetes setup
Custom configuration
Thanos is a complex system with many moving parts; we did not dive deep into the custom configuration each component supports.
Central platform
We explained how Thanos can be used as a central platform for monitoring our multi-cloud K8s infrastructure.