Argo Rollouts: A stress-free solution for high-volume environments

CloudCover + Argo CD

by Abhishek Anand & Sagar Patil

tl;dr

  • This blog outlines how to use Argo Rollouts for various rollout strategies with real-time analysis.
  • One of the biggest challenges in working with cloud-native applications is making deployments efficient and easy.
  • Argo Rollouts is an open source Kubernetes controller for GitOps-based progressive delivery.

Introduction

The native Kubernetes Deployment object uses the RollingUpdate strategy, which provides basic safety guarantees during updates. However, RollingUpdate has shortcomings that leave room for a better option, especially in high-volume production environments at scale, where it may result in aggressive rollouts with no automatic rollback upon failure.

Limitations of RollingUpdate

  • Unsuitable for one-time or stress checks
  • Can’t query external metrics to verify updates post rollouts
  • Reduced control over rollout speed
  • Lack of control over traffic flow to the new version
  • Inability to auto abort and rollback updates on errors

As a more stable alternative, DevOps teams use Argo Rollouts as part of their continuous deployment toolkit to gain finer control and verified updates during rollouts to these critical production environments.

Argo Rollouts by definition

Argo Rollouts is a Kubernetes controller and set of Custom Resource Definitions (CRDs) which provide advanced deployment capabilities such as blue-green, canary, canary analysis, experimentation, and progressive delivery features to Kubernetes.

We can also integrate it with ingress controllers and service meshes, leveraging their traffic shaping abilities to gradually shift traffic to the new version during an update.

Argo Rollouts can use a defined AnalysisTemplate as the basis for automated judgments on canary rollouts. The template specifies which metrics the analysis should query, how frequently, and which values count as a successful or failed rollout.
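To make the pieces above concrete, here is a minimal sketch of a Rollout manifest that combines a canary strategy, Istio traffic shaping, and a reference to an AnalysisTemplate. The rollout name, namespace, canary service, and image repository come from the commands later in this post; the stable service, VirtualService, route, template name, labels, replica count, port, and step weights are assumptions for illustration, not the exact manifest used in the demo.

```yaml
# Sketch only: values marked "assumed" do not appear in this post.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: istio-rollout
  namespace: rollouts-demo-istio
spec:
  replicas: 3                                # assumed
  selector:
    matchLabels:
      app: istio-rollout                     # assumed label
  template:
    metadata:
      labels:
        app: istio-rollout                   # assumed label
    spec:
      containers:
      - name: istio-rollout
        image: argoproj/rollouts-demo:blue   # initial (blue) version
        ports:
        - containerPort: 8080                # assumed port
  strategy:
    canary:
      canaryService: istio-rollout-canary
      stableService: istio-rollout-stable    # assumed
      trafficRouting:
        istio:
          virtualService:
            name: istio-rollout-vsvc         # assumed
            routes:
            - primary                        # assumed
      analysis:
        startingStep: 1                      # begin analysis after the first step
        templates:
        - templateName: istio-success-rate   # assumed template name
      steps:
      - setWeight: 10                        # send 10% of traffic to the canary
      - pause: {duration: 1m}                # timed pause, no manual action needed
      - setWeight: 50
      - pause: {}                            # indefinite pause until manually promoted
```

With a spec shaped like this, the controller adjusts the VirtualService weights at each setWeight step while the referenced analysis runs in the background.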

Application Rollout

Application view in browser for our initial deployment

Argo Rollouts dashboard view in browser for our initial deployment

Metrics Analysis

If you look at the istio/analysis.yaml file, you can see that it calculates the success condition from the following Prometheus metrics:

provider:
  prometheus:
    address: http://prometheus.istio-system:9090
    query: >+
      sum(irate(istio_requests_total{
        reporter="source",
        destination_service=~"..svc.cluster.local",
        response_code!~"5.*"}[40s])
      )
      /
      sum(irate(istio_requests_total{
        reporter="source",
        destination_service=~"..svc.cluster.local"}[40s])
      )
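For context, a provider fragment like the one above normally sits inside a full AnalysisTemplate that also states how often to measure and what counts as success. The sketch below is an assumption-laden illustration rather than the demo's actual file: the template and metric names, the timing values, the failure limit, and the 0.90 threshold are all assumed, and the destination service is hard-coded from the service and namespace names used elsewhere in this post.

```yaml
# Sketch only: values marked "assumed" do not appear in this post.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: istio-success-rate                 # assumed name
  namespace: rollouts-demo-istio
spec:
  metrics:
  - name: success-rate                     # assumed metric name
    initialDelay: 60s                      # assumed: wait before the first measurement
    interval: 20s                          # assumed: how often to query Prometheus
    failureLimit: 3                        # assumed: failed measurements tolerated
    successCondition: result[0] > 0.90     # assumed threshold on the non-5xx ratio
    provider:
      prometheus:
        address: http://prometheus.istio-system:9090
        query: |
          sum(irate(istio_requests_total{
            reporter="source",
            destination_service=~"istio-rollout-canary.rollouts-demo-istio.svc.cluster.local",
            response_code!~"5.*"}[40s])
          )
          /
          sum(irate(istio_requests_total{
            reporter="source",
            destination_service=~"istio-rollout-canary.rollouts-demo-istio.svc.cluster.local"}[40s])
          )
```

The query divides the rate of non-5xx requests by the rate of all requests to the canary service, so the result is a success ratio between 0 and 1 that the successCondition can compare against a threshold.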

For these metric values to appear in Prometheus, you can run an Nginx pod inside the cluster that hits istio-rollout-canary.rollouts-demo-istio/color in a loop to generate dummy traffic.

kubectl run nginx --image=nginx -n rollouts-demo-istio
kubectl exec -it nginx -n rollouts-demo-istio -- sh
# inside the pod shell:
while true; do curl istio-rollout-canary.rollouts-demo-istio/color ; echo; done

Initially, the sample rollouts-demo application is deployed with a blue colour indicator. Let’s change it to red.

kubectl argo rollouts set image istio-rollout "*=argoproj/rollouts-demo:red" -n rollouts-demo-istio

Rollout Progress

You can monitor the progress of the rollout from the Argo Rollouts dashboard UI.

Argo Rollouts dashboard view in browser for our new deployment in progress

Run the following command to watch the rollouts.

kubectl argo rollouts get rollout istio-rollout -n rollouts-demo-istio -w

Rollout results for deployment with analysis run

Analysis run status

Finally, you can see that the sample app now shows red, as all the analysis metrics were ‘ok’ and the success condition was met in the analysis run.

Application view in browser after the new deployment

Below are the Istio-level Prometheus metrics (istio_requests_total) used to perform the analysis and to drive the rollout according to the success metrics defined in the rollout strategy.

Prometheus graph view in browser for the analysis condition check for our new deployment

Application Rollback

Let’s try a rollback scenario. From the official README, we already know that the bad-yellow image has a high error rate, so let’s deploy it once.

kubectl argo rollouts set image istio-rollout "*=argoproj/rollouts-demo:bad-yellow" -n rollouts-demo-istio

If you watch the rollout, you can see the analysis start to fail, as this high-error-rate image does not satisfy our success criteria.

kubectl argo rollouts get rollout istio-rollout -n rollouts-demo-istio -w

Rollout results for failed deployment due to analysis conditions not being met

Status of Failed Rollout:

Analysis run failed status

For more details about the failure, run a describe operation on the failed rollout object.

Analysis run failed status detailed

More details can be extracted from the Istio service-level dashboard, which shows a massive drop in the success rate that in turn resulted in the failed rollout.

Istio service dashboard for our canary service, which is pointing to the new deployment

We have made use of the same Istio-level Prometheus metrics (istio_requests_total) that we used during the rollout of the red deployment.

Prometheus graph view in browser for the failed analysis condition check for our new deployment

Due to the high failure rate and unsuccessful analysis, Argo Rollouts aborts the update and automatically rolls back to the last stable deployment, which was red in our case.

Application view in browser for our new canary deployment progress

Details can be obtained from the Argo Rollouts dashboard:

Argo Rollouts dashboard view in browser for our deployment rollback

Rolled-back version of the current live application (last known good state):

Application view in browser for deployment rollback

Takeaways

  • Argo Rollouts

    Argo Rollouts can integrate with ingress controllers and service meshes. Various Prometheus metrics can be used for analysis, such as istio_requests_total (as used in the example above) and nginx_ingress_controller_requests, depending on the application's ingress and requirements.

  • AnalysisTemplate

    The number of steps in the Rollout object, with or without manual intervention, can be controlled with the optional use of an AnalysisTemplate. The metric conditions defined in the AnalysisTemplate can be altered to match business SLAs.

  • Canary Analysis

    Deployment rollouts with canary analysis and progressive delivery can be executed in addition to blue-green, plain canary, and experimentation strategies.
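As an illustration of the first takeaway, the sketch below shows what an AnalysisTemplate built on nginx_ingress_controller_requests might look like for an application fronted by the NGINX ingress controller. Everything except the metric name is an assumption: the template name, the ingress argument, the Prometheus address, the timing values, and the threshold are hypothetical and would need to match your own setup.

```yaml
# Sketch only: all names, addresses, and thresholds here are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: nginx-success-rate                 # hypothetical name
spec:
  args:
  - name: ingress                          # name of the Ingress object to measure
  metrics:
  - name: success-rate
    interval: 30s                          # assumed measurement frequency
    failureLimit: 3                        # assumed
    successCondition: result[0] > 0.95     # assumed threshold, tune to your SLA
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090   # hypothetical address
        query: |
          sum(rate(nginx_ingress_controller_requests{
            ingress="{{args.ingress}}",
            status!~"5.*"}[2m])
          )
          /
          sum(rate(nginx_ingress_controller_requests{
            ingress="{{args.ingress}}"}[2m])
          )
```

The structure mirrors the Istio example earlier in the post: only the metric, its labels, and the arguments change, which is what makes the analysis reusable across different ingress setups.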