Intro
CloudCover helps customers choose the service mesh best suited to their use case. The first step in an evaluation is feature comparison, i.e. checking whether all the critical features are supported by a given service mesh; the second aspect is whether the mesh is performant and optimized for real-world usage. For a brief idea of the functionality of the top three service meshes, i.e. Istio, Linkerd, and Consul Connect, refer to our feature comparison report.
The functional aspects and adoption trends covered in that report make it clear that Istio is leading from the front and is the preferred choice for production adoption. But it may not be as straightforward to pick Istio for every customer use case.
This blog focuses on the second aspect: service mesh performance evaluation.
Performance Evaluation: Consul vs. Linkerd vs. Istio
We started with an existing benchmarking tool (Github link), which we are updating as per our requirements and maintaining in our cldcvr repo. The following metrics are used to evaluate the service meshes:
- CPU and Memory Utilization of Control Plane
- CPU and Memory Utilization of Data Plane
- Latencies
- Network usage
Setup
Assumptions: The complete POC is done on a GKE cluster.
The workload and benchmark applications are deployed on a GKE cluster consisting of two node pools: one is assigned the label role: workload and the other role: benchmark. The application pods run on the nodes with the workload label, and the benchmarking pods run on the nodes with the benchmark label.
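If you are reproducing this setup, the two labeled node pools can be created with gcloud. This is only a sketch: the cluster name, zone, and node counts below (benchmark-cluster, us-central1-a, 3 nodes) are placeholders, so adjust them to your environment.
# one pool for the application workloads, one for the load generators
gcloud container node-pools create workload-pool --cluster=benchmark-cluster --zone=us-central1-a --num-nodes=3 --node-labels=role=workload
gcloud container node-pools create benchmark-pool --cluster=benchmark-cluster --zone=us-central1-a --num-nodes=3 --node-labels=role=benchmark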
Push-Gateway Installation
- The Prometheus Pushgateway allows ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.
- The benchmark load generator will push intermediate run-time metrics as well as final latency metrics to a Prometheus push gateway.
- For the Pushgateway installation we need the ServiceMonitor resource, which is not available by default. It is a custom resource that is part of kube-prometheus. A detailed explanation of kube-prometheus is available here. The following commands are required for the setup:
git clone git@github.com:prometheus-operator/kube-prometheus.git
kubectl create -f manifests/setup
kubectl apply -f manifests/.
Deploy Prometheus push gateway
cd service-mesh-benchmark
helm install pushgateway --namespace monitoring configs/pushgateway
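To confirm the Pushgateway is reachable before the load test, you can port-forward its service and push a throwaway metric. This is just a sketch: the service name (assumed to be pushgateway here) depends on what the chart actually creates in the monitoring namespace.
kubectl -n monitoring port-forward svc/pushgateway 9091:9091
# push a test metric under a job called smoke_test
echo "smoke_test_metric 42" | curl --data-binary @- http://localhost:9091/metrics/job/smoke_test
# the metric should now show up on the Pushgateway's metrics endpoint
curl -s http://localhost:9091/metrics | grep smoke_test_metric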
Grafana Dashboard Creation
After the Grafana pod is up and running in the monitoring namespace, access the UI by forwarding the Grafana service port from the cluster:
kubectl -n monitoring port-forward svc/grafana 3000:3000
Log in to Grafana and create an API key that we'll use to upload the dashboards.
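If you prefer the command line, a key can also be created through Grafana's HTTP API while the port-forward is running. A minimal sketch, assuming the default kube-prometheus admin credentials (admin/admin); the "key" field in the JSON response is what goes into [API KEY] below.
curl -s -X POST http://admin:admin@localhost:3000/api/auth/keys -H "Content-Type: application/json" -d '{"name": "dashboard-upload", "role": "Admin"}'
# the response JSON contains a "key" field; use its value as [API KEY]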
Clone the benchmarking tool from the link here and move to the service-mesh-benchmark/dashboards directory. Run the commands below to create the required dashboards:
./upload_dashboard.sh "[API KEY]" grafana-wrk2-cockpit.json localhost:3000
./upload_dashboard.sh "[API KEY]" grafana-wrk2-summary.json localhost:3000
Note: [API KEY] must be created via the Grafana dashboard.
We are creating two dashboards:
wrk2 cockpit: Grafana dashboard for live metrics of the load test.
wrk2 summary: Dashboard to fetch the comparison report after benchmarking has completed for all the service meshes. Currently, it only supports reports for the three service meshes covered here.
Install Service Meshes
An automation script is in place that installs all the service meshes. Clone the benchmarking tool from the link here and move to service-mesh-benchmark/scripts:
./setup-servicemeshes.sh
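Once the script finishes, it is worth sanity-checking each install before deploying the applications. A rough sketch, assuming the meshes were installed into their usual namespaces (istio-system, linkerd, consul); adjust to whatever the script actually uses.
# control-plane pods should be Running in each mesh's namespace
kubectl get pods -n istio-system
kubectl get pods -n linkerd
kubectl get pods -n consul
# if the mesh CLIs are installed locally, they provide deeper health checks
istioctl version
linkerd check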
Deploy the application
Note: Namespace creation is integrated into the application deployment via Helm.
We will deploy the emojivoto application with all three service meshes. For this POC, we deploy three instances of the emojivoto application per service mesh. All the applications are deployed in separate k8s namespaces.
Deploy emojivoto with consul
Switch to the benchmark repo and execute the following commands to deploy the application.
for i in {0..2}; do helm install emojivoto-consul-$i --set servicemesh=consul configs/emojivoto; done
Validate that the application container and proxy container are running in each pod, for example:
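One way to do this is a jsonpath query that prints the containers in every pod. The namespace used here (emojivoto-consul-0) assumes the chart names namespaces after the Helm releases, which may differ in your setup.
kubectl get pods -n emojivoto-consul-0 -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'
# every pod should list the emojivoto container alongside the Consul sidecar containers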
Deploy emojivoto with linkerd
Helm will manage the creation of the k8s namespace with linkerd.io/inject: enabled annotation
Switch to the benchmark repo and execute the following commands to deploy the application.
for i in {0..2}; do helm install emojivoto-linkerd-$i --set servicemesh=linkerd configs/emojivoto; done
- Validate that the application container and proxy container are running in each pod. We can also refer to the Linkerd dashboard for the same.
Deploy emojivoto with istio
Helm will manage the creation of the k8s namespace with an istio-injection=enabled label.
Switch to the benchmark repo and execute the following commands to deploy the application.
for i in {0..2}; do helm install emojivoto-istio-$i --set servicemesh=istio configs/emojivoto; done
- Validate that the application container and proxy container are running in each pod.
Start the Benchmark (Load Test)
The benchmarking tool is deployed using the code available here: service-mesh-benchmark/configs/benchmark
We will be launching separate instances of the benchmark tool for load testing applications per service mesh.
Before initiating the benchmark application, make the required changes in the service-mesh-benchmark/configs/benchmark/values.yaml file:
wrk2:
  duration: 1800          # time in seconds
  connections: 96
  RPS: 500                # set the required RPS for the load test
  initDelay: 0
  serviceMesh: "linkerd"  # which service mesh we are testing
  app:
    name: emojivoto-linkerd # values can be emojivoto-linkerd, emojivoto-istio or emojivoto-consul
    count: "3"              # count of application instances deployed per service mesh
  appImage: quay.io/kinvolk/wrk2-prometheus
Set the values servicemesh: consul and name: emojivoto-consul, then start the benchmark tool to load test the application with the Consul service mesh using the following command:
helm install --create-namespace benchmark-consul --namespace benchmark-consul configs/benchmark
Set the values servicemesh: linkerd and name: emojivoto-linkerd, then start the benchmark tool to load test the application with the Linkerd service mesh using the following command:
helm install --create-namespace benchmark-linkerd --namespace benchmark-linkerd configs/benchmark
Set the values servicemesh: istio and name: emojivoto-istio, then start the benchmark tool to load test the application with the Istio service mesh using the following command:
helm install --create-namespace benchmark-istio --namespace benchmark-istio configs/benchmark
Check the pod status in all the benchmark namespaces, for example:
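A quick way to check all three at once (the namespace names come from the helm commands above):
for ns in benchmark-consul benchmark-linkerd benchmark-istio; do kubectl get pods -n "$ns"; done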
We can monitor the live metrics on the wrk2 cockpit dashboard by selecting the correct job name and time range in the variable dropdowns.
Compare and Conclude
When all three benchmarking deployments are done, we run the metrics-merger job to update the summary metrics on the wrk2 summary dashboard.
The metrics-merger code is available here: service-mesh-benchmark/configs/metrics-merger. Run the following command:
helm install --create-namespace --namespace metrics-merger metrics-merger configs/metrics-merger
- After the above job has completed, we can check the wrk2 summary dashboard for the final result.
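To confirm the merge job has actually finished before reading the dashboard, you can watch it in its namespace; the job name is assumed here to match the release name (metrics-merger) and may differ in practice.
kubectl -n metrics-merger get jobs
# once COMPLETIONS shows 1/1, the summary data is up to date; check the logs if anything looks off
kubectl -n metrics-merger logs job/metrics-merger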
Findings
We ran the benchmark against each service mesh for 30 minutes at 500 RPS, targeting the 3 emojivoto application instances with 96 threads / simultaneous connections. The monitoring metrics captured for each mesh are below.
Consul metrics
Component | CPU Seconds Usage | Memory Usage (MB)
---|---|---
consul-connect-lifecycle-sidecar | 0.00931 | 22.81
consul-connect-envoy-sidecar | 0.00585 | 16.05
Linkerd metrics
Component | CPU Seconds Usage | Memory Usage (MB)
---|---|---
Linkerd-Proxy (Sidecar) | 0.150 | 11.26
Istio Metrics
Component | CPU Seconds Usage | Memory Usage (MB)
---|---|---
Istio-Proxy (Sidecar) | 0.2546 | 64.3
Summary Report
Memory usage and CPU utilization
Values from the graph are added in the table below:
Component | CPU Seconds Usage | Memory Usage (MB)
---|---|---
Consul | 0.0463 | 178.5
Istio | 0.0053 | 90.5
Linkerd | 0.0023 | 55.7
The above charts imply that the Consul control plane components utilized the most resources, followed by Istio and then Linkerd. However, the sidecar utilization metrics of the respective service meshes show that Istio's sidecar utilized the most resources, followed by Linkerd and then Consul. So it is really important to understand resource utilization before using any service mesh in the setup.
Latency Percentiles
Under this load, Linkerd and Istio generated latencies in the range of minutes. No socket/HTTP errors were observed during the load test, and the effective throughput was around 500 RPS. Looking at the above metrics, Consul is the winner by a long margin, with a 1.0 (100th) percentile latency of 1.06 s, whereas Linkerd and Istio had latencies of 3.08 min and 5.93 min respectively.
Conclusion
Consul outperformed Linkerd and Istio when it comes to latency, with acceptable resource-consumption overhead from its control plane. But with Consul, we also need to consider the ease of implementation for complex setups. Linkerd takes the edge on resource consumption; even its application CPU utilization is the lowest of the three, which shows that Linkerd is lightweight when it comes to resource utilization. The final call on the appropriate service mesh will depend on the requirements, where all the above pointers should be considered.
How can we help?
At CloudCover, we are always looking forward to the next challenge. Drop us a line; we would love to hear from you.