Microservices architecture has become the de facto way to develop cloud-native applications. The cloud makes it easier to scale and enables elasticity by using small, independent services instead of a monolithic application.
Implementing microservices requires communication among many interdependent services, which becomes complicated once they reach a certain scale. As the number of microservices in your ecosystem grows, it becomes very difficult to control, manage, and configure the dynamically changing chatter between them.
At CloudCover we end up provisioning and managing such high-scale setups for our various unicorn customers. The increasing complexity of managing security, observability, traffic routing, and resilience at every layer pushed us to explore and implement the best-suited service mesh solution to mitigate these issues seamlessly.
In this blog, we will discuss one such offering by HashiCorp, i.e. Consul Service Mesh, and how to leverage it in a multi-cloud setup.
What is Consul Service Mesh?
Consul is a service mesh solution providing a full-featured control plane with service discovery, configuration, and segmentation functionality. Each of these features can be used individually as needed, or they can be used together to build a full service mesh. Consul requires a data plane and supports both a proxy and a native integration model. Consul ships with a simple built-in proxy so that everything works out of the box, but it also supports third-party proxy integrations such as Envoy.
Consul Agents: Every node that provides services to Consul runs a Consul agent. The agent is responsible for health checking the services on the node as well as the node itself.
Consul Servers: The Consul servers are where data is stored and replicated. The agents talk to one or more Consul servers. The servers themselves elect a leader.
Sidecar Proxy: The proxy sidecar transparently handles inbound and outbound service connections, automatically wrapping and verifying TLS connections.
Mesh Gateway: Mesh gateways enable routing of Connect traffic between different Consul datacenters. Those datacenters can reside in different clouds or runtime environments where general interconnectivity between all services in all datacenters isn't feasible. These gateways operate by sniffing the SNI header out of the Connect session and then routing the connection to the appropriate destination based on the server name requested. The data within the mTLS session is not decrypted by the gateway.
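To build some intuition for that SNI-based routing, here is an illustrative Python sketch (this is not Consul's implementation; the suffixes and addresses are made up): the gateway looks only at the server name presented in the TLS handshake and forwards the still-encrypted stream to the matching datacenter's gateway.

```python
# Hypothetical routing table: SNI suffix -> (mesh gateway address, port)
# of the destination datacenter. Addresses here are illustrative only.
ROUTES = {
    "dc2.internal": ("10.20.0.5", 8443),
    "dc1.internal": ("10.10.0.5", 8443),
}

def pick_destination(sni: str):
    """Return the gateway endpoint whose datacenter suffix matches the SNI.

    The gateway never decrypts the payload; the server name alone is
    enough to choose where to forward the encrypted bytes.
    """
    for suffix, dest in ROUTES.items():
        if sni.endswith(suffix):
            return dest
    raise LookupError(f"no route for {sni}")
```

A connection for `data.default.dc2.internal` would be forwarded to dc2's gateway untouched.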
Setup (Federation Between Kubernetes Clusters)
You can find more details about the federation setup in the official link.
Prerequisites:
- 2 Kubernetes clusters
- kubectl installed
- Helm version 3
Primary Datacenter (DC1)
Create dc1.yaml for datacenter dc1 (review and update the parameters as per your requirements). A list of supported parameters can be found here.
```yaml
global:
  name: consul
  datacenter: dc1  # Name of datacenter
  image: "consul:1.9.5"
  metrics:
    enabled: true
    enableAgentMetrics: true
  # TLS configures whether Consul components use TLS.
  tls:
    # TLS must be enabled for federation in Kubernetes.
    enabled: true
    httpsOnly: false
  federation:
    enabled: true
    # This will cause a Kubernetes secret to be created that
    # can be imported by secondary datacenters to configure them
    # for federation.
    createFederationSecret: true
connectInject:
  # Consul Connect service mesh must be enabled for federation.
  enabled: true
  metrics:
    defaultEnabled: true  # by default, this inherits from global.metrics.enabled
    defaultPrometheusScrapePort: 20200
    defaultPrometheusScrapePath: "/metrics"
controller:
  enabled: true
meshGateway:
  # Mesh gateways are gateways between datacenters. They must be enabled
  # for federation in Kubernetes since the communication between datacenters
  # goes through the mesh gateways.
  enabled: true
ui:
  enabled: true
  metrics:
    enabled: true  # by default, this inherits from global.metrics.enabled
    provider: "prometheus"
    baseURL: http://prometheus-server
  service:
    type: 'LoadBalancer'
prometheus:
  enabled: true
grafana:
  enabled: true
ingressGateways:
  enabled: true
  gateways:
    - name: ingress-gateway
      service:
        type: LoadBalancer
syncCatalog:
  enabled: true
  default: false
```
Add the HashiCorp Helm repository (if you haven't already) and deploy the release using Helm:

```shell
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install -f dc1.yaml consul-dc1 hashicorp/consul
```
Create and apply a ProxyDefaults resource to configure Consul to use the mesh gateways for service mesh traffic.
```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  meshGateway:
    mode: 'local'
```
The spec.meshGateway.mode can be set to local or remote. If set to local, traffic from one datacenter to another will egress through the local mesh gateway. This may be useful if you prefer all your cross-cluster network traffic to egress from the same locations. If set to remote, traffic will be routed directly from the pod to the remote mesh gateway (resulting in one less hop).
The federation secret is a Kubernetes secret containing the information needed for secondary datacenters/clusters to federate with the primary. It is created automatically because we set createFederationSecret: true in dc1.yaml. Export it so it can be imported into the secondary cluster:

```shell
kubectl get secret consul-federation -o yaml > consul-federation-secret.yaml
```
Federated Datacenter (DC2)
Import the federation secret (make sure your kubectl context now points at the second cluster):

```shell
kubectl apply -f consul-federation-secret.yaml
```
Create dc2.yaml for datacenter dc2 (review and update the parameters as per your requirements).
```yaml
global:
  name: consul
  datacenter: dc2  # Datacenter name
  image: "consul:1.9.5"
  metrics:
    enabled: true
    enableAgentMetrics: true
  tls:
    enabled: true
    httpsOnly: false
    # Here we're using the shared certificate authority from the primary
    # datacenter that was exported via the federation secret.
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  federation:
    enabled: true
connectInject:
  enabled: true
  metrics:
    defaultEnabled: true  # by default, this inherits from global.metrics.enabled
    defaultPrometheusScrapePort: 20200
    defaultPrometheusScrapePath: "/metrics"
controller:
  enabled: true
meshGateway:
  enabled: true
server:
  # Here we're including the server config exported from the primary
  # via the federation secret. This config includes the addresses of
  # the primary datacenter's mesh gateways so Consul can begin federation.
  extraVolumes:
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true
ui:
  enabled: true
  metrics:
    enabled: true  # by default, this inherits from global.metrics.enabled
    provider: "prometheus"
    baseURL: http://prometheus-server
  service:
    type: 'LoadBalancer'
ingressGateways:
  enabled: true
  gateways:
    - name: ingress-gateway
      service:
        type: LoadBalancer
prometheus:
  enabled: true
grafana:
  enabled: true
syncCatalog:
  enabled: true
  default: false
```
Deploy the release using Helm:

```shell
helm install -f dc2.yaml consul-dc2 hashicorp/consul
```
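Once both releases are up, you can verify that the datacenters have federated by listing the WAN members from one of the server pods (the statefulset name below assumes the `name: consul` override used in our values files; adjust if yours differs):

```shell
kubectl exec statefulset/consul-server -- consul members -wan
```

Servers from both dc1 and dc2 should appear in the output.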
Let's now check how traffic splitting works with Consul service mesh. For that, we use a simple two-tier application: a frontend (web pod) and a backend (data pod). The frontend is a simple Nginx pod with a proxy configuration that redirects all traffic to the backend pods, which also run an Nginx web application. We have deployed two backend versions, v1 and v2; each displays its version on the UI.
The applications are deployed across clouds (AWS and GCP). You can see the two sidecar containers (envoy-sidecar, consul-sidecar) running beside the main application container. To inject the sidecars, apply the following annotation to your application pods:

```yaml
annotations:
  'consul.hashicorp.com/connect-inject': 'true'
```
We have added an upstream service annotation in the Kubernetes manifest file so that the frontend (web) can talk to the backend (data) service:

```yaml
annotations:
  'consul.hashicorp.com/connect-service-upstreams': 'data:8081:dc2'  # the dc2 datacenter is the GCP cluster
```
Apply the labels as annotations on the data service; later we will use these labels to configure the service resolver:

```yaml
annotations:
  'consul.hashicorp.com/service-tags': 'v1'          # apply 'v2' for the data-v2 deployment
  'consul.hashicorp.com/service-meta-version': 'v1'  # apply 'v2' for the data-v2 deployment
```
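For context, here is a sketch of how these annotations sit together on the backend Deployment's pod template (the names and image are from our example app and are illustrative; adjust them to your setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data
      version: v1
  template:
    metadata:
      labels:
        app: data
        version: v1
      annotations:
        'consul.hashicorp.com/connect-inject': 'true'
        'consul.hashicorp.com/service-tags': 'v1'
        'consul.hashicorp.com/service-meta-version': 'v1'
    spec:
      containers:
        - name: data  # the container name becomes the Consul service name
          image: nginx  # serves the "v1" page in our example
          ports:
            - containerPort: 80
```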
Create and apply the service defaults configuration for the data service:

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: data
spec:
  protocol: http
```
Create and apply the service resolver configuration for the data service:

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: data
spec:
  defaultSubset: 'v1'
  subsets:
    'v1':
      filter: 'Service.Meta.version == v1'
    'v2':
      filter: 'Service.Meta.version == v2'
```
Create and apply the service splitter configuration for the data service:

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceSplitter
metadata:
  name: data
spec:
  splits:
    - weight: 30
      serviceSubset: v1
    - weight: 70
      serviceSubset: v2
```
Port-forward the frontend to test the split:

```shell
kubectl port-forward deploy/web 8080:80 --context cn-aws
```

You should see roughly 70% of the traffic redirected to version v2 and 30% to version v1.
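To eyeball the split, you can sample the port-forwarded endpoint and count which version served each response. A minimal Python sketch (it assumes each response body contains the literal string "v1" or "v2", as our demo pages do):

```python
from collections import Counter

def split_counts(bodies):
    """Tally responses by the backend version string they contain."""
    counts = Counter()
    for body in bodies:
        if "v2" in body:
            counts["v2"] += 1
        elif "v1" in body:
            counts["v1"] += 1
    return counts

# To sample the live endpoint (with the port-forward above running):
# import urllib.request
# bodies = [urllib.request.urlopen("http://localhost:8080").read().decode()
#           for _ in range(100)]
# print(split_counts(bodies))  # expect roughly a 70/30 ratio of v2 to v1
```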
Pros
- Supports multi-cluster mesh with Kubernetes and VM-based configurations (link)
- Single dashboard (Prometheus) to visualize all services from different datacenters
- Supports traffic-management features like traffic splitting, canary releases, header-based routing, re-routing, timeouts, circuit breaking, retries, traffic shaping/load balancing, and service failover
- Security features available like mTLS, ACLs, and Intentions
- Distributed tracing via Jaeger (needs extra setup)
- Logging via GCP Stackdriver
Cons
- No auto-discovery of services; you need to keep track of all dependent services in the manifest files.
- No preconfigured dashboards are available in the Consul UI to see service metrics; you need to rely on Prometheus and Grafana.
- No pre-configured distributed tracing solution; it requires additional setup.
- You need to maintain a unique name (via container name or annotations) for each service across the mesh.
- Consul runs two extra containers (envoy-sidecar and consul-sidecar) beside the main application, which increases resource utilization.
- Kubernetes-based policy configurations are not easily available.
- Module authorization does not work with the default Consul namespace if services are deployed in different Kubernetes namespaces.
- Fault injection, traffic shadowing, and header modification are currently not supported.
- Overall configuration complexity is high.
How can we help?
At CloudCover, we are always looking forward to the next challenge. Drop us a line; we would love to hear from you.