Intro

Microservices architecture has become a de facto way to develop cloud-native applications. The cloud makes it easier to scale and achieve elasticity by composing applications from small, independent services instead of a single monolith.

A microservices implementation requires communication among many interdependent services, which becomes complicated beyond a certain scale. As the number of microservices in your ecosystem grows, it becomes increasingly difficult to control, manage, and configure the dynamically changing chatter between them.

At CloudCover, we provision and manage such high-scale setups for several of our unicorn customers. The growing complexity of managing security, observability, traffic routing, and resilience at every layer pushed us to explore and implement the service mesh solution best suited to mitigating these issues.

In this blog, we will discuss one such offering by HashiCorp, Consul Service Mesh, and how to leverage it in a multi-cloud setup.

What is Consul Service Mesh?

Consul is a service mesh solution providing a full-featured control plane with service discovery, configuration, and segmentation functionality. Each of these features can be used individually as needed, or together to build a full service mesh. Consul requires a data plane and supports both a proxy and a native integration model. It ships with a simple built-in proxy so that everything works out of the box, but also supports third-party proxy integrations such as Envoy.

Components

  • Consul Agents: Every node that provides services to Consul runs a Consul agent. The agent is responsible for health checking the services on the node, as well as the node itself.

  • Consul Servers: The Consul servers are where data is stored and replicated. The agents talk to one or more Consul servers, and the servers elect a leader among themselves.

  • Sidecar Proxy: The sidecar proxy transparently handles inbound and outbound service connections, automatically wrapping and verifying TLS connections.

  • Mesh Gateway: Mesh gateways enable routing of Connect traffic between different Consul datacenters. They operate by sniffing the SNI header out of the Connect session and then routing the connection to the appropriate destination based on the requested server name. The data within the mTLS session is not decrypted by the gateway.

Multi-Cloud Architecture

Mesh gateways make this multi-cloud topology possible: the federated datacenters can reside in different clouds or runtime environments where general interconnectivity between all services in all datacenters isn't feasible. Because the gateways only sniff the SNI header and never decrypt the mTLS session, traffic between datacenters stays encrypted end to end while still being routed to the appropriate destination.

Setup (Federation Between Kubernetes Clusters)

You can find more details about the federation setup in the official documentation.

Requirements

  • Two Kubernetes clusters
  • kubectl installed
  • Helm v3

Primary Datacenter (DC1)

Create dc1.yaml for datacenter dc1 (review and update the parameters per your requirements). A list of supported parameters can be found here.

global:
  name: consul
  datacenter: dc1 # Name of datacenter
  image: "consul:1.9.5"
  metrics:
    enabled: true
    enableAgentMetrics: true

  # TLS configures whether Consul components use TLS.
  tls:
    # TLS must be enabled for federation in Kubernetes.
    enabled: true
    httpsOnly: false

  federation:
    enabled: true
    # This will cause a Kubernetes secret to be created that
    # can be imported by secondary datacenters to configure them
    # for federation.
    createFederationSecret: true

connectInject:
  # Consul Connect service mesh must be enabled for federation.
  enabled: true
  metrics:
    defaultEnabled: true # by default, this inherits from the value global.metrics.enabled
    defaultPrometheusScrapePort: 20200
    defaultPrometheusScrapePath: "/metrics"
controller:
  enabled: true

meshGateway:
  # Mesh gateways are gateways between datacenters. They must be enabled
  # for federation in Kubernetes since the communication between datacenters
  # goes through the mesh gateways.
  enabled: true

ui:
  enabled: true
  metrics:
    enabled: true # by default, this inherits from the value global.metrics.enabled
    provider: "prometheus"
    baseURL: http://prometheus-server
  service:
    type: 'LoadBalancer'

prometheus:
  enabled: true

grafana:
  enabled: true

ingressGateways:
  enabled: true
  gateways:
    - name: ingress-gateway
      service:
        type: LoadBalancer

syncCatalog:
  enabled: true
  default: false
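
If the HashiCorp Helm repository has not been added on your machine yet, add it first (a one-time step):

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update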

Deploy the release using Helm:

helm install -f dc1.yaml consul-dc1 hashicorp/consul

Create and apply a ProxyDefaults resource to configure Consul to use the mesh gateways for service mesh traffic:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  meshGateway:
    mode: 'local' 
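
Assuming the manifest above is saved as proxy-defaults.yaml (the filename is illustrative), apply it with:

kubectl apply -f proxy-defaults.yaml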

The spec.meshGateway.mode can be set to local or remote. If set to local, traffic from one datacenter to another will egress through the local mesh gateway. This may be useful if you prefer all your cross-cluster network traffic to egress from the same locations. If set to remote, traffic will be routed directly from the pod to the remote mesh gateway (resulting in one less hop).

The federation secret is a Kubernetes secret containing the information secondary datacenters/clusters need to federate with the primary. It was created automatically because we set createFederationSecret: true in dc1.yaml. Export it so it can be imported into the secondary cluster:

kubectl get secret consul-federation -o yaml > consul-federation-secret.yaml
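
The remaining steps run against the second cluster, so switch your kubectl context before continuing. The context name below is illustrative (in our setup, dc2 is the GCP cluster):

kubectl config use-context cn-gcp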

Federated Datacenter (DC2)

Import the federation secret into the dc2 cluster:

kubectl apply -f consul-federation-secret.yaml
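
You can confirm that the secret exists in the new cluster:

kubectl get secret consul-federation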

Create dc2.yaml for datacenter dc2 (review and update the parameters per your requirements).

global:
  name: consul
  datacenter: dc2 # Datacenter name
  image: "consul:1.9.5"
  metrics:
    enabled: true
    enableAgentMetrics: true
  tls:
    enabled: true
    httpsOnly: false
    # Here we're using the shared certificate authority from the primary
    # datacenter that was exported via the federation secret.
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey

  federation:
    enabled: true

connectInject:
  enabled: true
  metrics:
    defaultEnabled: true # by default, this inherits from the value global.metrics.enabled
    defaultPrometheusScrapePort: 20200
    defaultPrometheusScrapePath: "/metrics"

controller:
  enabled: true

meshGateway:
  enabled: true
server:
  # Here we're including the server config exported from the primary
  # via the federation secret. This config includes the addresses of
  # the primary datacenter's mesh gateways so Consul can begin federation.
  extraVolumes:
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true

ui:
  enabled: true
  metrics:
    enabled: true # by default, this inherits from the value global.metrics.enabled
    provider: "prometheus"
    baseURL: http://prometheus-server
  service:
    type: 'LoadBalancer'

ingressGateways:
  enabled: true
  gateways:
    - name: ingress-gateway
      service:
        type: LoadBalancer

prometheus:
  enabled: true

grafana:
  enabled: true

syncCatalog:
  enabled: true
  default: false

Deploy the release using Helm:

helm install -f dc2.yaml consul-dc2 hashicorp/consul
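
Before moving on, make sure the Consul pods in both clusters are running. The label selector below assumes the chart's default app=consul label:

kubectl get pods -l app=consul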

Verifying Federation

To verify that both datacenters are federated, run the consul members -wan command on one of the Consul server pods.
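
For example (the pod name follows the Helm chart's consul-server-N naming and may differ in your release):

kubectl exec -it consul-server-0 -- consul members -wan

The output should list the server nodes from both dc1 and dc2.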

Traffic Management

Let's now look at how traffic splitting works with Consul service mesh. For this, we use a simple two-tier application: a frontend (web pod) and a backend (data pod). The frontend is a simple Nginx pod with a proxy configuration that redirects all traffic to the backend pods, and the backend pods also run an Nginx web application. We deployed two backend versions, v1 and v2, each displaying its version in the UI.
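
To make the wiring concrete, here is a minimal sketch of the frontend's Nginx proxy configuration, assuming it forwards everything to the local upstream listener on port 8081 that we configure in step 2 below:

# Illustrative nginx.conf fragment for the web pod. All requests are
# proxied to the local listener the Consul sidecar exposes for the
# upstream "data" service (port 8081).
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8081;
    }
}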

  1. Applications are deployed across clouds (AWS and GCP). You can see two extra containers (envoy-sidecar, consul-sidecar) running alongside the main application. Apply the following annotation to inject the sidecar into your application:

    annotations:
      'consul.hashicorp.com/connect-inject': 'true'
    
  2. We added an upstream annotation to the Kubernetes manifest so that the frontend (web) service can talk to the backend (data) service:

    annotations:
      'consul.hashicorp.com/connect-service-upstreams': 'data:8081:dc2'
      # The dc2 datacenter is the GCP cluster
    
  3. Apply the labels as annotations on the data service; we will use them later to configure the service resolver:

    annotations:
      'consul.hashicorp.com/service-tags': 'v1'
      # Apply the same tag with v2 for the data-v2 deployment
      'consul.hashicorp.com/service-meta-version': 'v1'
      # Apply the same meta value with v2 for the data-v2 deployment
    

    Note: We have not deployed any Kubernetes Service objects for our applications.

  4. Create and apply the service defaults configuration for the data service:

    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ServiceDefaults
    metadata:
      name: data
    spec:
      protocol: http
    
  5. Create and apply the service resolver configuration for the data service:

    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ServiceResolver
    metadata:
      name: data
    spec:
      defaultSubset: 'v1'
      subsets:
        'v2':
          filter: 'Service.Meta.version == v2'
        'v1':
          filter: 'Service.Meta.version == v1'
    
  6. Create and apply the service splitter configuration for the data service:

    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ServiceSplitter
    metadata:
      name: data
    spec:
      splits:
        - weight: 30
          serviceSubset: v1
        - weight: 70
          serviceSubset: v2
    
  7. Once all configurations are applied, we can send traffic to our frontend application:

    kubectl port-forward deploy/web 8080:80 --context cn-aws
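
    With the port-forward running, a quick loop (a sketch; adjust the URL if your setup differs) makes the split visible:

    for i in $(seq 1 20); do curl -s http://localhost:8080/; done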

You can see that roughly 70% of the traffic is routed to version v2 and 30% to version v1.

Conclusion

Highlights:

  • Supports multi-cluster mesh with Kubernetes- and VM-based configurations (link)
  • A single dashboard (Prometheus) to visualize all services from different datacenters.
  • Supports traffic management features such as traffic splitting, canary releases, header-based routing, re-routing, timeouts, circuit breaking, retries, traffic shaping/load balancing, and service failover.
  • Security features such as mTLS, ACLs, and intentions.
  • Distributed tracing via Jaeger (needs extra setup).
  • Logging via GCP Stackdriver.

Pain points:

  • No auto-discovery of services; you need to keep track of all dependent services in the manifest files.
  • Preconfigured dashboards to view service metrics are not available in the Consul UI; you need to rely on Prometheus and Grafana.
  • No pre-configured distributed tracing solution; it requires additional setup.
  • Service names must be kept unique across the mesh, via container names or annotations.
  • Consul runs two extra containers (envoy-sidecar + consul-sidecar) alongside the main application, which increases resource utilization.
  • Kubernetes-based policy configurations are not readily available.
  • Module authorization does not work with the default Consul namespace if services are deployed in different Kubernetes namespaces.
  • Fault injection, traffic shadowing, and header modification are currently not supported.
  • Overall configuration complexity is high.