Intro
Microservices architecture is becoming the de facto way to develop cloud-native applications. The cloud makes it easier to scale and enables elasticity by using small, independent services instead of a monolithic application.
Implementing microservices requires communication among many inter-dependent services, which can become complicated past a certain scale. As the number of microservices in your ecosystem grows, it becomes very difficult to control, manage, and configure the dynamically changing chatter between them.
At CloudCover we provision and manage such high-scale setups for our various unicorn customers. The increasing complexity of managing security, observability, traffic routing, and resilience at every layer pushed us to explore and implement the best-suited service mesh solution to mitigate these issues seamlessly.
In this blog, we will discuss one such offering by HashiCorp, i.e., Consul Service Mesh, and how to leverage it in a multi-cloud setup.
What is Consul Service Mesh?
Consul is a service mesh solution providing a full-featured control plane with service discovery, configuration, and segmentation functionality. Each of these features can be used individually as needed, or they can be used together to build a full service mesh. Consul requires a data plane and supports both a proxy and a native integration model. Consul ships with a simple built-in proxy so that everything works out of the box, but it also supports third-party proxy integrations such as Envoy.
Components
- Consul Agents: Every node that provides services to Consul runs a Consul agent. The agent is responsible for health checking the services on the node as well as the node itself.
- Consul Servers: The Consul servers are where data is stored and replicated. The agents talk to one or more Consul servers, and the servers themselves elect a leader.
- Sidecar Proxy: The sidecar proxy transparently handles inbound and outbound service connections, automatically wrapping and verifying TLS connections.
- Mesh Gateway: Mesh gateways enable routing of Connect traffic between different Consul datacenters. They operate by sniffing the SNI header out of the Connect session and routing the connection to the appropriate destination based on the server name requested. The data within the mTLS session is not decrypted by the gateway.
Multi-Cloud Architecture
As described in the components above, mesh gateways enable routing of Connect traffic between different Consul datacenters. Those datacenters can reside in different clouds or runtime environments where general interconnectivity between all services in all datacenters isn't feasible, and because the gateways route on the SNI header without decrypting the session, cross-datacenter traffic stays encrypted end to end.
Setup (Federation Between Kubernetes Clusters)
You can find more details about the federation setup in the official documentation.
Requirements
- Two Kubernetes clusters
- kubectl installed
- Helm version 3
Primary Datacenter (DC1)
Create dc1.yaml for datacenter dc1 (review and update the parameters as per your requirements). A list of supported parameters can be found here.
global:
  name: consul
  datacenter: dc1   # Name of the datacenter
  image: "consul:1.9.5"
  metrics:
    enabled: true
    enableAgentMetrics: true
  # TLS configures whether Consul components use TLS.
  tls:
    # TLS must be enabled for federation in Kubernetes.
    enabled: true
    httpsOnly: false
  federation:
    enabled: true
    # This will cause a Kubernetes secret to be created that
    # can be imported by secondary datacenters to configure them
    # for federation.
    createFederationSecret: true
connectInject:
  # Consul Connect service mesh must be enabled for federation.
  enabled: true
  metrics:
    defaultEnabled: true   # By default, this inherits from the value global.metrics.enabled
    defaultPrometheusScrapePort: 20200
    defaultPrometheusScrapePath: "/metrics"
controller:
  enabled: true
meshGateway:
  # Mesh gateways are gateways between datacenters. They must be enabled
  # for federation in Kubernetes since the communication between datacenters
  # goes through the mesh gateways.
  enabled: true
ui:
  enabled: true
  metrics:
    enabled: true   # By default, this inherits from the value global.metrics.enabled
    provider: "prometheus"
    baseURL: http://prometheus-server
  service:
    type: 'LoadBalancer'
prometheus:
  enabled: true
grafana:
  enabled: true
ingressGateways:
  enabled: true
  gateways:
    - name: ingress-gateway
      service:
        type: LoadBalancer
syncCatalog:
  enabled: true
  default: false
Deploy the release using Helm:
helm install -f dc1.yaml consul-dc1 hashicorp/consul
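Once the release is installed, you can wait for the Consul pods to become ready before continuing. A minimal check (the app=consul label is set by the chart):

kubectl get pods -l app=consul --watch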
Create and apply a ProxyDefaults resource to configure Consul to use the mesh gateways for service mesh traffic.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  meshGateway:
    mode: 'local'
The spec.meshGateway.mode can be set to local or remote. If set to local, traffic from one datacenter to another will egress through the local mesh gateway. This may be useful if you prefer all your cross-cluster network traffic to egress from the same locations. If set to remote, traffic will be routed directly from the pod to the remote mesh gateway (resulting in one less hop).
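Save the resource above (the file name proxy-defaults.yaml below is just an assumption) and apply it with kubectl:

kubectl apply -f proxy-defaults.yaml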
The federation secret is a Kubernetes secret containing the information needed for secondary datacenters/clusters to federate with the primary. It was created automatically because we set createFederationSecret: true in dc1.yaml. Export it so it can be imported into the secondary cluster:
kubectl get secret consul-federation -o yaml > consul-federation-secret.yaml
Federated Datacenter (DC2)
Import the federation secret into the second cluster:
kubectl apply -f consul-federation-secret.yaml
Create dc2.yaml for datacenter dc2 (review and update the parameters as per your requirements).
global:
  name: consul
  datacenter: dc2   # Datacenter name
  image: "consul:1.9.5"
  metrics:
    enabled: true
    enableAgentMetrics: true
  tls:
    enabled: true
    httpsOnly: false
    # Here we're using the shared certificate authority from the primary
    # datacenter that was exported via the federation secret.
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  federation:
    enabled: true
connectInject:
  enabled: true
  metrics:
    defaultEnabled: true   # By default, this inherits from the value global.metrics.enabled
    defaultPrometheusScrapePort: 20200
    defaultPrometheusScrapePath: "/metrics"
controller:
  enabled: true
meshGateway:
  enabled: true
server:
  # Here we're including the server config exported from the primary
  # via the federation secret. This config includes the addresses of
  # the primary datacenter's mesh gateways so Consul can begin federation.
  extraVolumes:
    - type: secret
      name: consul-federation
      items:
        - key: serverConfigJSON
          path: config.json
      load: true
ui:
  enabled: true
  metrics:
    enabled: true   # By default, this inherits from the value global.metrics.enabled
    provider: "prometheus"
    baseURL: http://prometheus-server
  service:
    type: 'LoadBalancer'
ingressGateways:
  enabled: true
  gateways:
    - name: ingress-gateway
      service:
        type: LoadBalancer
prometheus:
  enabled: true
grafana:
  enabled: true
syncCatalog:
  enabled: true
  default: false
Deploy the release using Helm:
helm install -f dc2.yaml consul-dc2 hashicorp/consul
Verifying Federation
To verify that both datacenters are federated, run the consul members -wan command on one of the Consul server pods:
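For example, using kubectl exec against one of the server pods (the pod name consul-server-0 below assumes the global.name: consul setting used above):

kubectl exec consul-server-0 -- consul members -wan

The output should list the server nodes of both dc1 and dc2.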
Traffic Management
Let's now look at how traffic splitting works with Consul service mesh. For this, we use a simple two-tier application: a frontend (web pod) and a backend (data pod). The frontend is a plain Nginx pod with a proxy configuration that redirects all traffic to the backend pods; the backend pods also run an Nginx web application. We deployed two backend versions, v1 and v2, each displaying its version in the UI.
The applications are deployed across clouds (AWS and GCP). You can see that two sidecar containers (envoy-sidecar and consul-sidecar) run beside the main application container. To inject the sidecars, apply the following annotation to your application:
annotations:
  'consul.hashicorp.com/connect-inject': 'true'
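For context, here is a minimal sketch of where that annotation sits in a Deployment manifest (the names and image are illustrative; the key point is that the annotation goes on the pod template, not on the Deployment itself):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        # Tells the Consul connect injector to add the sidecar containers to this pod.
        'consul.hashicorp.com/connect-inject': 'true'
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - containerPort: 80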
We have added an upstream service annotation in the Kubernetes manifest so that the frontend (web) can talk to the backend (data) service:
annotations:
  'consul.hashicorp.com/connect-service-upstreams': 'data:8081:dc2'   # The dc2 datacenter is nothing but the GCP cluster
Apply the labels as annotations on the data service; later we will use these labels to configure the service resolver:
annotations:
  'consul.hashicorp.com/service-tags': 'v1'             # Apply the same tag with v2 for the data-v2 deployment
  'consul.hashicorp.com/service-meta-version': 'v1'     # Apply the same meta with v2 for the data-v2 deployment
Note: We have not deployed any Kubernetes Service objects for our applications.
Create and apply the ServiceDefaults configuration for the data service:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: data
spec:
  protocol: http
Create and apply the ServiceResolver configuration for the data service:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: data
spec:
  defaultSubset: 'v1'
  subsets:
    'v2':
      filter: 'Service.Meta.version == v2'
    'v1':
      filter: 'Service.Meta.version == v1'
Create and apply the ServiceSplitter configuration for the data service:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceSplitter
metadata:
  name: data
spec:
  splits:
    - weight: 30
      serviceSubset: v1
    - weight: 70
      serviceSubset: v2
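Save the three resources above (the file names below are assumptions) and apply them to the cluster running the data service:

kubectl apply -f service-defaults.yaml -f service-resolver.yaml -f service-splitter.yaml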
Once all configurations are applied, we can send traffic to our frontend application:
kubectl port-forward deploy/web 8080:80 --context cn-aws
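With the port-forward running, a quick loop in a second terminal makes the split visible (this assumes, as described above, that each backend version prints its version in the response):

for i in $(seq 1 20); do curl -s http://localhost:8080/; echo; done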
You should see roughly 70% of the traffic routed to version v2 and 30% to version v1.
Conclusion
Highlights:
- Supports multi-cluster mesh with Kubernetes and VM-based configurations (link)
- A single dashboard (Prometheus) to visualize all services from different datacenters.
- Supports traffic management features such as traffic splitting, canary releases, header-based routing, re-routing, timeouts, circuit breaking, retries, traffic shaping/load balancing, and service failover.
- Security features available such as mTLS, ACLs, and intentions.
- Distributed tracing via Jaeger (needs extra setup).
- Logging via GCP Stackdriver.
Pain points:
- No auto-discovery of services: you need to keep track of all dependent services in the manifest files.
- Preconfigured dashboards are not available in the Consul UI to see service metrics; you need to rely on Prometheus and Grafana.
- No pre-configured distributed tracing solution; it requires additional setup.
- Need to maintain a unique name (via container name or annotations) for each service across the mesh.
- Consul runs two extra containers (envoy-sidecar and consul-sidecar) beside the main application container, which increases resource utilization.
- Kubernetes-based policy configurations are not easily available.
- Module authorization does not work with the default Consul namespace if services are deployed in different Kubernetes namespaces.
- Fault injection, traffic shadowing, and header modification are not currently supported.
- Overall configuration complexity is high.
How can we help?
At CloudCover, we are always looking forward to the next challenge. Drop us a line; we would love to hear from you.