Case Study

Atlan embraces multi-tenancy to optimize cost & time significantly

CloudCover + AWS + Atlan

tl;dr

  • CloudCover introduced multi-tenancy so that clusters can be shared among different clients.
  • Automated the onboarding and offboarding process down to a single click.
  • Removed the dependency on Replicated KOTS, a paid tool that was very expensive.
  • Achieved greater efficiency and a 60% time saving.

The Overview

Atlan is an active metadata platform for modern data teams. Through its collaborative workspace and metadata activation across the data stack, Atlan helps data teams discover, understand, trust, and collaborate on their data.

The challenge

Onboarding new clients and managing new clusters was becoming tedious and expensive.

Atlan needed an overhaul to ensure that its technical team and infrastructure were equipped to handle exponential growth while minimizing costs to increase profitability.

  • Increasing Client Onboarding Time

    The Atlan team has a SaaS product, which they deploy in an EKS cluster at client sites. Onboarding a new client required them to create a separate EKS cluster and deploy the product there.

  • Overhead of Managing AWS EKS Clusters

    Under the single-tenant deployment approach, every customer had a separate AWS EKS cluster. With an ever-growing clientele, managing the clusters became tedious and cumbersome.

  • Slow Update Release Cycle

    Since all deployments were done manually, a large share of the team's resources was already tied up in creating new clusters, leaving very little time to release updates for existing deployments.

Our Approach

CloudCover’s Strategy and Approach

  • Identify Bottleneck

    CloudCover consulted with Atlan on their existing setup, including product, services, and infrastructure. The team soon realized that the single-tenant architecture used for client deployments was the bottleneck and that a multi-tenant architecture was the way to go.

  • Create New Helm Charts

    CloudCover ran multiple proofs of concept (POCs) with different components for various use cases. They used Argo Workflows and Terraform for infrastructure provisioning and combined 40 Helm charts to create the new product Helm charts.

  • They then created pools of EKS clusters and deployed each tenant to a cluster from the pool, chosen based on the utilization and availability of the clusters. Loft was used to manage EKS, while ArgoCD enabled GitOps-based release management.
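
The placement step described above can be sketched roughly as follows. This is only an illustration, not CloudCover's actual scheduler: the `Cluster` fields, the 80% utilization threshold, and the cluster names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    capacity: int           # hypothetical: max tenants the cluster can host
    tenants: int            # tenants currently placed on the cluster
    available: bool = True  # False while unhealthy or under maintenance

    @property
    def utilization(self) -> float:
        return self.tenants / self.capacity


def pick_cluster(pool: list[Cluster], max_utilization: float = 0.8) -> Cluster:
    """Return the least-utilized available cluster below the threshold."""
    candidates = [
        c for c in pool
        if c.available and c.utilization < max_utilization
    ]
    if not candidates:
        # In practice this is where a new EKS cluster would be provisioned.
        raise RuntimeError("no cluster in the pool has spare capacity")
    return min(candidates, key=lambda c: c.utilization)


# Hypothetical pool: one nearly full, one half empty, one unavailable.
pool = [
    Cluster("eks-pool-a", capacity=100, tenants=85),
    Cluster("eks-pool-b", capacity=100, tenants=40),
    Cluster("eks-pool-c", capacity=100, tenants=10, available=False),
]
chosen = pick_cluster(pool)
chosen.tenants += 1  # record the new tenant on the chosen cluster
print(chosen.name)   # eks-pool-b
```

Picking the least-utilized cluster spreads tenants evenly; a real setup might instead pack clusters tightly to save cost, or weigh CPU and memory rather than a simple tenant count.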

Network Architecture

Public Flow

In the public flow, endpoints are publicly exposed. An internet-facing load balancer is configured with host-based routing rules and routes traffic to the Palo Alto firewall on a specific port number and hostname. Based on this mapping, the firewall routes the traffic to the Network Load Balancer of the respective tenant via the Transit Gateway.
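
To make the hostname-and-port mapping concrete, here is a small sketch of the lookup chain. It only models the routing decisions as dictionary lookups; the hostnames, port numbers, and load balancer names are invented for illustration and do not come from the actual deployment.

```python
# Hypothetical listener rules on the internet-facing load balancer:
# each tenant hostname is forwarded to the firewall on its own port.
LISTENER_RULES = {
    "tenant-a.example.com": 8001,
    "tenant-b.example.com": 8002,
}

# Hypothetical firewall mapping: (hostname, port) -> tenant Network Load Balancer.
FIREWALL_MAPPING = {
    ("tenant-a.example.com", 8001): "nlb-tenant-a",
    ("tenant-b.example.com", 8002): "nlb-tenant-b",
}


def route_public_request(host_header: str) -> list[str]:
    """Return the hops a public request takes for the given Host header."""
    hostname = host_header.split(":")[0].lower()  # drop any port, normalize case
    port = LISTENER_RULES.get(hostname)
    if port is None:
        raise LookupError(f"no routing rule for host {hostname!r}")
    target_nlb = FIREWALL_MAPPING[(hostname, port)]
    return [
        "internet-facing-lb",
        f"palo-alto-firewall:{port}",
        "transit-gateway",
        target_nlb,
    ]


print(route_public_request("Tenant-A.example.com"))
```

A request for an unknown hostname fails at the first lookup, which mirrors how a load balancer without a matching rule would not forward the request to any tenant.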

Private Flow

The private flow is for access from within the VPN. The client communicates with the Application Load Balancer, which is private and part of the Palo Alto firewall. The firewall then routes this traffic to the AWS Transit Gateway, from where it reaches the Network Load Balancer within the VPC.

Major Components

Platform Components Architecture

The major components of the platform are ArgoCD, Argo Workflows, and Loft. ArgoCD handles continuous delivery by syncing all changes in the Git repo to the Kubernetes clusters.
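
The idea behind this kind of GitOps sync is a comparison of the desired state in Git with the live state in the cluster. The sketch below illustrates that comparison in plain Python; it is not ArgoCD's implementation, and the resource names and versions are hypothetical.

```python
def plan_sync(desired: dict[str, str], live: dict[str, str]) -> dict[str, list[str]]:
    """Compare desired state (from Git) with live state (in the cluster).

    Both arguments map a resource name to a version or content hash.
    """
    return {
        # In Git but not in the cluster yet.
        "create": sorted(desired.keys() - live.keys()),
        # Present in both, but the cluster has drifted from Git.
        "update": sorted(
            name for name in desired.keys() & live.keys()
            if desired[name] != live[name]
        ),
        # In the cluster but no longer in Git.
        "delete": sorted(live.keys() - desired.keys()),
    }


# Hypothetical versions for a few services.
desired = {"atlas": "v2", "keycloak": "v1", "ranger": "v1"}
live = {"atlas": "v1", "keycloak": "v1", "heka": "v1"}

plan = plan_sync(desired, live)
print(plan)
# {'create': ['ranger'], 'update': ['atlas'], 'delete': ['heka']}
```

In a real GitOps tool, deleting resources that have disappeared from Git is usually an opt-in setting, so the "delete" list here should be read as candidates rather than guaranteed actions.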

End users can trigger onboarding and offboarding workflows using Argo Workflows, which provision or destroy the resources in the backend. Other components include External DNS, Nginx Ingress, Velero, and Observability, which take care of DNS mapping, routing, backup, and monitoring respectively.
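
An onboarding or offboarding workflow of this kind can be thought of as an ordered list of steps, each with a way to provision and a way to destroy. The following is a minimal sketch under that assumption; the step names are hypothetical, and the real workflows run in Argo Workflows rather than in Python.

```python
from typing import Callable

log: list[str] = []  # records what each step did, in order


def make_step(name: str) -> tuple[Callable[[str], None], Callable[[str], None]]:
    """Build a (provision, destroy) pair for one hypothetical resource type."""
    def provision(tenant: str) -> None:
        log.append(f"provision {name} for {tenant}")

    def destroy(tenant: str) -> None:
        log.append(f"destroy {name} for {tenant}")

    return provision, destroy


# Hypothetical step order: cluster first, then the product, then DNS.
STEPS = [make_step(n) for n in ("vcluster", "helm-release", "dns-record")]


def onboard(tenant: str) -> None:
    """Provision every step in order; undo completed steps if one fails."""
    completed: list[Callable[[str], None]] = []
    try:
        for provision, destroy in STEPS:
            provision(tenant)
            completed.append(destroy)
    except Exception:
        for destroy in reversed(completed):
            destroy(tenant)
        raise


def offboard(tenant: str) -> None:
    """Destroy resources in the reverse of the provisioning order."""
    for _, destroy in reversed(STEPS):
        destroy(tenant)


onboard("acme")
offboard("acme")
print("\n".join(log))
```

Tearing down in reverse order keeps dependencies intact: the DNS record goes before the release it points to, and the release before the cluster that hosts it.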

Each Tenant/Vcluster

Each tenant is deployed as a separate virtual Kubernetes cluster (vcluster) that hosts the Atlan product, which includes approximately 40 microservices such as Atlas, Heka, Heracles, Keycloak, Ranger, and Argo.

Benefits

The CloudCover team helped Atlan reduce client onboarding time and operational overhead.

  • Efficient Resource Utilization

    Atlan can now leverage multi-tenancy in Kubernetes to deploy hundreds of product instances for clients in a single EKS cluster, achieving better utilization of resources.

  • Faster Release Update Cycle

    The customer can now release updates for existing clusters at a much faster rate and can focus on improving its application for the growing user base.

  • Faster Deployment by 60%

    Efficient monitoring using the Prometheus stack with PagerDuty, together with CD pipelines built on ArgoCD, allowed for faster and more efficient deployment of the application.

    Implementing Infrastructure as Code (IaC) helped Atlan set up the environment quickly on the AWS platform, saving over 60 percent of the time.

Tools Used

  • Amazon EKS
  • AWS Secrets Manager
  • Amazon S3
  • AWS IAM
  • Amazon ECR
  • ArgoCD
  • Argo Workflows
  • ELK Stack
  • Velero
  • External Secrets
  • External DNS
  • Kube Prometheus Stack
  • Loft