Introduction

Building and scaling a payments infrastructure requires reliability, efficiency, and automation. At Cashfree Payments, what began in 2015 as a monolithic PHP application has evolved into a high-performance microservices architecture. Today we process transactions worth $80B annually for more than 800,000 businesses. This rapid growth was made possible by moving to microservices and implementing GitOps with tools like Helm, ArgoCD, Bitbucket, and Terraform.

These technologies have allowed our DevOps team to automate and standardise infrastructure management, application configurations, deployments, monitoring, and recovery. In this blog, I will outline the design choices that shaped our approach and the key factors that led us to our current system.

Challenges

With rapid business growth, our microservices quickly expanded to nearly 200. We soon realised that without proper management, they could become a tangled mess, which I call a “noodle soup.” Scalability and uptime are also out of reach without proper automation in place. Here are some key challenges that we faced with the microservices architecture:

  • Data Management and Consistency
    • Safeguarding Production and Disaster Recovery Data: Protect critical data to prevent loss or corruption in production and disaster recovery setups.
    • Seamless Data Replication: Simplify the replication of non-production data for setting up lower environments like QA, staging, and performance testing.
  • Environment Consistency and Deployment
    • Uniform Setup Across Environments: Maintain consistent configurations across all environments to identify and resolve bugs in lower environments effectively.
    • Mitigating Misconfigurations: Prevent configuration errors, such as accidentally making an S3 bucket public, which can lead to serious security breaches.

Design Requirements

To manage the above-mentioned challenges, we quickly converged on GitOps. GitOps is a set of practices for managing infrastructure and application configurations using Git as the single source of truth. It uses Git repositories to store the declarative state of systems and applications, enabling automation and streamlined deployment, monitoring, and recovery processes. In addition to these challenges, our design requirements included:

  • Robust design to manage ~200 microservices
  • Infrastructure and Application consistency across environments
    • All lower environments (qa, stage, sandbox, etc.) should mirror production closely.
  • Manage complex permissions and cross-permissions of microservices
    • Services need read/write access to S3, SQS, SNS, Kafka topics, databases, etc., and sometimes other services need access to those same resources (cross-permissions).
  • Flexible deployment options
    • Individual services (deploy one service at a time)
    • Subset of services (a product, i.e. a closely related subset of services deployed together)
    • Entire ecosystem (deploy all services at once)
  • Set up new environments in a single click
  • Empowering developers and reducing DevOps dependency
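
The three deployment scopes listed above can be pictured as a selection over a service catalogue. The following is a minimal sketch only: the catalogue contents and the `select_services` helper are hypothetical names chosen for illustration, not part of any real tool.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue: product name -> services that belong to it.
CATALOGUE: Dict[str, List[str]] = {
    "payouts": ["payouts-api", "payouts-worker"],
    "gateway": ["gateway-api", "gateway-router", "gateway-webhooks"],
}


def select_services(
    catalogue: Dict[str, List[str]],
    service: Optional[str] = None,
    product: Optional[str] = None,
) -> List[str]:
    """Return the services to deploy for one of the three scopes."""
    if service is not None and product is not None:
        raise ValueError("choose either a service or a product, not both")
    if service is not None:
        # Scope 1: a single service.
        known = {s for members in catalogue.values() for s in members}
        if service not in known:
            raise KeyError(f"unknown service: {service}")
        return [service]
    if product is not None:
        # Scope 2: every service that belongs to one product.
        return list(catalogue[product])
    # Scope 3: the entire ecosystem.
    return [s for members in catalogue.values() for s in members]
```

Calling `select_services(CATALOGUE, product="gateway")` returns the three gateway services, while calling it with no filter returns every service in the catalogue.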

Implementation

The GitOps implementation at Cashfree Payments is built with the following tools:

  1. Helm
  2. ArgoCD
  3. Bitbucket
  4. Terraform

Helm

Each service has its own Helm chart, which defines all the Kubernetes components associated with it. These charts are managed in a Helm repository.

ArgoCD

Argo CD is a declarative GitOps continuous delivery tool for Kubernetes. It automates the deployment and management of applications in Kubernetes clusters by synchronising the desired state defined in Git repositories with the actual state in the clusters. ArgoCD is designed to improve the efficiency, consistency, and reliability of Kubernetes application deployments.

Bitbucket

We use Bitbucket Git repositories for source control and Bitbucket Pipelines for CI.

Terraform

Terraform is used to manage all AWS services as well as Kubernetes services. We employ Terraform modules to define each service, with modules typically including components such as:

  1. S3
  2. SQS
  3. SNS
  4. IAM role and IAM policy
  5. Argo application

A product Terraform module comprises multiple service modules. Instantiating these product modules in each environment's folder keeps all environments consistent. The Terraform folder structure is illustrated in the diagram below.
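
To make the composition concrete, here is a minimal Python model of the idea, not actual Terraform. The class names, the `instantiate` method, and the `<env>-<service>-<resource>` naming convention are all assumptions made for illustration. The point it shows is that one product definition, instantiated per environment, yields the same set of resources everywhere.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceModule:
    """One service and the resources declared alongside it."""
    name: str
    s3_buckets: List[str] = field(default_factory=list)
    sqs_queues: List[str] = field(default_factory=list)
    sns_topics: List[str] = field(default_factory=list)


@dataclass
class ProductModule:
    """A product groups several closely related service modules."""
    name: str
    services: List[ServiceModule]

    def instantiate(self, env: str) -> List[str]:
        """List the resources this product would create in `env`."""
        resources: List[str] = []
        for svc in self.services:
            prefix = f"{env}-{svc.name}"  # assumed naming convention
            resources += [f"s3:{prefix}-{b}" for b in svc.s3_buckets]
            resources += [f"sqs:{prefix}-{q}" for q in svc.sqs_queues]
            resources += [f"sns:{prefix}-{t}" for t in svc.sns_topics]
            # Every service also gets an IAM role and an Argo application.
            resources.append(f"iam-role:{prefix}")
            resources.append(f"argo-application:{prefix}")
        return resources


# Hypothetical product with a single service.
payouts = ProductModule(
    "payouts",
    [ServiceModule("payouts-api", s3_buckets=["reports"], sqs_queues=["events"])],
)
qa_resources = payouts.instantiate("qa")
prod_resources = payouts.instantiate("prod")
```

Because both environments are generated from the same `payouts` definition, `qa_resources` and `prod_resources` differ only in the environment prefix.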

Deploying Microservices

Most deployments consist of a source code change, a ConfigMap change, or both.

During the build process, the source code is built into a Docker image and pushed to ECR with an appropriate version tag. For ease of access, the version tag combines the service version number, the pull request number, and the pipeline number. The ConfigMap is packaged as a Helm chart and published to a Helm repo with the same version tag.
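
As a rough sketch of how such a tag could be assembled: the exact format is not specified here, so the `-pr` and `-b` separators and the `build_version_tag` helper below are assumptions. The validation reflects the rule that Docker image tags may contain only letters, digits, underscores, periods, and hyphens, up to 128 characters, and must not start with a period or hyphen.

```python
import re

# Docker image tags: letters, digits, "_", "." and "-", at most 128
# characters, and the first character must not be "." or "-".
_DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def build_version_tag(service_version: str, pull_request: int, pipeline: int) -> str:
    """Combine the three parts into one tag shared by the image and the chart."""
    if pull_request < 0 or pipeline < 0:
        raise ValueError("pull request and pipeline numbers must be non-negative")
    # Assumed layout: <service version>-pr<PR number>-b<pipeline number>
    tag = f"{service_version}-pr{pull_request}-b{pipeline}"
    if not _DOCKER_TAG.fullmatch(tag):
        raise ValueError(f"not a valid image tag: {tag!r}")
    return tag


print(build_version_tag("1.4.2", 318, 2291))  # 1.4.2-pr318-b2291
```

With a semantic service version such as 1.4.2, this assumed layout is also a valid SemVer pre-release string, which is convenient because Helm expects chart versions to follow SemVer.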

Each ArgoCD Application contains two Helm charts:

  • ConfigMap Helm chart, hosted in a Helm repo and added as a dependency chart
    • ConfigMaps
  • Primary Helm chart, hosted in a Git repo (i.e. everything except ConfigMaps)
    • CronJob
    • DatadogMonitors
    • Deployment
    • HPA
    • Ingress
    • Jobs
    • KafkaTopic
    • KafkaUser
    • Namespace
    • PDB (PodDisruptionBudget)
    • Rollouts
    • Service
    • ServiceAccount
    • VMRule

At deployment time, the ArgoCD application is updated with the version tag for both the dependent ConfigMap Helm chart and the Docker image in a single Git commit, so ConfigMap changes and source code changes are deployed in tandem. This also ensures that source code and ConfigMaps are reverted together.
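
The idea of moving both versions in one change can be sketched as follows. The dictionary stands in for the values an application is deployed with. Its key names (`image.tag`, `configChart.version`), the sample tags, and the helper functions are hypothetical, not the actual schema.

```python
import copy
from typing import Any, Dict


def bump_release(app_values: Dict[str, Any], version_tag: str) -> Dict[str, Any]:
    """Return a copy of the values with both versions set to `version_tag`."""
    updated = copy.deepcopy(app_values)
    updated["image"]["tag"] = version_tag            # Docker image version
    updated["configChart"]["version"] = version_tag  # ConfigMap chart version
    return updated


def versions_in_sync(app_values: Dict[str, Any]) -> bool:
    """Check that code and configuration point at the same build."""
    return app_values["image"]["tag"] == app_values["configChart"]["version"]


# Hypothetical current state of one application.
current = {
    "image": {"repository": "payouts-api", "tag": "1.4.2-pr318-b2291"},
    "configChart": {"name": "payouts-api-config", "version": "1.4.2-pr318-b2291"},
}
candidate = bump_release(current, "1.4.3-pr321-b2310")
```

Writing `candidate` to Git as one commit moves code and configuration forward together, and reverting that commit restores `current`, with both versions back at the previous build.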

To keep ConfigMaps consistent across all environments, we use a single ConfigMap template file. The environment-specific ConfigMap values are generated from it just before the build and Helm packaging steps, so every environment gets the same set of keys.
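
The single-template approach can be illustrated with Python's standard `string.Template`. This is a stand-in for the general technique, not the template language we use. The keys, hostnames, and values below are invented for the example.

```python
from string import Template
from typing import Dict

# One template shared by every environment (hypothetical keys).
CONFIG_TEMPLATE = Template(
    "DB_HOST=$db_host\n"
    "LOG_LEVEL=$log_level\n"
    "PAYMENT_TIMEOUT_MS=$payment_timeout_ms\n"
)

# Per-environment values (hypothetical).
ENV_VALUES: Dict[str, Dict[str, str]] = {
    "qa": {
        "db_host": "db.qa.internal",
        "log_level": "DEBUG",
        "payment_timeout_ms": "5000",
    },
    "prod": {
        "db_host": "db.prod.internal",
        "log_level": "INFO",
        "payment_timeout_ms": "3000",
    },
}


def render_config(env: str) -> str:
    """Render the shared template with the values of one environment."""
    # substitute() raises KeyError if an environment lacks a value the
    # template needs, so an incomplete environment fails at build time.
    return CONFIG_TEMPLATE.substitute(ENV_VALUES[env])


def config_keys(rendered: str) -> list:
    """Extract the key names from a rendered KEY=value config."""
    return [line.split("=", 1)[0] for line in rendered.splitlines()]
```

Since every environment is rendered from the same template, `config_keys(render_config("qa"))` and `config_keys(render_config("prod"))` return the same list. Only the values differ.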

Bringing it together

  • Terraform is used to:
    • Provision infrastructure (VPC, EKS, etc.)
    • Deploy applications through ArgoCD
    • Provision application-dependent AWS resources
  • A build is triggered when code or a ConfigMap is changed.
  • The ConfigMap is generated at build time from a single template file (ESL), making sure it is consistent across all environments.
    • ESL (Environment Specification Language) is a language-cum-library to efficiently manage the configurations for a Kubernetes service across multiple environments. This library helps unify the multiple environment configurations of a service into a single file.
  • The CI pipeline generates a version tag for the image and ConfigMap and updates the Helm chart.
  • ArgoCD then deploys the source code and configmap change in tandem.

Conclusion

With GitOps, ArgoCD, and ConfigMap templatisation, onboarding new services and environments is now seamless. We are closer to achieving one-click creation of non-ephemeral environments.

The setup time for a new environment has been reduced from months to just 3-4 days, requiring only one DevOps engineer. Last year, we managed two environments with eight DevOps engineers; today, we handle over 11 environments with the same team size. Our deployment frequency has increased, and the mean time to take a feature to production has dropped by 66%.
