
Kubernetes Cloud Landscapes at Scale with GitOps and Argo CD

Learn how to build and operate complex Kubernetes cloud landscapes at scale with GitOps and Argo CD.

Today, software is often shipped as Software as a Service (SaaS) instead of as local applications or on-premise hosting. This raises a whole new set of requirements and challenges that come with providing a service 24/7. To solve these challenges, we typically use hyperscalers like AWS to build and host cloud-native applications. The complexity of these cloud landscapes is increasing rapidly, and it is becoming more and more difficult to maintain and operate them.

Infrastructure as Code (IaC) is a great approach to managing cloud landscapes in a declarative, reliable and transparent way. If we think one step further, we don't only want to declare our infrastructure, we also want to deploy and configure it automatically. Especially for Kubernetes clusters, GitOps is a very powerful approach that combines the power of IaC with the power of Git and Kubernetes operators. In this post, we will take a look at how we can use GitOps to manage cloud landscapes at scale and go beyond simple examples.

What the heck is GitOps?#

Although this concept is getting more and more popular, it's unfortunately not yet widely known. GitOps is based on IaC, but takes it one step further. We don't just have a declarative description of our infrastructure, we also rely on a version control system like Git. To be fair, even without GitOps you probably want to keep your IaC configuration in Git, but there the purpose is mainly versioning and collaboration.

With GitOps, we also use Git as the single source of truth for the IaC configuration. That single source of truth is then used to automatically deploy the configuration to the landscape. This requires the managed infrastructure to be fully describable as declarative code. Additionally, we need an operator that monitors the state of the Git repository and applies the changes to the infrastructure.

GitOps Flow

Especially with Kubernetes, GitOps can be easily adopted. There are multiple open source tools that can be used to manage Kubernetes clusters with GitOps - the most common ones are Argo CD and Flux CD. Both tools are very similar in their approach and can be used to manage Kubernetes clusters. Since Argo CD provides a great integrated web UI that also visualizes the deployments and sync operations in a very nice way, I will focus on Argo CD in this post.

Git Repository Setup#

So when we want to have all our configuration in Git, where is the best place to put it? Some people tend to keep the Kubernetes configuration right next to the source code, but I have a clear answer for that: throw everything into a single, separate repository. Even if you have a monorepo for your source code, a clean separation between code and configuration just makes your life easier. There is a more detailed explanation in the Argo CD Best Practices if you're interested in the benefits.

Cluster Setup#

The first question to answer is: What does our cluster setup look like? The answer heavily depends on the scope of the application and the budget. For small to medium-sized landscapes, it is often sufficient to have a single Kubernetes cluster. You can just deploy Argo CD in the same cluster as the deployments it manages - likely in its own namespace.

However, for larger landscapes that also have to fulfil enterprise compliance requirements, it is often a good idea to have multiple clusters. This allows physically separating the different environments from each other for a strict separation of concerns. Access control is also easier to handle with multiple clusters, although Kubernetes RBAC is a very powerful tool to manage access control within a single cluster as well. Additionally, having multiple clusters enables customers to have a service that is hosted in their region. Especially for enterprise applications that must comply with local data privacy requirements, this can become very important.

Nevertheless, whatever the landscape setup is - we can always use a single Argo CD instance to manage all clusters. In the best case, we have a dedicated cluster that only runs Argo CD (and maybe other related components). With this in place, it's easy to define the requirements for each cluster independently. Although an outage of Argo CD is not business-critical (it only means you cannot update the configuration via Git), it is still a good idea to have an independent high-availability setup.

Cluster Setup

We can connect Argo CD to other clusters and also filter resources by cluster. That means all the configuration lives in one place, we don't have to switch tools or instances for different clusters, and we only have to manage access to one instance. Overall, this reduces complexity since no direct operation on the clusters is required anymore and everything is controlled by a single system.
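Besides the `argocd cluster add <context>` CLI command, clusters can also be registered declaratively as a labeled Secret in the Argo CD namespace - which fits the GitOps spirit nicely. A minimal sketch (the cluster name, server URL and credentials are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: staging-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster # marks this Secret as a cluster registration
type: Opaque
stringData:
  name: staging # cluster name shown inside Argo CD (placeholder)
  server: https://staging.example.com:6443 # API server URL of the target cluster (placeholder)
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```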

Concepts of Argo CD#

So how do we actually configure Argo CD to deploy something? Well, this post is not a hands-on, step-by-step guide on how to set up Argo CD. If you're looking for that, just head to the documentation ;). However, I still want to give a brief overview of the concepts Argo CD uses.

Argo CD Concepts

Applications#

An application in Argo CD is a set of Kubernetes resources that are deployed together. This can be a single deployment, a whole Helm chart or even a whole Git repository. Normally, we want to have each component of the service in a separate Argo CD application. This allows a clear separation of concerns since synchronization and access control happen on the application level.
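A minimal application for a single component could look like the following sketch (the repository URL and chart path are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: component-a
  namespace: argocd # namespace of the Argo CD installation
spec:
  project: default
  source:
    repoURL: git@github.com:example/gitops.git # GitOps repository (placeholder)
    targetRevision: HEAD
    path: charts/component-a # directory containing the manifests or Helm chart
  destination:
    server: https://kubernetes.default.svc # deploy into the local cluster
    namespace: components
  syncPolicy:
    automated:
      prune: true # delete resources that were removed from Git
      selfHeal: true # revert manual changes on the cluster
```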

Application Sets#

Now we know what applications are and have created one for each component, great! But what about multiple environments that share the same components? We don't want to duplicate the configuration for dev, staging, prod and so on, right? This is where application sets come into play. An application set is basically a template that dynamically creates applications. In the next chapter, we will take a look at how we can use application sets to create applications for multiple environments, possibly even distributed over multiple clusters.

Projects#

Projects are a virtual grouping of applications that share the same configuration. They configure access, source repositories, clusters, synchronization windows and more for their applications. Each application must belong to a project - there is always the default project that can be used when there's no custom one.
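As a sketch, a project for a staging environment could restrict which repositories and clusters its applications may use (the repository URL and cluster address are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: staging
  namespace: argocd
spec:
  description: Applications of the staging environment
  sourceRepos:
    - git@github.com:example/gitops.git # only allow this GitOps repository (placeholder)
  destinations:
    - server: https://staging.example.com:6443 # only allow the staging cluster (placeholder)
      namespace: "*" # any namespace within that cluster
  clusterResourceWhitelist:
    - group: "*" # allow cluster-scoped resources of any kind
      kind: "*"
```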

Repository Structure#

The way we structure the configuration in the Git repository is really important and should be well thought out. Changing the structure later on is hard and requires a lot of effort, so it's better to find a good setup from the beginning. Unfortunately, I can't give you a one-size-fits-all solution here, but I can explain some approaches that I have seen in the wild, each with their individual benefits.

Multiple branches#

One approach to handling multiple environments is to have a branch for each environment or cluster. The advantage is that we can adjust the configuration of each environment individually, but additionally have the ability to merge configuration between branches. This comes in handy when we make changes to a development cluster first, but then want to ship the exact same config to other environments.

In Argo CD applications, the branch of the GitOps repository can simply be specified via the spec.source.targetRevision property of the Application Kubernetes resource.
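For a hypothetical staging branch, the relevant part of the Application would look like this (repository URL and path are placeholders):

```yaml
spec:
  source:
    repoURL: git@github.com:example/gitops.git # GitOps repository (placeholder)
    targetRevision: staging # track the staging branch instead of HEAD
    path: charts/component-a
```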

This might sound really practical at first glance, but in practice merging between branches can get a bit tedious. Besides the extra merge commits, we might not want to merge all of the changes, just some of them - and that's where it starts getting annoying. Since this is not always that smooth in practice, people tend to choose other approaches, which are explained in the next sections.

App of Apps#

The "App of Apps" concept in Argo CD refers to a way of organizing and managing applications within an Argo CD instance. In this approach, a "parent" application is used to manage and coordinate the deployment of multiple "child" applications, allowing for a more centralized and organized way of managing them. We can then manage multiple applications in one place - instead of managing each application separately.

That means a file for each application in each environment is required. This approach can be combined with the previous one, [multiple branches](#multiple-branches), and with Kustomize.
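A parent application can simply point to a directory that contains the child Application manifests; Argo CD then creates and syncs those children. A sketch with placeholder paths:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@github.com:example/gitops.git # GitOps repository (placeholder)
    targetRevision: HEAD
    path: applications/staging # directory containing the child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd # child Applications must be created in the Argo CD namespace
```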

Application Sets with Matrix Generators#

The App of Apps concept is great, but we still have duplication with multiple environments since there needs to be an application for each one. A solution to reduce boilerplate and duplicated code is to use Application Sets.

An Application Set is an abstraction over applications that allows generating applications dynamically based on some data. The set has a parametrizable application template. Then we need some kind of data source that is used to generate the applications; it can be hardcoded, but also a path to JSON files in a Git repository.

This approach assumes that the only difference between the environments is the configuration of the deployments. So when we use Helm charts, we just have different values.yaml files for each environment. However, this is also the downside of this concept, at least if we replicate environments this way: if we change the Kubernetes configuration, we change it for all environments instantly. Changes for individual components can only be rolled out by updating the respective values.yaml file! If there are frequent changes to the Kubernetes configuration, you might not want to solve environment replication this way.

A repository using this approach could look like this:

```
applications/
  sets/
    application-set-components.yaml
  root-deployment.yaml
charts/
  component-a/
    ...
  component-b/
    ...
clusters/
  staging/
    values/
      environment.yaml
      component-a.yaml
      component-b.yaml
    config.json
  production/
    values/
      environment.yaml
      component-a.yaml
      component-b.yaml
    config.json
  ...
```

The root-deployment.yaml just deploys all Application Sets; it could look like the following example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: application-sets
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@github.com:code-specialist/gitops.git # GitOps repository
    targetRevision: HEAD
    path: applications/sets
  destination:
    # adds the app in the same cluster that Argo CD runs in,
    # but the generated applications can still target other clusters
    server: https://kubernetes.default.svc
    namespace: argocd
```

An Application Set would then look like this:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: components
spec:
  generators:
    - matrix:
        generators:
          - git:
              repoURL: git@github.com:code-specialist/gitops.git # GitOps repository
              revision: HEAD
              files:
                - path: clusters/*/config.json # generates an application for each occurrence / match of this file
          - list:
              elements:
                - component: component-a
                - component: component-b
  template:
    metadata:
      name: "{{component}}"
      namespace: argocd # this is just the namespace of the Argo CD application, NOT of the Helm chart
    spec:
      project: "{{cluster.name}}" # you can also separate apps in different environments via projects
      source:
        path: "charts/{{component}}" # Helm chart directory
        repoURL: git@github.com:code-specialist/gitops.git # GitOps repository
        targetRevision: HEAD
        helm:
          valueFiles:
            - "../../clusters/{{cluster.name}}/values/environment.yaml" # values that are used by the whole environment
            - "../../clusters/{{cluster.name}}/values/{{component}}.yaml" # component-specific values
      destination:
        server: "{{cluster.url}}"
        namespace: components # namespace where the Helm chart is installed
```

With this setup, Argo CD will automatically spin up new applications when clusters are added to the clusters directory, and also update the configuration when value files change.

Kustomize#

Kustomize is a tool that allows writing a single "base" configuration for the deployments and then overriding it with customizations for each environment. It is also supported natively by Argo CD; you can find more information on the corresponding documentation page.

This approach is the most flexible and powerful of the ones described in this post. It allows configuring each environment individually without having to duplicate the whole configuration. It requires a bit more boilerplate code than the others, but especially in large and complex landscapes this is acceptable. However, you first have to dig into Kustomize a bit to be able to use it properly 😉

A project structure with this approach could look like this:

```
base/
  component-a/
    kustomization.yaml
    ...
  component-b/
    kustomization.yaml
    ...
overlays/
  staging/
    kustomization.yaml
    ...
  production/
    kustomization.yaml
    ...
kustomization.yaml
```

With Kustomize, we can also forgo Helm entirely. Although it is possible to combine both tools, Kustomize in combination with GitOps covers everything Helm can do - and even more. The templating is replaced by the overlays, and the Helm lifecycle (updates, rollbacks, ...) is covered by GitOps. When we want to roll back a change, we just revert the latest commit!
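As a sketch of how an overlay works (the component name and the patch are made up), a staging overlay could reference the base configuration and change only the replica count:

```yaml
# overlays/staging/kustomization.yaml (hypothetical example)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/component-a # reuse the base configuration unchanged
patches:
  - patch: |- # inline strategic merge patch: scale component-a down for staging
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: component-a
      spec:
        replicas: 1
```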

Operating the Landscape#

So now that we've completed the Argo CD setup and have all clusters connected and running, I hope you're starting to see the benefits of GitOps. But after the initial setup, we probably want to go live and have production environments running. This is where the real fun begins.

So how can we deploy changes in an agile, reliable and smart way? Since we're using Git, we can just create new branches for the changes and open pull requests. This is just awesome, because we can do code reviews and collaborate. But not just that, we can also run pipelines that perform linting and policy checks. Furthermore - and now it gets really fancy - we could deploy the new configuration to a dedicated cluster for pull requests and run automated tests on it. Sure, setting this up and maintaining it takes a lot of effort, but for huge enterprise landscapes where lots of engineers from different teams make frequent changes to the infrastructure configuration, this is just gold. On the other hand, if that's not the case - just deploy the changes to dev first to verify and test ;)
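Such a pipeline doesn't have to be fancy to be useful. As a sketch (assuming GitHub Actions and the kubeconform manifest validator - adapt this to your CI system and repository layout):

```yaml
# .github/workflows/validate.yaml (hypothetical pipeline)
name: Validate manifests
on: pull_request
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Render Helm charts and validate them against the Kubernetes schemas
        run: |
          curl -sL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
          for chart in charts/*/; do
            helm template "$chart" | ./kubeconform -strict -summary
          done
```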

Operating productive Environments

Further Steps#

Haven't heard of Crossplane yet? You should definitely have a look! Especially in large infrastructures and enterprises, you want to automate as much as possible and have all of your infrastructure as code. This can get quite complex when lots of different services, providers and platforms are part of the infrastructure. Usually, such configuration is done via Terraform, which is a great tool.

But what if you could go one step further and make it even better? With Crossplane, you have a Kubernetes control plane that manages the configuration for you. It uses the Kubernetes operator pattern to synchronize the infrastructure with the configuration you provide via custom Kubernetes resources. And even better - you can combine this with Argo CD and other tools to also have this configuration as IaC in your GitOps repository!

Crossplane is open source and really easy to extend. So if you're interested, I can definitely recommend looking into it 😄


We've reached the end of this post, thanks for reading! If you have any questions, comments or further ideas and tips that can improve GitOps setups, feel free to post them in the comments!