Flux CD Deep Dive

Buckle up! This one has more and took longer than planned!

Published
16 min read

Infrastructure Engineer with a Linux SysAdmin and SRE & DevOps background, previously a Google Cloud authorised trainer, who's excited and enthusiastic about Kubernetes, IaC, CI/CD, DevOps and SRE!

Experienced project infrastructure lead, project technical lead, and former Google Cloud Authorized trainer. Guiding organisations on cloud adoption, DevOps and SRE implementation. Mentor to junior engineers and people looking to change careers from a non-technical background or looking to get back into tech.

I'm passionate about building and deploying Cloud native infrastructure, automation, driving change and empowering people in learning and development.

I've covered FluxCD Vs Argo CD, and I've covered a Deeper Dive into Argo CD in previous blog posts, so it only seems fair to take a deeper look into Flux CD.

Bootstrapping

To get started, we first need to prep some things. Follow the Flux Get Started guide to install the Flux CLI on your laptop and check your cluster is good to go.

Then we need to install Flux CD on our Kubernetes cluster, which is called bootstrapping. I created a new private repo in my personal GitHub account and a fine-grained personal access token (GitHub PAT). The Flux docs state read-only, but I found I needed Administration -> Access: Read and write so Flux could create deploy keys on the repo, which, security-wise, is preferred. We don't want to store the GitHub PAT in the cluster, so the deploy key is an SSH key that Flux uses just for that repo: permissions following the principle of least privilege.

flux bootstrap github --owner=ferrish07 --repository=fluxcd-blog-demo --branch=main --path=./clusters/flux-blog-cluster --personal --private=true --components-extra=image-reflector-controller,image-automation-controller --read-write-key=true

These extra flags install the image automation controllers and their CRDs, which we'll look at later, and create a read-write deploy key so Flux can write commits to our repo. If you don't want that, leave them out; you can re-bootstrap with these flags later if you like:
--components-extra=image-reflector-controller,image-automation-controller --read-write-key=true

You should then see output in your terminal, with the last lines confirming that all components are healthy.

A quick kubectl get gitrepository -n flux-system should also display the URL of the repo and that it's in a ready=True state

So what just happened?! The Bootstrap creates the Flux CD components on the cluster and then reconciles the cluster using the auth token and the deploy key.

Now what?! Spin some pods up!

Look in your Git repo, and you'll now see some directories created by Flux as part of the initial reconciliation:

Flux isn't watching over the whole repo, though; it'll watch out for anything in the clusters/flux-blog-cluster directory as an entry point.

Let's test it's all working. I'll just drop in a deploy.yaml that spins up a deployment running 2 pods of nginx.
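For anyone following along without the original file, here is a minimal sketch of what such a deploy.yaml could look like. The deployment name test-flux-nginx comes from the error log further down; the labels and the nginx:latest image are assumptions rather than the exact file used here.

```yaml
# deploy.yaml: an illustrative sketch, not the exact file from this post
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-flux-nginx      # name taken from the controller log in this post
  namespace: default         # spell this one carefully!
spec:
  replicas: 2                # the 2 nginx pods we expect Flux to create
  selector:
    matchLabels:
      app: test-flux-nginx   # assumed label, must match the pod template
  template:
    metadata:
      labels:
        app: test-flux-nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest   # assumed image tag
```

Flux only applies what sits under clusters/flux-blog-cluster, so the file needs to be committed somewhere inside that path.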

Now we'll tell Flux to reconcile the repo with flux reconcile kustomization flux-system --with-source (I'm impatient and don't want to wait 5-10 mins; you can change the interval).

Flux will grab the latest changes to the repo and, all being well, and provided you didn't misspell anything (namespace: defailt, like I definitely didn't.....), you'll get 2 new pods from the new deployment. GitOps!

"namespaces defailt not found" - Learning point and troubleshooting!

You can check the logs if you don't see what you're expecting:

kubectl logs -n flux-system deployment/kustomize-controller

{"level":"error","ts":"2026-04-21T19:13:45.143Z","msg":"Reconciliation failed after 804.204606ms, next try in 10m0s","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"8eda1147-fa64-464b-905e-426661417132","revision":"main@sha1:46fb854809f51acad4ca77c5f95322406a06b952","error":"Deployment/defailt/test-flux-nginx not found: namespaces \"defailt\" not found\n"}

It'll help you find out why the pods weren't created; in my case, a misspelling of the default namespace....

Kustomize

Let's step this up. Currently, sticking our deploy.yaml straight into the directory Flux watches is a bit messy and a bit basic.

Let's get organised and use Kustomize to deploy to multiple environments.

We'll create an apps directory containing base and overlays directories. Inside overlays, we'll have a directory for each environment: production and staging.

A quick intro to what we're doing here: we want our Kubernetes manifests deployed to different environments running on this cluster, and those environments might differ in things like the container version they run, how many replicas they have, and so on.

We don't want to manage loads of different manifests, copying and pasting between repos or directories, because we'll end up with drift and errors.

What Kustomize will do is keep our "base" manifests (for this example, deployment.yaml) as a template, and the "overlays" are where we layer on the differences between environments. Staging might have 2 replicas and production might have 4, that sort of thing.

The overlays don't contain the full deployment manifest YAML, just the differences that we want to "overlay" on top of the base. Hopefully that makes sense.

Here's the directory structure in our repo:

.
├── apps
│   ├── base
│   │   ├── deployment.yaml
│   │   └── kustomization.yaml
│   └── overlays
│       ├── production
│       │   ├── kustomization.yaml
│       │   └── namespace.yaml
│       └── staging
│           ├── kustomization.yaml
│           └── namespace.yaml
├── clusters
│   └── flux-blog-cluster
│       └── flux-system
│           ├── gotk-components.yaml
│           ├── gotk-sync.yaml
│           └── kustomization.yaml
└── README.md

Let's get some more manifests up so you can follow along.

apps/base/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kustomize-ginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kustomize-nginx
  template:
    metadata:
      labels:
        app: kustomize-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest

And what the apps/overlays/staging/namespace.yaml files look like:

cat apps/overlays/staging/namespace.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: staging

and the apps/overlays/staging/kustomization.yaml (remember to do the same for production, just change the staging namespace, etc., to production):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: staging
labels:
  - includeSelectors: true
    pairs:
      env: staging

A quick test with kubectl apply -k apps/overlays/staging will show us if what we've done works:

The staging namespace was created, and a pod from the deployment. Let's make sure the prod manifests work too:

Looking good!

But!

We know Flux CD works; we want to incorporate what we've done with Flux and watch the magic happen! This is where things get interesting.

Let's delete the manually created deployments so we can get Flux to do it:

kubectl delete -k apps/overlays/production/
kubectl delete -k apps/overlays/staging/

Configure Flux for Kustomize

First, let's answer a question:

"I already set up a Git repo when I bootstrapped Flux CD, surely if I just push these changes, Flux will just deploy it all as it did with the original deploy.yaml?"

Not quite... the GitRepository source fetches the whole repo, but the Kustomization that bootstrap created only tells Flux to apply what's in the ./clusters/flux-blog-cluster directory, so it won't apply the new ./apps directory and its contents.

A Flux CD Kustomization needs to be configured for each of staging and production. We want the same base code, but we want the environmental differences to be separate; we want Flux to see 2 environments, not a single environment. Hopefully that makes sense; if it doesn't, follow along and hopefully you'll soon see the benefits.

Let's create the staging Kustomization configuration file for Flux:

flux create kustomization apps-staging \
  --target-namespace=staging \
  --source=flux-system \
  --path="./apps/overlays/staging" \
  --prune=true \
  --interval=1m \
  --export > clusters/flux-blog-cluster/apps-staging.yaml

This writes out what Flux should reconcile using Kustomize. The clusters/flux-blog-cluster/apps-staging.yaml file should look like this:

cat clusters/flux-blog-cluster/apps-staging.yaml
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-staging
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/overlays/staging
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: staging

Do the same for production, add and commit the new Kustomize YAML files and the new Flux CD Kustomizations to your repo, and all being well, you should see it reconcile by running flux get kustomizations --watch
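For reference, the production equivalent would presumably look like this. It is a sketch that mirrors the staging file, with the name apps-production matching what the alert later in this post refers to.

```yaml
# clusters/flux-blog-cluster/apps-production.yaml: mirrors apps-staging.yaml
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-production
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/overlays/production   # the production overlay directory
  prune: true                        # remove cluster objects deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system                # the source created by bootstrap
  targetNamespace: production
```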

Great! Now that the initial configuration is done, let's try updating staging without touching the base YAML much and, importantly, without upsetting production!

Let's change the Nginx image and pin it to a tag instead of using latest. It will still be nginx:latest in apps/base/deployment.yaml, but we're configuring our staging environment with overlays, which is exactly what they're for!

First, let's test in staging apps/overlays/staging/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: staging
labels:
  - includeSelectors: true
    pairs:
      env: staging
images: # We're adding this images block
  - name: nginx
    newName: nginx
    newTag: 1.30.0 # This will overwrite the nginx:latest

This adds a kind of "shortcut" to find and change images. Now you might be thinking, "Why did we use a patch before, with the replicas?"

A great question!

We're changing the image, very matter of fact: find the image where name = nginx and give it a new tag. It's to the point, straightforward and easier to read, because images is a built-in Kustomize transformer.

A patch allows you to be much more complex and granular, almost surgical, if you will. For changing resource limits, the contents of a ConfigMap or replicas, patching is the better tool.

For something very straightforward, almost a global find and replace, use the images transformer.
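To make the contrast concrete, here is a sketch of the more "surgical" kind of patch, written as a strategic merge patch inside an overlay's kustomization.yaml. The replica count and resource limits are made-up values for illustration, not settings used in this demo.

```yaml
# Fragment for an overlay kustomization.yaml (illustrative values only)
patches:
  - target:
      kind: Deployment
      name: kustomize-ginx
    patch: |-
      # Strategic merge patch: only the fields listed here are changed
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: kustomize-ginx
      spec:
        replicas: 2
        template:
          spec:
            containers:
              - name: nginx          # merged with the base container by name
                resources:
                  limits:
                    cpu: 250m        # hypothetical limit
                    memory: 128Mi    # hypothetical limit
```

The images block stays a one-liner for tag bumps, while a patch like this can reach any field in the manifest.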

Let's push the changes and watch for the reconciliation:

And describe the deployment to check for the change:

The staging kustomize-ginx deployment container image has been updated to 1.30.0, and production remains untouched!

Alerting, image automation and security

So far, we've been telling Flux CD to reconcile and deploy some YAML, then looking at logs or checking kubectl, and thankfully, we find it's deployed (unless you fat finger and spell default wrong.....)

But how do we know when or if Flux CD can't deploy our manifests? What if we got something wrong and it gets through peer review on the PR (it happens...)? We don't want to have to hop on a terminal and check every time, especially as this takes off and we merge to main or release 50-100+ times a day!

That's what this section will cover: some of the options we have to gather metrics and logs and see what's happening with Flux, to help us troubleshoot should the worst happen.

Slack alerting

I have a free Slack account that I use for testing integrations, etc. It also receives some Grafana alerts for when my various homelab "production" services, like Pi-Hole and podfetcher, quietly die or stop working, so I already have a channel I can use. You'll need a Slack account (or org, whatever it's called) and a channel with a webhook.

First, I'll create a secret on the cluster so Flux knows what the URL is:

kubectl create secret generic slack-url \
  --from-literal=address=https://hooks.slack.com/services/SLACK_CHANNEL_WEBHOOK \
  -n flux-system

Then we tell Flux CD to create the provider we'll use, in clusters/flux-blog-cluster/slack-provider.yaml:

apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  secretRef:
    name: slack-url

Then we'll create the actual alert:

apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: apps-alert
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: apps-staging
    - kind: Kustomization
      name: apps-production

Push the changes, and they should then be applied:

Let's push a broken change to test the alert!

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: production
labels:
  - includeSelectors: true
    pairs:
      env: production
   images: # This indentation should cause some havoc!
  - name: nginx
    newName: nginx
    newTag: 1.30.0

patches:
  - target:
      kind: Deployment
      name: kustomize-ginx
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3

Uh Oh!

But what if we don't want to watch the terminal, or even worse, we've gone for coffee or lunch?! Slack alerting!

An alert in green saying a configuration has been completed, and then the actual alert in red telling me I did YAML wrong.

Quick, fix it!

Phew!

Thankfully, the Slack alert tells me what went wrong so I can investigate and push a fix. Slack then alerts the rest of the team that everything's sorted.

Now, there are loads of opinions on what you should alert on. Too many alerts create alert noise, which can be exhausting and hide the things that truly need our attention, so alert as you see best.
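One way to turn the noise down, sketched here as an option rather than a recommendation, is a second Alert that only forwards error events. The name apps-errors-only is hypothetical; the provider and event sources are the ones defined earlier.

```yaml
# A quieter variant of the alert: only failures reach Slack
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: apps-errors-only     # hypothetical name
  namespace: flux-system
spec:
  providerRef:
    name: slack              # the Provider created earlier
  eventSeverity: error       # drop the green "info" messages
  eventSources:
    - kind: Kustomization
      name: apps-staging
    - kind: Kustomization
      name: apps-production
```

The trade-off is that you lose the "all fixed" confirmation message, so it depends on whether your team wants to see recoveries as well as failures.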

But now we know how to add some alerting to Flux CD!

Image automation controllers

Breaking and fixing Flux to test the alerting, and having to update the container images by hand, has given me a great segue into the last topic of this Flux CD deep dive.

Flux CD has controllers which can watch container images, and when an image gets an update, Flux CD can automatically update that image in our Kubernetes YAML manifest files, reducing toil!

Bumping image SHAs and tags can become a lot of work when you start working with 10s, even 100s of images.

We can create an ImagePolicy Flux CD object. The image-reflector-controller scans the image repositories we tell it about, and the image policy tells Flux which image we're interested in and the semver rules for which updates we'd like for that image.

I'll configure our staging environment to keep the nginx image up to date automatically. First, let's create the image repository and policy in apps/overlays/staging/image-policy.yaml:

apiVersion: image.toolkit.fluxcd.io/v1
kind: ImageRepository
metadata:
  name: nginx
  namespace: staging
spec:
  image: nginx
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImagePolicy
metadata:
  name: nginx-policy
  namespace: staging
spec:
  imageRepositoryRef:
    name: nginx
  policy:
    semver:
      range: '>=1.20.0 <1.31.0'

I then tell Flux CD what environment I want it to automatically "bump".

In apps/overlays/staging/kustomization.yaml I'll add a comment marker on the newTag line, as that's what I want the image automation to look for, and then add the image-policy and image-automation files to the resources:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
  - ./image-policy.yaml
  - ./image-automation.yaml
namespace: staging
labels:
  - includeSelectors: true
    pairs:
      env: staging
images:
  - name: nginx
    newName: nginx
    newTag: 1.20.0 # {"$imagepolicy": "staging:nginx-policy:tag"}

The marker points to the Flux CD ImagePolicy in the form namespace:policy-name:field, so here it's the staging namespace where the policy lives, the policy name nginx-policy, and the tag field.

Finally, let's add the automation in apps/overlays/staging/image-automation.yaml. I'm configuring the ImageUpdateAutomation in the staging namespace (more on this below), with a reference to the GitRepository it should write commits to and the branch. You can configure the commit message to be as clever or complex as you like; we'll keep it simple for now:

apiVersion: image.toolkit.fluxcd.io/v1
kind: ImageUpdateAutomation
metadata:
  name: staging-automation
  namespace: staging
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxcdbot@users.noreply.github.com
        name: fluxcdbot
      messageTemplate: 'chore: flux-bot is updating container images'
  update:
    path: ./apps/overlays/staging
    strategy: Setters

Note: update.path points at the directory where Flux CD should look for the markers.

I have added this file to the staging overlays. On second thought, as it's technically "infrastructure logic" because it's writing back to our Git repo, this file might be better placed in the cluster directory clusters/flux-blog-cluster/ because the image-automation-controller (the worker) runs in the flux-system namespace.

If the automation object is in staging, you need to make sure it's allowed to reference objects across namespaces. By default, the image-automation-controller has cluster-wide permissions, but the GitRepository it needs to reference is usually in flux-system. Keeping the automation object in flux-system (the cluster folder) avoids complex RBAC issues. Just a gotcha to look out for if you run into issues.
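As a rough sketch of that alternative layout, the automation object could live in flux-system next to the GitRepository. The file location is hypothetical, and one assumption worth checking against the Flux docs: as far as I understand, an ImageUpdateAutomation only considers ImagePolicy objects in its own namespace, so the ImageRepository and ImagePolicy would move to flux-system too, and the marker would become flux-system:nginx-policy:tag.

```yaml
# clusters/flux-blog-cluster/image-automation.yaml (hypothetical location)
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImageUpdateAutomation
metadata:
  name: staging-automation
  namespace: flux-system       # same namespace as the GitRepository
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: flux-system          # no cross-namespace reference needed now
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxcdbot@users.noreply.github.com
        name: fluxcdbot
      messageTemplate: 'chore: flux-bot is updating container images'
  update:
    path: ./apps/overlays/staging   # still scan the staging overlay for markers
    strategy: Setters
```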

Let's commit and push, run flux reconcile ks flux-system if you're impatient like me.

After a short wait, we should see the image reflector scanning and talking to Docker Hub:

Run flux reconcile image update staging-automation -n staging to force the change and flux get image update -n staging to check on it, and we should see.....

The bumped version of the staging nginx deployment!

We can check that the deployment has been updated with the newly committed image tag:

Let's try it again and watch more closely. I'll update staging to something earlier like 1.21.0, commit and push:

Following the logs, we can see the imageupdateautomation kicking in:

We can see the events from when we re-deployed nginx:1.21.0 and when the ImageUpdateAutomation brings it to the preferred 1.30.0 version:

And the commit history:

PRs would be nicer, something I'll be looking at next, as you can configure Flux to commit and create a PR using GitHub actions, but that's for another blog post. This one's getting long enough as it is! But for our staging demo environment, this will do for now.

Security considerations

Now, we've been working on a homelab-style Kind cluster, which is fine for finding your feet and trying Flux CD out. We should think about some security considerations, as we don't want Flux CD or, really, our Kube cluster to just deploy anything it's been told to.

So here are a couple of Flux CD security topics to think about:

Policy as Code

Automated commits that bump images as new versions are released are fantastic for reducing toil, but blindly trusting upstream images is a massive security risk.

What if a vulnerable image slips through? Integrating Policy as Code tools like Kyverno or Open Policy Agent (OPA) Gatekeeper allows you to set strict cluster guardrails. You can write admission policies that say: "Flux is allowed to update this deployment, but the container image must be signed by Cosign and must not contain any critical vulnerabilities." The admission policy will intercept the deployment object and either allow or deny it. Admission controllers can be extended using OPA or Kyverno to validate and/or mutate. More info here, but it gets really interesting here, where your cluster becomes more of a self-service platform.

This lets us use our Kubernetes cluster as a platform for developers and operations, while the guardrails we've set block disallowed images or tags, or even enforce standards, such as requiring objects to have labels for the environment, team, etc.
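As a taste of what such a guardrail can look like, here is a sketch of a Kyverno policy that requires env and team labels on Deployments. It assumes Kyverno is installed in the cluster, which this demo doesn't cover; the policy name is hypothetical, and newer Kyverno releases may prefer a rule-level failure action over the spec-level field used here.

```yaml
# Sketch of a Kyverno guardrail (assumes Kyverno is installed)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-env-and-team-labels   # hypothetical name
spec:
  validationFailureAction: Enforce    # reject non-compliant objects at admission
  background: true
  rules:
    - name: check-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "Deployments must carry env and team labels."
        pattern:
          metadata:
            labels:
              env: "?*"    # any non-empty value
              team: "?*"   # any non-empty value
```

Note that the demo deployment in this post only gets an env label from the overlay, so a policy like this would reject it until a team label is added.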

Falco works similarly, with a stronger focus on runtime security within the cluster. It's definitely worth checking out (shameless plug for a previous blog post: Getting started with Falco on GKE).

Isolation & RBAC

In our demo, we ran everything with fairly broad, cluster-wide permissions. In an enterprise shared cluster, you must enforce tenant isolation. Flux handles this beautifully using its multi-tenancy lockdown options and ServiceAccount impersonation. You can restrict Flux so that the Kustomization managing the staging namespace only has permission to touch that specific namespace, preventing a developer from accidentally altering core cluster infrastructure or cross-contaminating production.
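A minimal sketch of that impersonation setup follows, under several assumptions: the staging-reconciler ServiceAccount is a hypothetical name, it lives in flux-system because the Kustomization looks up its service account in its own namespace, and it is bound to the built-in admin ClusterRole inside staging only.

```yaml
# Hypothetical ServiceAccount that Flux impersonates for staging
apiVersion: v1
kind: ServiceAccount
metadata:
  name: staging-reconciler
  namespace: flux-system
---
# Grant it rights in the staging namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: staging-reconciler
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                # built-in role, scoped here by the RoleBinding
subjects:
  - kind: ServiceAccount
    name: staging-reconciler
    namespace: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-staging
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/overlays/staging
  prune: true
  serviceAccountName: staging-reconciler   # apply as this account, not cluster-admin
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: staging
```

With permissions this narrow, the cluster-scoped Namespace object and the image automation resources in the overlay would likely need to be managed by a separate, more privileged Kustomization.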

Summary

If you've made it this far..... Thank you! I really enjoyed working through and writing this blog for my own self-learning.

I hope the walk-through worked for you, and you got some Flux CD deployments and some automated commits working, and most importantly, you feel more comfortable talking about and using Flux CD. That was my focus for this, now quite long blog post. I hope the deep dive also gives you some thinking topics and inspiration to try something else with Flux!

This blog post is by no means production-ready. I wrote these words myself, and for transparency, I used AI to peer review my technical work and writing. I have tested and deployed all the examples myself, and to the best I can re-create, they all worked for me.

If you have any suggestions, comments or find something that doesn't work, I'm open and happy for comments and constructive criticisms for self-development.