# Flux CD Deep Dive

I've covered [FluxCD Vs Argo CD](https://ferrishall.dev/flux-cd-vs-argo-cd), and I've covered a [Deeper Dive into Argo CD](https://ferrishall.dev/deeper-dive-into-argo-cd) in previous blog posts, so it only seems fair to take a deeper look into Flux CD.

## Bootstrapping

To get started, we first need to prep a few things. Follow the [get started](https://fluxcd.io/flux/get-started/) guide to get the Flux CLI installed on your laptop and check your cluster is good to go.

Then we need to install Flux CD on our Kubernetes cluster, which is called [bootstrapping](https://fluxcd.io/flux/installation/bootstrap/github/). I created a new private repo in my personal GitHub and a fine-grained personal access token (GitHub PAT) for it. The Flux docs state read-only, but I found I needed `Administration` -> `Access: Read and write` so Flux could create a deploy key on the repo, which is preferred security-wise. We don't want to store the GitHub PAT in the cluster; the deploy key is an SSH key that Flux uses just for that repo, in keeping with the principle of least privilege.

```shell
flux bootstrap github \
  --owner=ferrish07 \
  --repository=fluxcd-blog-demo \
  --branch=main \
  --path=./clusters/flux-blog-cluster \
  --personal \
  --private=true \
  --components-extra=image-reflector-controller,image-automation-controller \
  --read-write-key=true
```

The `--components-extra` flag installs the image reflector and image automation controllers (and their CRDs), which we'll look at later, and `--read-write-key` gives Flux a deploy key it can use to write commits to our repo. If you don't want that yet, leave them out; you can re-run the bootstrap with these flags later.  
`--components-extra=image-reflector-controller,image-automation-controller --read-write-key=true`

You should then see output in your terminal, ending with confirmation that all your components are healthy:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/8719c29a-e15e-4894-b8fd-add447dd1fa6.png align="center")

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/10004219-1f64-4a4c-a6fa-0ffbc36cd272.png align="center")

A quick `kubectl get gitrepository -n flux-system` should also display the URL of the repo and show that it's in a `READY=True` state.

So what just happened?! The bootstrap pushes the Flux CD component manifests to the repo, installs those components on the cluster, uses the PAT to create the deploy key, and then configures Flux to reconcile the cluster from the repo using that deploy key.

## Now what?! Spin some pods up!

Look in your Git repo, and you'll now see some directories created by Flux as part of the initial reconciliation:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/cb1ad184-8c25-4867-b93c-1fddb51070ef.png align="center")

Flux isn't watching over the whole repo, though; it'll watch out for anything in the `clusters/flux-blog-cluster` directory as an entry point.

Let's test that it's all working. I'll just drop in a `deploy.yaml` that spins up a deployment running 2 pods of nginx.
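For reference, a manifest along these lines would do the job. This is a minimal sketch rather than the exact file I used: the deployment name is borrowed from the controller log further down, while the file location, labels and container name are assumptions.

```yaml
# clusters/flux-blog-cluster/deploy.yaml (location is an assumption)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-flux-nginx    # name as it appears in the kustomize-controller log
  namespace: default       # spell this one carefully!
spec:
  replicas: 2              # two nginx pods
  selector:
    matchLabels:
      app: test-flux-nginx # assumed label
  template:
    metadata:
      labels:
        app: test-flux-nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```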

Now we'll tell Flux to reconcile the repo, for example with `flux reconcile kustomization flux-system --with-source` (I'm impatient and don't want to wait 5-10 mins; you can change the interval).

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/fa6716f5-590f-469e-8458-c104b9bbd107.png align="center")

Flux will grab the latest changes to the repo and, all being well (and assuming you didn't misspell anything, like the `namespace: defailt` I definitely didn't type.....), you'll get 2 new pods from the new deployment. GitOps!

## "namespaces defailt not found" - Learning point and troubleshooting!

You can check the logs if you don't see what you're expecting:

`kubectl logs -n flux-system deployment/kustomize-controller`

```json
{"level":"error","ts":"2026-04-21T19:13:45.143Z","msg":"Reconciliation failed after 804.204606ms, next try in 10m0s","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"8eda1147-fa64-464b-905e-426661417132","revision":"main@sha1:46fb854809f51acad4ca77c5f95322406a06b952","error":"Deployment/defailt/test-flux-nginx not found: namespaces \"defailt\" not found\n"}
```

It'll help you find out why the pods weren't created. In my case, it was a misspelling of the `default` namespace....

## Kustomize

Let's step this up. Currently, sticking our `deploy.yaml` straight into the cluster directory for reconciliation is a bit messy and a bit basic.

Let's get organised and use Kustomize to deploy to multiple environments.

We'll create an `apps` directory containing `base` and `overlays` directories. Inside the overlays directory, we'll have one directory per environment: `production` and `staging`.

A quick intro to what we're doing here: we want our Kubernetes manifests deployed to different environments running on this cluster, and those environments might differ in things like the container version they run, how many replicas they have, and so on.

We don't want to manage loads of different manifests, copying and pasting between repos or directories, because we'll end up with drift and errors.

What Kustomize does is keep our "base" manifests (for this example, `deployment.yaml`) as a template, while the "overlays" are where we layer on the differences between environments. Staging might have 2 replicas and production might have 4, that sort of thing.

The overlays don't contain the full deployment manifest YAML, just the differences that we want to "overlay" on top of the base. Hopefully that makes sense.

Here's the directory structure in our repo:

```shell
.
├── apps
│   ├── base
│   │   ├── deployment.yaml
│   │   └── kustomization.yaml
│   └── overlays
│       ├── production
│       │   ├── kustomization.yaml
│       │   └── namespace.yaml
│       └── staging
│           ├── kustomization.yaml
│           └── namespace.yaml
├── clusters
│   └── flux-blog-cluster
│       └── flux-system
│           ├── gotk-components.yaml
│           ├── gotk-sync.yaml
│           └── kustomization.yaml
└── README.md
```

Let's get some more manifests up so you can follow along.

`apps/base/deployment.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kustomize-ginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kustomize-nginx
  template:
    metadata:
      labels:
        app: kustomize-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
```
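The `apps/base/kustomization.yaml` from the directory tree isn't shown in this post, so here is a minimal sketch of what it needs to contain for the overlays to work. It simply lists the base resources; I'm assuming nothing else lives in the base.

```yaml
# apps/base/kustomization.yaml (minimal sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml   # the only manifest in the base for this demo
```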

And here's what the `apps/overlays/staging/namespace.yaml` file looks like:

```yaml
# apps/overlays/staging/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
```

And the `apps/overlays/staging/kustomization.yaml` (remember to do the same for production, just changing the staging namespace, labels, etc., to production):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: staging
labels:
  - includeSelectors: true
    pairs:
      env: staging
```
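For completeness, the production equivalent might look like the sketch below. It's a straight mirror of the staging file with the names swapped, and it assumes a matching `apps/overlays/production/namespace.yaml` that declares a `production` Namespace.

```yaml
# apps/overlays/production/kustomization.yaml (mirror of staging)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml       # assumed to declare the "production" Namespace
namespace: production
labels:
  - includeSelectors: true # also adds the label to selectors
    pairs:
      env: production
```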

A quick test with `kubectl apply -k apps/overlays/staging` will show us whether what we've done works:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/1038ea70-3b3a-4289-ae1f-e6e873408ca1.png align="center")

The staging namespace was created, along with a pod from the deployment. Let's make sure the prod manifests work too:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/60953b89-1e94-4c7d-baf2-85ab812b4039.png align="center")

Looking good!

But!

We know Flux CD works; we want to incorporate what we've done with Flux and watch the magic happen! This is where things get interesting.

Let's delete the manually created deployments so we can get Flux to do it:

```shell
kubectl delete -k apps/overlays/production/
kubectl delete -k apps/overlays/staging/
```

## Configure Flux for Kustomize

First, let's answer a question:

"I already set up a Git repo when I bootstrapped Flux CD, surely if I just push these changes, Flux will just deploy it all as it did with the original deploy.yaml?"

Not quite... bootstrap created the GitRepository source for the whole repo, but the `flux-system` Kustomization that applies it was told to look only at the `./clusters/flux-blog-cluster` directory, so it won't see the new `./apps` directory and its contents.
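To make that concrete, the `gotk-sync.yaml` that bootstrap generated contains a Flux `Kustomization` roughly like the sketch below. I've written this from what bootstrap typically produces, so treat the exact fields as assumptions and check your own file; the `10m0s` interval matches the "next try in 10m0s" in the log earlier, and it's the value to change if you want faster reconciliation.

```yaml
# Excerpt in the style of clusters/flux-blog-cluster/flux-system/gotk-sync.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s                      # how often Flux re-applies this path
  path: ./clusters/flux-blog-cluster   # the only directory this object applies
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                  # the repo source created by bootstrap
```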

A Flux CD `Kustomization` object needs to be created for each of staging and production. We want the same base code, but we want the environmental differences kept separate; we want Flux to see 2 environments, not a single environment. Hopefully that makes sense; if it doesn't, follow along and you'll soon see the benefits.

Let's create the staging Kustomization configuration file for Flux:

```shell
flux create kustomization apps-staging \
  --target-namespace=staging \
  --source=flux-system \
  --path="./apps/overlays/staging" \
  --prune=true \
  --interval=1m \
  --export > clusters/flux-blog-cluster/apps-staging.yaml
```

This exports a manifest telling Flux which path to build with Kustomize. The `clusters/flux-blog-cluster/apps-staging.yaml` file should look like this:

```shell
cat clusters/flux-blog-cluster/apps-staging.yaml
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-staging
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/overlays/staging
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: staging
```

Do the same for production, then add and commit the new Kustomize YAML files and the new Flux CD Kustomizations to your repo. All being well, you should see it reconcile by running `flux get kustomizations --watch`:
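The production file should come out as a mirror of the staging one. A sketch of the expected `clusters/flux-blog-cluster/apps-production.yaml`, assuming you ran the same `flux create kustomization` command with `apps-production` and the production path (the file name is my assumption; the object name matches the one used in the alert later):

```yaml
# clusters/flux-blog-cluster/apps-production.yaml (assumed file name)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-production
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./apps/overlays/production
  prune: true                # remove cluster objects deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: production
```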

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/f7300418-049d-462e-a684-9dfa5c55bd20.png align="center")

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/0dd792e0-35b6-4a1b-a164-4e5cd5a24539.png align="center")

Great! Now that the initial configuration is done, let's update staging without touching the base YAML much and, importantly, without upsetting production!

Let's change the Nginx image by pinning it to a tag instead of using latest. It will still be `nginx:latest` in `apps/base/deployment.yaml`, but we're configuring our staging environment with overlays, which is exactly what they're for!

First, let's test in staging `apps/overlays/staging/kustomization.yaml`:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: staging
labels:
  - includeSelectors: true
    pairs:
      env: staging
images: # We're adding this images block
  - name: nginx
    newName: nginx
    newTag: 1.30.0 # This will overwrite the nginx:latest
```

This adds a kind of "shortcut" to find and change images. Now you might be thinking, "Why not use a `patch`, like you would for something like replicas?"

A great question!

Changing the image here is very matter of fact: find the image where name = nginx and give it a new tag. It's to the point, straightforward and easier to read, which is what Kustomize's built-in `images` transformer is for.

A patch allows you to be much more complex and granular, almost surgical, if you will. For changing resource limits, the contents of a ConfigMap or replicas, patching is the better tool.

For something very straightforward, almost a global find and replace, use the images transformer.
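As a contrast with the `images` transformer, here is a sketch of what a replicas patch could look like in the staging overlay. The value of 2 is only the example figure from earlier in the post, not something I've applied in this walkthrough, and the target name has to match the base deployment exactly.

```yaml
# apps/overlays/staging/kustomization.yaml (sketch with a JSON 6902 patch added)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: staging
patches:
  - target:
      kind: Deployment
      name: kustomize-ginx   # must match metadata.name in the base
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2             # illustrative value for staging
```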

Let's push the changes and watch for the reconciliation:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/8c2f5482-ab6f-4cc6-9fc3-5710f8f109c7.png align="center")

And describe the deployment to check for the change:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/442752ff-7887-4be4-b738-6821a7a8c44c.png align="center")

The staging `kustomize-ginx` deployment container image has been updated to 1.30.0, and production remains untouched!

## Alerting, image automation and security

So far, we've told Flux CD to reconcile and deploy some YAML, then looked at logs or checked with kubectl and, thankfully, found it deployed (unless you fat-finger it and spell default wrong.....)

But how do we know when Flux CD can't deploy our manifests? What if we got something wrong and it gets through peer review on the PR (it happens...)? We don't want to hop on a terminal and check every time, especially as this takes off and we merge to main or release 50-100+ times a day!

That's what this section covers: some of the options we have for alerting and observability, so we can see what's happening with Flux and troubleshoot should the worst happen.

## Slack alerting

I have a free Slack account that I use for testing integrations and for Grafana alerts when my various homelab "production" services quietly die or stop working (Pi-hole, podfetcher, that sort of thing), so I already have a channel I can use. You'll need a Slack workspace and a channel with an incoming webhook.

First, I'll create a secret on the cluster so Flux knows what the URL is:

```shell
kubectl create secret generic slack-url \
  --from-literal=address=https://hooks.slack.com/services/SLACK_CHANNEL_WEBHOOK \
  -n flux-system
```

Then we [create the provider](https://fluxcd.io/flux/cmd/flux_create_alert-provider/) for Flux CD. We'll use `clusters/flux-blog-cluster/slack-provider.yaml`:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  secretRef:
    name: slack-url
```

Then we'll create the actual alert:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: apps-alert
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: apps-staging
    - kind: Kustomization
      name: apps-production
```

Push the changes, and they should then be applied:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/fbae4c79-6a85-4b7b-8260-e231975b3594.png align="center")

Let's push a broken change to the production overlay to test the alert!

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
namespace: production
labels:
  - includeSelectors: true
    pairs:
      env: production
   images: # This indentation should cause some havoc!
  - name: nginx
    newName: nginx
    newTag: 1.30.0

patches:
  - target:
      kind: Deployment
      name: kustomize-ginx
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```

Uh Oh!

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/e217068f-f68b-423b-ba12-9f28cb4acb0f.png align="center")

But what if we don't want to watch the terminal, or even worse, we've gone for coffee or lunch?! Slack alerting!

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/c8e37267-a672-4c67-8207-bcba31f69d8d.png align="center")

An alert in green saying a configuration has been completed, and then the actual alert in red telling me I did YAML wrong.

Quick, fix it!

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/5754aca5-0d69-4adc-8b22-e248503960bd.png align="center")

Phew!

Thankfully, the Slack alert tells me what went wrong so I can investigate and push a fix. Slack then alerts the rest of the team that everything's sorted.

Now, there are loads of opinions on what you should alert on. Too many alerts create noise, which can be exhausting and hide the things that truly need our attention, so alert as you see best.
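If the green "all good" messages turn out to be too noisy, one option is to raise the severity filter so that only failures reach Slack. A sketch, reusing the provider and sources from above; the alert name is hypothetical:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: apps-errors-only   # hypothetical name
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error     # only forward error events, drop the info ones
  eventSources:
    - kind: Kustomization
      name: apps-staging
    - kind: Kustomization
      name: apps-production
```

The trade-off is that you lose the "everything's sorted" message after a fix, so it depends on what your team wants to see.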

But now we know how to add some alerting to Flux CD!

## Image automation controllers

Breaking and fixing Flux to test the alerting, and having to update the container images by hand, has given me a great segue into the last topic of this Flux CD deep dive.

Flux CD has controllers which can watch container images, and when an image gets an update, Flux CD can automatically update that image in our Kubernetes YAML manifest files, reducing toil!

Bumping image SHAs and tags can become a lot of work when you start working with 10s, even 100s of images.

We do this with two Flux CD objects: an `ImageRepository`, which tells the image reflector controller which image to scan for tags, and an `ImagePolicy`, which holds the semver rules describing which of those tags we want to select.

I'll configure our staging environment to keep the nginx image up to date automatically. First, let's create the image repository and policy in `apps/overlays/staging/image-policy.yaml`:

```yaml
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImageRepository
metadata:
  name: nginx
  namespace: staging
spec:
  image: nginx
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImagePolicy
metadata:
  name: nginx-policy
  namespace: staging
spec:
  imageRepositoryRef:
    name: nginx
  policy:
    semver:
      range: '>=1.20.0 <1.31.0'
```

I then tell Flux CD what environment I want it to automatically "bump".

In `apps/overlays/staging/kustomization.yaml` I'll add a comment marker on the `newTag` line, as that's what I want the image automation to look for, and then add the image-policy and image-automation files to the `resources`:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
  - ./namespace.yaml
  - ./image-policy.yaml
  - ./image-automation.yaml
namespace: staging
labels:
  - includeSelectors: true
    pairs:
      env: staging
images:
  - name: nginx
    newName: nginx
    newTag: 1.20.0 # {"$imagepolicy": "staging:nginx-policy:tag"}
```

The marker points to the Flux CD `ImagePolicy` in the form `<namespace>:<policy name>`, so here the `nginx-policy` policy in the `staging` namespace, and the `:tag` suffix tells Flux to write only the tag.

Finally, let's add the automation in `apps/overlays/staging/image-automation.yaml`. I'm configuring the `ImageUpdateAutomation` in the staging namespace (more on this below) with a reference to the Git repo source it should write the commits to and the branch. You can configure the [commit message](https://fluxcd.io/flux/guides/image-update/#configure-the-commit-message) to be as clever or complex as you like. We'll keep it simple for now:

```yaml
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImageUpdateAutomation
metadata:
  name: staging-automation
  namespace: staging
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxcdbot@users.noreply.github.com
        name: fluxcdbot
      messageTemplate: 'chore: flux-bot is updating container images'
  update:
    path: ./apps/overlays/staging
    strategy: Setters
```

**Note**: `update.path` points at the directory where we're configuring Flux CD to check for the markers.

I have added this file to the staging overlay. On second thought, because it writes back to our Git repo it's really "infrastructure logic", so it might be better placed in the cluster directory `clusters/flux-blog-cluster/` and the `flux-system` namespace, where the `image-automation-controller` (the worker) runs.

If the automation object lives in `staging`, it has to reference the `GitRepository` across namespaces, because that source lives in `flux-system`. The `image-automation-controller` has cluster-wide permissions by default, so this works out of the box, but it stops working if the controllers are started with `--no-cross-namespace-refs=true`, as in a multi-tenancy lockdown. Keeping the automation object next to the `GitRepository` in `flux-system` avoids that, although as far as I can tell the automation only considers `ImagePolicy` objects in its own namespace, so the policies (and the namespace in the marker) would need to move with it. Just a gotcha to look out for if you run into issues.

Let's commit and push, then run `flux reconcile ks flux-system` if you're impatient like me.

After a short wait, we should see the image reflector scanning and talking to Docker Hub:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/1b6de0a8-68ca-4e3e-86bb-58cc13bf313a.png align="center")

Run `flux reconcile image update staging-automation -n staging` to force the change, then `flux get image update -n staging` to check on it, and we should see.....

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/d61feefb-a4a1-4aac-a64b-eda7e7778a5e.png align="center")

The bumped version of the staging nginx deployment!

We can check that the deployment has been updated with the newly committed image tag:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/95b6ed30-d845-4061-82db-944e51c1dc66.png align="center")

Let's try it again and watch more closely. I'll update staging to something earlier like 1.21.0, commit and push:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/ff548640-a3cc-4be2-b320-4a67bc574846.png align="center")

Following the logs, we can see the imageupdateautomation kicking in:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/7fe6f707-1524-42a2-b654-91260aec977a.png align="center")

We can see the events from when we re-deployed nginx 1.21.0 and when the `ImageUpdateAutomation` brought it back to the preferred 1.30.0 version:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/069febe1-03fb-4e62-a7d7-63156ce0e959.png align="center")

And the commit history:

![](https://cdn.hashnode.com/uploads/covers/62b4bdddb86f939ac81b1228/e212589f-8765-49ce-8f8d-1d1abf9cb39d.png align="center")

PRs would be nicer, and that's something I'll be looking at next, as you can configure Flux to commit to a branch and create a [PR using GitHub Actions](https://fluxcd.io/flux/use-cases/gh-actions-auto-pr/), but that's for another blog post. This one's getting long enough as it is! For our staging demo environment, this will do for now.

## Security considerations

Now, we've been working on a homelab-style Kind cluster, which is fine for finding your feet and trying Flux CD out. We should think about some security considerations, as we don't want Flux CD or, really, our Kube cluster to just deploy anything it's been told to.

So here are a couple of Flux CD security topics to think about:

## Policy as Code

Automated commits that bump images as they are released are fantastic for reducing toil, but blindly trusting upstream images is a massive security risk.

What if a vulnerable image slips through? Integrating **Policy as Code** tools like [**Kyverno**](https://kyverno.io/) or [**Open Policy Agent (OPA) Gatekeeper**](https://open-policy-agent.github.io/gatekeeper/website/) allows you to set strict cluster guardrails. You can write admission policies that say: "Flux is allowed to update this deployment, but the container image must be signed by Cosign and must not contain any critical vulnerabilities." The admission policy will intercept the request and either allow or deny the object. Admission control can be extended using OPA Gatekeeper or Kyverno to validate and/or mutate resources. More info [here](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#admission-control-extension-points). This is where it gets really interesting, as your cluster starts to become more of a self-service platform.

This allows us to offer our Kubernetes cluster as a platform for developers and operations, with guardrails that block disallowed images or tags, or even enforce standards, such as requiring objects to have labels for the environment, team, etc.
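As a small illustration of the label guardrail, a Kyverno policy could look something like the sketch below. I haven't deployed this as part of the walkthrough, so treat it as an assumption-laden starting point: the policy and rule names are hypothetical, and the rule-level `failureAction` field is from recent Kyverno releases (older ones use `spec.validationFailureAction` instead).

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-env-label        # hypothetical policy name
spec:
  background: true               # also report on existing resources
  rules:
    - name: check-env-label      # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Deployment
              namespaces:
                - staging
                - production
      validate:
        failureAction: Enforce   # reject, rather than just audit
        message: "Deployments must carry an env label."
        pattern:
          metadata:
            labels:
              env: "?*"          # any non-empty value
```

Our overlays already add `env: staging` and `env: production` through the `labels` block, so the deployments from this post should pass a rule like this.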

Falco complements this with a stronger focus on runtime security within the cluster. It's definitely worth checking out (shameless previous blog post: [Getting started with Falco on GKE](https://ferrishall.dev/getting-started-with-falco-security-tool-on-gke)).

## Isolation & RBAC

In our demo, we ran everything with fairly broad, cluster-wide permissions. In an enterprise shared cluster, you must enforce [**Tenant Isolation**](https://fluxcd.io/flux/installation/configuration/multitenancy/). Flux handles this with **ServiceAccount Impersonation** and by restricting cross-namespace references. You can restrict Flux so that the Kustomization managing the `staging` namespace *only* has permission to touch that specific namespace, preventing a developer from accidentally altering core cluster infrastructure or cross-contaminating production.
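A rough sketch of what impersonation could look like for our staging Kustomization follows. The `staging-reconciler` ServiceAccount is a hypothetical name, and I haven't applied this in the demo. One catch to be aware of: a namespace-scoped account can't create the `staging` Namespace itself, so the `namespace.yaml` would need to move out of the overlay and be managed elsewhere.

```yaml
# Hypothetical ServiceAccount that Flux impersonates for staging
apiVersion: v1
kind: ServiceAccount
metadata:
  name: staging-reconciler
  namespace: flux-system       # same namespace as the Kustomization below
---
# Grants that account rights inside the staging namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: staging-reconciler
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                  # built-in role, scoped here by the RoleBinding
subjects:
  - kind: ServiceAccount
    name: staging-reconciler
    namespace: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-staging
  namespace: flux-system
spec:
  serviceAccountName: staging-reconciler  # apply as this account, not cluster-admin
  interval: 1m0s
  path: ./apps/overlays/staging
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: staging
```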

## Summary

If you've made it this far..... Thank you! I really enjoyed working through and writing this blog for my own self-learning.

I hope the walk-through worked for you, that you got some Flux CD deployments and automated commits working and, most importantly, that you feel more comfortable talking about and using Flux CD. That was my focus for this now quite long blog post. I hope the deep dive also gives you some topics to think about and inspiration to try something else with Flux!

This blog post is by no means production-ready. I wrote these words myself, and for transparency, I used AI to peer review my technical work and writing. I have tested and deployed all the examples myself, and to the best I can re-create, they all worked for me.

If you have any suggestions or comments, or find something that doesn't work, I'm open to and happy to receive constructive criticism for my own development.
