<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[ferrishall.dev]]></title><description><![CDATA[Learnings, findings and how-tos from a Platform/DevOps/Infra engineer. Previous consultant and Cloud technical trainer.
I enjoy learning new things and writing blog posts about them because nothing is more motivating than not looking stupid on the internet...]]></description><link>https://ferrishall.dev</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 21:02:16 GMT</lastBuildDate><atom:link href="https://ferrishall.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Cilium ClusterMesh Deep Dive: Connecting Kubernetes Clusters with eBPF]]></title><description><![CDATA[I’ve been slow with the blog posts, I’ve been very busy getting to grips with new tools, systems, ways of working, and just trying to learn as much as I can since starting my new role at a new company 6 months ago and not making my head explode… All ...]]></description><link>https://ferrishall.dev/cilium-clustermesh-kubernetes-ebpf-guide</link><guid isPermaLink="true">https://ferrishall.dev/cilium-clustermesh-kubernetes-ebpf-guide</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[cilium]]></category><category><![CDATA[eBPF]]></category><category><![CDATA[Devops]]></category><category><![CDATA[cloudnative]]></category><category><![CDATA[networking]]></category><category><![CDATA[containers]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Wed, 04 Feb 2026 16:29:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770213520688/cb7d2f87-ec90-4263-adee-4b3f3f1229e1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I’ve been slow with the blog posts, I’ve been very busy getting to grips with new tools, systems, ways of working, and just trying to learn as much as I can since starting my new role at a new company 6 months ago and not making my head explode… All good fun!</p>
<p>Anyway, Cilium is something I looked into when I was studying for the <strong>CKS: Certified Kubernetes Security</strong> Certification and we’re using it at my new place, so it seems like a good time to get better hands on with it.</p>
<h2 id="heading-not-an-intro-introduction">Not an intro….. Introduction</h2>
<p>Now this isn’t meant to be an intro to Cilium, so I’ll just say it’s an open source project that provides networking, security and observability for container orchestration platforms like Kubernetes.</p>
<p>But yeah, it’s essentially a much more feature-rich take on cluster networking than the standard CNI offerings. It also improves security and is built on a Linux kernel technology called eBPF, which enables the dynamic insertion of powerful security, visibility, and networking control logic into the Linux kernel.</p>
<p>eBPF is used to provide high-performance networking, multi-cluster and multi-cloud capabilities, advanced load balancing, transparent encryption, extensive network security capabilities, transparent observability, and much more.</p>
<p>In other words, traditional CNIs rely on the Linux kernel’s iptables or IPVS (IP Virtual Server) to shuffle packets around based on IP addresses; Cilium uses eBPF to bypass these bottlenecks entirely.</p>
<p>Definitely not an intro….. And this blog post is definitely longer than I intended it to be, you know how it is when you get in the zone!</p>
<h2 id="heading-so-what-am-i-actually-doing-with-cilium">So what am I actually doing with Cilium?</h2>
<p>So I’m using Cilium to network 2 separate Kubernetes clusters so I can load balance and fail over requests for workloads between them, achieving “Multi Cloud” Kubernetes. I want my workloads spread across 2 separate clusters, potentially in 2 different cloud providers, with requests being served locally or from the other cluster.</p>
<p>Anyway, back to networking 2 clusters together…. I’m using <a target="_blank" href="https://kind.sigs.k8s.io/">Kind</a> to spin up 2 clusters on my homelab Docker host (my Mac Pro ProxMox is maxed out and my larger Dell PowerEdge ProxMox server, which has my prod and test 4 node Kubeadm clusters on, is having hardware issues…).</p>
<p>You don’t have to configure 2 workers if you don’t have the resources. (I had to trim mine down to 1 worker each towards the end…)</p>
<p>For Cilium ClusterMesh to work, your <strong>podSubnet must not overlap</strong> across clusters, so make sure to configure them to be unique CIDRs.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># My "GCP" Kind cluster gcp-cluster.yaml</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Cluster</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kind.x-k8s.io/v1alpha4</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">gcp-cluster</span>
<span class="hljs-attr">networking:</span>
  <span class="hljs-attr">disableDefaultCNI:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">podSubnet:</span> <span class="hljs-string">"10.10.0.0/16"</span> <span class="hljs-comment"># Unique Pod Range</span>
  <span class="hljs-attr">serviceSubnet:</span> <span class="hljs-string">"10.11.0.0/16"</span>
<span class="hljs-attr">nodes:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">role:</span> <span class="hljs-string">control-plane</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">role:</span> <span class="hljs-string">worker</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">role:</span> <span class="hljs-string">worker</span>

<span class="hljs-comment"># My "AWS" Kind cluster aws-cluster.yaml</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Cluster</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kind.x-k8s.io/v1alpha4</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">aws-cluster</span>
<span class="hljs-attr">networking:</span>
  <span class="hljs-attr">disableDefaultCNI:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">podSubnet:</span> <span class="hljs-string">"10.20.0.0/16"</span> <span class="hljs-comment"># Unique Pod Range</span>
  <span class="hljs-attr">serviceSubnet:</span> <span class="hljs-string">"10.21.0.0/16"</span>
<span class="hljs-attr">nodes:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">role:</span> <span class="hljs-string">control-plane</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">role:</span> <span class="hljs-string">worker</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">role:</span> <span class="hljs-string">worker</span>
<span class="hljs-attr">containerdConfigPatches:</span>
<span class="hljs-bullet">-</span> <span class="hljs-string">|
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true</span>
</code></pre>
<p>A workaround… I had some issues running all of these on one Docker host. When it came to adding the second cluster with 3 nodes, I was getting errors like this:</p>
<p><code>I0125 16:24:56.130526 233 round_trippers.go:560] GET https://aws-cluster-control-plane:6443/api/v1/nodes/aws-cluster-worker?timeout=10s 404 Not Found in 3 milliseconds</code></p>
<p>So after some digging around, I had to add the containerd runtime config in the AWS cluster’s <code>containerdConfigPatches</code> YAML to use the systemd cgroup driver. It appears the nodes weren’t joining because nesting that many containers and their resources on a single Docker host proved too much. Adding <code>SystemdCgroup = true</code> prevents the "double management" of resources; without it, both systemd and the container runtime try to manage the same processes.</p>
<p>On my Ubuntu Docker host server, I had to increase <a target="_blank" href="https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files">inotify limits.</a> I’m guessing this is because we have a bunch of the same or similar applications or processes watching the same directories on the Docker host.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Increase inotify limits (KIND's recommendation for multi-cluster)</span>
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
</code></pre>
<p>Kind often flakes out when running 4+ nodes on a single Linux host, so with these settings, we should be good to go!</p>
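<p>If you want those limits to survive a reboot, you can persist them too. A small sketch; the file name under <code>/etc/sysctl.d/</code> is just my choice:</p>
<pre><code class="lang-bash"># Persist the inotify limits across reboots
echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-kind-inotify.conf
echo 'fs.inotify.max_user_instances=512' | sudo tee -a /etc/sysctl.d/99-kind-inotify.conf
sudo sysctl --system
</code></pre>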
<h2 id="heading-creating-and-preparing-the-clusters">Creating and preparing the clusters</h2>
<p>Spin the clusters up <code>kind create cluster --config gcp-cluster.yaml</code> and <code>kind create cluster --config aws-cluster.yaml</code></p>
<p>And you should hopefully get…..</p>
<pre><code class="lang-bash">$ kubectl cluster-info --context kind-gcp-cluster
Kubernetes control plane is running at https://127.0.0.1:39765
CoreDNS is running at https://127.0.0.1:39765/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use <span class="hljs-string">'kubectl cluster-info dump'</span>.

$ kubectl cluster-info --context kind-aws-cluster
Kubernetes control plane is running at https://127.0.0.1:41725
CoreDNS is running at https://127.0.0.1:41725/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use <span class="hljs-string">'kubectl cluster-info dump'</span>.
</code></pre>
<pre><code class="lang-bash">$ kubectl get nodes
NAME                        STATUS     ROLES           AGE     VERSION
aws-cluster-control-plane   NotReady   control-plane   2m45s   v1.32.2
aws-cluster-worker          NotReady   &lt;none&gt;          2m30s   v1.32.2
aws-cluster-worker2         NotReady   &lt;none&gt;          2m30s   v1.32.2
ferris@micro-ubuntu:~/Documents/kind_cilium_clusters$ kubectl get nodes --context kind-gcp-cluster
NAME                        STATUS     ROLES           AGE   VERSION
gcp-cluster-control-plane   NotReady   control-plane   28m   v1.32.2
gcp-cluster-worker          NotReady   &lt;none&gt;          28m   v1.32.2
gcp-cluster-worker2         NotReady   &lt;none&gt;          28m   v1.32.2
</code></pre>
<p>Now, if you've created a cluster before, you’ll have seen this… Don’t panic!</p>
<p>We configured the clusters without any CNI because we’re going to be using Cilium!</p>
<h2 id="heading-installing-cilium">Installing Cilium</h2>
<p>Using <a target="_blank" href="https://docs.cilium.io/en/stable/installation/k8s-install-helm/">Helm</a>, I’ll install Cilium:</p>
<pre><code class="lang-bash">helm repo add cilium https://helm.cilium.io/
helm repo update

<span class="hljs-comment"># Install on GCP first</span>
helm install cilium cilium/cilium --version 1.18.6 \
  --namespace kube-system \
  --<span class="hljs-built_in">set</span> cluster.name=gcp-cluster \
  --<span class="hljs-built_in">set</span> cluster.id=1 \
  --<span class="hljs-built_in">set</span> ipam.mode=kubernetes \
  --<span class="hljs-built_in">set</span> operator.replicas=1 \
  --<span class="hljs-built_in">set</span> kubeProxyReplacement=<span class="hljs-literal">true</span> \
  --<span class="hljs-built_in">set</span> k8sServiceHost=$(kubectl get nodes --context kind-gcp-cluster -o jsonpath=<span class="hljs-string">'{.items[0].status.addresses[?(@.type=="InternalIP")].address}'</span>) \
  --<span class="hljs-built_in">set</span> k8sServicePort=6443 \
  --kube-context kind-gcp-cluster
</code></pre>
<p>When that’s done, we need to <a target="_blank" href="https://docs.cilium.io/en/stable/network/clustermesh/clustermesh/#shared-certificate-authority">configure some certificate</a> secrets for the second cluster. Cilium ClusterMesh won’t trust nodes in another cluster unless both clusters share the same certificate authority.</p>
<pre><code class="lang-bash">kubectl get secret -n kube-system cilium-ca -o yaml --context kind-gcp-cluster &gt; cilium-ca.yaml

<span class="hljs-comment"># Delete the context-specific metadata so we can apply it to AWS</span>
sed -i <span class="hljs-string">'/resourceVersion/d;/uid/d;/creationTimestamp/d;/namespace/d'</span> cilium-ca.yaml

<span class="hljs-comment"># Apply it to AWS</span>
kubectl apply -f cilium-ca.yaml -n kube-system --context kind-aws-cluster
</code></pre>
<p>I ran into a Helm issue when installing for the second AWS cluster:</p>
<p><code>Error: INSTALLATION FAILED: Unable to continue with install: Secret "cilium-ca" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "kube-system"</code></p>
<p>Turns out it was a Helm "ownership" conflict. Because I manually applied the <code>cilium-ca</code> Secret using <code>kubectl</code>, Helm refused to take control of it: it was missing the labels and annotations that say, "This belongs to the Cilium Helm chart."</p>
<p>Since we want the Secret to be there (it's the shared key that makes the mesh work), we need to "adopt" it into Helm's management.</p>
<pre><code class="lang-bash">kubectl annotate secret cilium-ca -n kube-system \
  meta.helm.sh/release-name=cilium \
  meta.helm.sh/release-namespace=kube-system \
  --context kind-aws-cluster

secret/cilium-ca annotated

kubectl label secret cilium-ca -n kube-system \
  app.kubernetes.io/managed-by=Helm \
  --context kind-aws-cluster

secret/cilium-ca not labeled
</code></pre>
<p>So let’s try again:</p>
<pre><code class="lang-bash">helm install cilium cilium/cilium --version 1.18.6 \
  --namespace kube-system \
  --<span class="hljs-built_in">set</span> cluster.name=aws-cluster \
  --<span class="hljs-built_in">set</span> cluster.id=2 \
  --<span class="hljs-built_in">set</span> ipam.mode=kubernetes \
  --<span class="hljs-built_in">set</span> operator.replicas=1 \
  --<span class="hljs-built_in">set</span> kubeProxyReplacement=<span class="hljs-literal">true</span> \
  --<span class="hljs-built_in">set</span> k8sServiceHost=$(kubectl get nodes --context kind-aws-cluster -o jsonpath=<span class="hljs-string">'{.items[0].status.addresses[?(@.type=="InternalIP")].address}'</span>) \
  --<span class="hljs-built_in">set</span> k8sServicePort=6443 \
  --kube-context kind-aws-cluster
</code></pre>
<p>These Helm values are essentially saying: run a single operator replica, because this is Kind and it’s a small cluster, and replace kube-proxy with Cilium by setting <code>kubeProxyReplacement=true</code>.</p>
<p>In a multi-cluster world, Cilium identifies every Pod using a combination of its Namespace, Labels, and a <strong>Cluster ID</strong>. If both clusters have the same ID, Cilium's eBPF maps will collide, and you'll get "IP already exists" or routing loop issues.</p>
<p>Traditional firewalls use IP addresses. Cilium uses <a target="_blank" href="https://docs.cilium.io/en/stable/internals/security-identities/"><strong>Security Identities</strong></a>. When a packet leaves Cluster A, Cilium attaches a numerical identity to it. Cluster B checks its local BPF map for that ID, not the IP, to decide if the traffic is allowed. This is why the unique <code>cluster.id</code> is so important!</p>
<p>Make sure you <a target="_blank" href="https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#install-the-cilium-cli">install the Cilium CLI</a> (probably should have done this first :shrug:).</p>
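<p>For reference, this is roughly what the Linux install from the Cilium docs looked like at the time of writing (check the docs for the current version and your architecture), plus a quick sanity check that the agents are healthy before meshing anything:</p>
<pre><code class="lang-bash"># Roughly the Linux install from the Cilium docs - check the docs for the current version/arch
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64   # or arm64
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

# Sanity check that Cilium is up in each cluster
cilium status --wait --context kind-gcp-cluster
cilium status --wait --context kind-aws-cluster
</code></pre>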
<p>Enable cluster mesh on both clusters.</p>
<pre><code class="lang-bash">cilium clustermesh <span class="hljs-built_in">enable</span> --context kind-gcp-cluster --service-type NodePort
cilium clustermesh <span class="hljs-built_in">enable</span> --context kind-aws-cluster --service-type NodePort
</code></pre>
<p>Then connect them.</p>
<pre><code class="lang-bash">cilium clustermesh connect --context kind-gcp-cluster --destination-context kind-aws-cluster
</code></pre>
<p>And verify….</p>
<pre><code class="lang-bash">cilium clustermesh status --context kind-gcp-cluster
⚠️  Service <span class="hljs-built_in">type</span> NodePort detected! Service may fail when nodes are removed from the cluster!
✅ Service <span class="hljs-string">"clustermesh-apiserver"</span> of <span class="hljs-built_in">type</span> <span class="hljs-string">"NodePort"</span> found
✅ Cluster access information is available:
  - 172.18.0.2:32379
✅ Deployment clustermesh-apiserver is ready
ℹ️  KVStoreMesh is disabled

✅ All 3 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]

🔌 Cluster Connections:
  - aws-cluster: 3/3 configured, 3/3 connected
</code></pre>
<p>Looking good!</p>
<h2 id="heading-testing-it-out">Testing it out</h2>
<p>Deploy a workload to cluster 1.</p>
<p>Something like echoserver, so we can test later. Here’s a snippet of what I used; change it however you want.</p>
<pre><code class="lang-bash">kubectl --context kind-gcp-cluster create deployment backend --image=k8s.gcr.io/echoserver:1.10
kubectl --context kind-gcp-cluster scale deployment backend --replicas=2
</code></pre>
<p>And a workload to cluster 2.</p>
<pre><code class="lang-bash">kubectl --context kind-aws-cluster create deployment backend --image=k8s.gcr.io/echoserver:1.10
kubectl --context kind-aws-cluster scale deployment backend --replicas=2
</code></pre>
<p>Then we need the global service.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">backend</span>
  <span class="hljs-attr">annotations:</span>
    <span class="hljs-attr">service.cilium.io/global:</span> <span class="hljs-string">"true"</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
    <span class="hljs-attr">targetPort:</span> <span class="hljs-number">8080</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">backend</span>
</code></pre>
<p><strong>Global Services</strong> must exist in both clusters with the exact same name and namespace for the mesh to "link" them, so the manifest needs applying to both clusters.</p>
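<p>Assuming the manifest above is saved as <code>backend-global-svc.yaml</code> (the filename is just my choice), applying it to both contexts looks something like this:</p>
<pre><code class="lang-bash"># Apply the same global Service to both clusters so ClusterMesh can link them
kubectl --context kind-gcp-cluster apply -f backend-global-svc.yaml
kubectl --context kind-aws-cluster apply -f backend-global-svc.yaml
</code></pre>
<p>With the global service in place on both sides, we can test the functionality:</p>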
<p>Deploy a tester pod to cluster 1.</p>
<pre><code class="lang-bash">kubectl --context kind-gcp-cluster run tester --image=alpine --restart=Never -- /bin/sh -c <span class="hljs-string">"apk add curl &amp;&amp; sleep 3600"</span>
</code></pre>
<p>Then we curl the service from the tester pod and see what and where the responses are:</p>
<pre><code class="lang-bash">kubectl --context kind-gcp-cluster <span class="hljs-built_in">exec</span> tester -- sh -c <span class="hljs-string">"for i in \$(seq 1 10); do curl -s backend | grep 'Hostname'; done"</span>
Hostname: backend-5974d998f8-47ddk
Hostname: backend-5974d998f8-bd5wg
Hostname: backend-5974d998f8-7brr9
Hostname: backend-5974d998f8-fcrcw
Hostname: backend-5974d998f8-47ddk
Hostname: backend-5974d998f8-fcrcw
Hostname: backend-5974d998f8-7brr9
Hostname: backend-5974d998f8-7brr9
Hostname: backend-5974d998f8-7brr9
Hostname: backend-5974d998f8-fcrcw

kubectl --context kind-gcp-cluster get pods
NAME                       READY   STATUS    RESTARTS   AGE
backend-5974d998f8-47ddk   1/1     Running   0          6m27s
backend-5974d998f8-7brr9   1/1     Running   0          6m27s
tester                     1/1     Running   0          43s
kubectl --context kind-aws-cluster get pods
NAME                       READY   STATUS    RESTARTS   AGE
backend-5974d998f8-bd5wg   1/1     Running   0          6m30s
backend-5974d998f8-fcrcw   1/1     Running   0          6m29s
</code></pre>
<p>We see results from backend pods from both clusters.</p>
<p>To make this clearer, I recreated my deployments with the cluster name included in the deployment name, so the pod hostnames show which cluster each response came from.</p>
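<p>For reference, this is roughly what the renamed deployment on the GCP cluster looked like (a sketch; the key point is that the pod labels stay <code>app: backend</code> so the global Service still selects them):</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: gcp-backend            # cluster-prefixed name, so the pod hostnames are obvious
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend             # unchanged label, so the global Service still matches
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: echoserver
        image: k8s.gcr.io/echoserver:1.10
</code></pre>
<p>The aws-cluster deployment is the same, just named <code>aws-backend</code>. Re-running the curl loop now makes it obvious which cluster answered each request:</p>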
<pre><code class="lang-bash">kubectl --context kind-gcp-cluster <span class="hljs-built_in">exec</span> tester -- sh -c <span class="hljs-string">"for i in \$(seq 1 20); do curl -s backend | grep 'Hostname'; done"</span>
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: gcp-backend-57d9f789dd-hmqs6
Hostname: gcp-backend-57d9f789dd-hmqs6
Hostname: gcp-backend-57d9f789dd-hmqs6
Hostname: gcp-backend-57d9f789dd-dbbsp
Hostname: gcp-backend-57d9f789dd-dbbsp
Hostname: gcp-backend-57d9f789dd-dbbsp
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: gcp-backend-57d9f789dd-dbbsp
Hostname: gcp-backend-57d9f789dd-hmqs6
Hostname: gcp-backend-57d9f789dd-hmqs6
Hostname: gcp-backend-57d9f789dd-dbbsp
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: gcp-backend-57d9f789dd-dbbsp
Hostname: aws-backend-5c69d5f85-vxwx9

kubectl --context kind-gcp-cluster get pods
NAME                           READY   STATUS    RESTARTS   AGE
gcp-backend-57d9f789dd-dbbsp   1/1     Running   0          4m47s
gcp-backend-57d9f789dd-hmqs6   1/1     Running   0          4m47s
tester                         1/1     Running   0          2m59s
kubectl --context kind-aws-cluster get pods
NAME                          READY   STATUS    RESTARTS   AGE
aws-backend-5c69d5f85-7f68c   1/1     Running   0          5m9s
aws-backend-5c69d5f85-vxwx9   1/1     Running   0          5m9s
</code></pre>
<p>We created the service with the annotation <code>service.cilium.io/global: "true"</code>. The annotations can also be used to configure other behaviours. Here are some examples and use cases I found interesting:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Annotation</strong></td><td><strong>Use Case</strong></td><td><strong>Result</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>service.cilium.io/global: "true"</code></td><td>Basic Mesh</td><td>Balanced traffic across all clusters.</td></tr>
<tr>
<td><code>service.cilium.io/affinity: "local"</code></td><td>Cost/Latency</td><td>Stay in the local cluster. Failover to remote only if local is down.</td></tr>
<tr>
<td><code>service.cilium.io/affinity: "remote"</code></td><td>Maintenance</td><td>Send all traffic to the other cluster (great for blue/green cluster upgrades).</td></tr>
<tr>
<td><code>service.cilium.io/shared: "false"</code></td><td>Isolation</td><td>This cluster's pods are "hidden" from the rest of the mesh. Not advertised to the other cluster at all.</td></tr>
</tbody>
</table>
</div><p>I’ll add the local affinity annotation to the service I applied to both clusters earlier, and reapply it to both clusters to reconfigure them:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">backend</span>
  <span class="hljs-attr">annotations:</span>
    <span class="hljs-attr">service.cilium.io/affinity:</span> <span class="hljs-string">"local"</span> <span class="hljs-comment">## Failover if the local pods are down/unavailable</span>
    <span class="hljs-attr">service.cilium.io/global:</span> <span class="hljs-string">"true"</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
    <span class="hljs-attr">targetPort:</span> <span class="hljs-number">8080</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">backend</span>
</code></pre>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> tester --context kind-gcp-cluster -- sh -c <span class="hljs-string">"for i in \$(seq 1 5); do curl -s backend | grep 'Hostname'; done"</span>

Hostname: gcp-backend-57d9f789dd-8nwss
Hostname: gcp-backend-57d9f789dd-s7kp5
Hostname: gcp-backend-57d9f789dd-8nwss
Hostname: gcp-backend-57d9f789dd-s7kp5
Hostname: gcp-backend-57d9f789dd-8nwss
</code></pre>
<p>The responses now stay local to the cluster, as preferred, thanks to the affinity annotation on the service.</p>
<p>Now I’ll simulate an outage on the gcp-cluster and run the test again from the tester pod (which lives on the gcp-cluster):</p>
<pre><code class="lang-bash">kubectl --context kind-gcp-cluster scale deployment gcp-backend --replicas 0
deployment.apps/gcp-backend scaled

kubectl --context kind-gcp-cluster get pods
NAME     READY   STATUS    RESTARTS   AGE
tester   1/1     Running   0          56s

kubectl <span class="hljs-built_in">exec</span> tester --context kind-gcp-cluster -- sh -c <span class="hljs-string">"for i in \$(seq 1 5); do curl -s backend | grep 'Hostname'; done"</span>
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-vxwx9

<span class="hljs-comment"># Once again for good measure!</span>
kubectl <span class="hljs-built_in">exec</span> tester --context kind-gcp-cluster -- sh -c <span class="hljs-string">"for i in \$(seq 1 5); do curl -s backend | grep 'Hostname'; done"</span>
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: aws-backend-5c69d5f85-7f68c
Hostname: aws-backend-5c69d5f85-vxwx9
Hostname: aws-backend-5c69d5f85-7f68c
</code></pre>
<p>Then bring the gcp-backend pods back up, and they’re back in action.</p>
<pre><code class="lang-bash">kubectl --context kind-gcp-cluster scale deployment gcp-backend --replicas 2
deployment.apps/gcp-backend scaled

kubectl --context kind-gcp-cluster get pods
NAME                           READY   STATUS    RESTARTS   AGE
gcp-backend-57d9f789dd-49x4p   1/1     Running   0          4s
gcp-backend-57d9f789dd-9dhvg   1/1     Running   0          4s
tester                         1/1     Running   0          2m46s

kubectl <span class="hljs-built_in">exec</span> tester --context kind-gcp-cluster -- sh -c <span class="hljs-string">"for i in \$(seq 1 5); do curl -s backend | grep 'Hostname'; done"</span>
Hostname: gcp-backend-57d9f789dd-49x4p
Hostname: gcp-backend-57d9f789dd-49x4p
Hostname: gcp-backend-57d9f789dd-9dhvg
Hostname: gcp-backend-57d9f789dd-49x4p
Hostname: gcp-backend-57d9f789dd-49x4p
</code></pre>
<p>We’ve just configured and tested Kubernetes cluster failover!</p>
<h2 id="heading-observability-with-hubble">Observability with Hubble</h2>
<p>To see what’s going on with Cilium, we can use Hubble.</p>
<p>You can <a target="_blank" href="https://docs.cilium.io/en/stable/observability/hubble/setup/">enable Hubble</a> with the Cilium CLI or with Helm.</p>
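<p>For reference, enabling it with the CLI looks roughly like this (a sketch; flags may differ slightly between versions):</p>
<pre><code class="lang-bash"># Enable Hubble (with the UI) in both clusters
cilium hubble enable --ui --context kind-gcp-cluster
cilium hubble enable --ui --context kind-aws-cluster

# Port-forward and open the Hubble UI for the GCP cluster
cilium hubble ui --context kind-gcp-cluster
</code></pre>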
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770155806216/485d1246-68e5-42f9-b3cf-e86111e17804.png" alt class="image--center mx-auto" /></p>
<p>Here I’m running an aws-tester pod and curling the backend service; it hits its own aws-backend pods, seeing as the service is set to prefer local and I have scaled down the gcp-backend pods. Hubble gives us much more information on the traffic and requests:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770155777553/95146ec7-b5d0-4de3-b1d2-a3daa2556613.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-summary">Summary</h2>
<p>So there you have it! Cilium is much more than an updated network policy controller or Kubernetes firewall configuration.</p>
<p>We installed Cilium on 2 Kind clusters and enabled ClusterMesh!</p>
<h2 id="heading-bonus-round">Bonus Round!</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770155316990/adb6c38b-c671-4bbc-bad1-4207cade073e.png" alt class="image--center mx-auto" /></p>
<p><em>Enjoy this fine piece of AI generated bonus stage image. I was losing the will trying to prompt it to be more like the Street Fighter II bonus stage…</em></p>
<p>As a “Thank You” for reading this far (if indeed you still are….), here’s some b-b-bonus material!</p>
<p>We’re going to see the real advantage of Cilium network policy in action with ClusterMesh!</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">"cilium.io/v2"</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"simple-path-blocker"</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">run:</span> <span class="hljs-string">tester</span>
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">kube-system</span>
            <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">ANY</span>
          <span class="hljs-attr">rules:</span>
            <span class="hljs-attr">dns:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">matchPattern:</span> <span class="hljs-string">"*"</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">app:</span> <span class="hljs-string">backend</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"80"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"8080"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">rules:</span>
            <span class="hljs-attr">http:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">method:</span> <span class="hljs-string">"GET"</span>
                <span class="hljs-attr">path:</span> <span class="hljs-string">"/public.*"</span>
</code></pre>
<p>Curling the backend service from the tester pod (this will work with the tester pod running in either cluster, as both backend deployments are labelled <code>app: backend</code>):</p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> tester --context kind-gcp-cluster --   curl -v http://backend/admin

  % Total    % Received % Xferd  Average Speed  Time    Time    Time   Current
                                 Dload  Upload  Total   Spent   Left   Speed
  0      0   0      0   0      0      0      0                              0* Host backend:80 was resolved.
* IPv6: (none)
* IPv4: 10.11.57.143
*   Trying 10.11.57.143:80...
* Established connection to backend (10.11.57.143 port 80) from 10.10.1.123 port 50356 
* using HTTP/1.x
Access denied
&gt; GET /admin HTTP/1.1
&gt; Host: backend
&gt; User-Agent: curl/8.18.0
&gt; Accept: */*
&gt; 
* Request completely sent off
&lt; HTTP/1.1 403 Forbidden
&lt; content-length: 15
&lt; content-type: text/plain
&lt; date: Mon, 02 Feb 2026 21:33:07 GMT
&lt; server: envoy
&lt; 
{ [15 bytes data]
100     15 100     15   0      0   1479      0                              0
* Connection <span class="hljs-comment">#0 to host backend:80 left intact</span>

kubectl <span class="hljs-built_in">exec</span> tester --context kind-gcp-cluster --   curl -v http://backend/public

  % Total    % Received % Xferd  Average Speed  Time    Time    Time   Current
                                 Dload  Upload  Total   Spent   Left   Speed
  0      0   0      0   0      0      0      0                              0* Host backend:80 was resolved.
* IPv6: (none)
* IPv4: 10.11.57.143
*   Trying 10.11.57.143:80...


Hostname: gcp-backend-57d9f789dd-4jcng

Pod Information:
    -no pod information available-

Server values:
    server_version=nginx: 1.13.3 - lua: 10008

Request Information:
    client_address=10.10.1.134
    method=GET
    real path=/public
    query=
    request_version=1.1
    request_scheme=http
    request_uri=http://backend:8080/public

Request Headers:
    accept=*/*
    host=backend
    user-agent=curl/8.18.0
    x-envoy-expected-rq-timeout-ms=3600000
    x-envoy-internal=<span class="hljs-literal">true</span>
    x-forwarded-proto=http
    x-request-id=f42e5b24-41aa-4da5-a755-023ddcfdb7d9

Request Body:
    -no body <span class="hljs-keyword">in</span> request-

* Established connection to backend (10.11.57.143 port 80) from 10.10.1.123 port 39048 
* using HTTP/1.x
&gt; GET /public HTTP/1.1
&gt; Host: backend
&gt; User-Agent: curl/8.18.0
&gt; Accept: */*
&gt; 
* Request completely sent off
&lt; HTTP/1.1 200 OK
&lt; date: Mon, 02 Feb 2026 21:33:13 GMT
&lt; content-type: text/plain
&lt; server: envoy
&lt; x-envoy-upstream-service-time: 1
&lt; transfer-encoding: chunked
&lt; 
{ [577 bytes data]
100    565   0    565   0      0  18802      0                              0
* Connection <span class="hljs-comment">#0 to host backend:80 left intact</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770155848680/34986c1e-7e58-4084-aa7e-f7876bff84d1.png" alt class="image--center mx-auto" /></p>
<p>So to summarise what happened here:</p>
<ul>
<li><p><strong>Evidence of the Proxy:</strong> In both outputs, from curling the backend’s <code>/admin</code> and <code>/public</code> paths, we can see <code>&lt; server: envoy</code>. This proves that traffic is no longer just "flowing" through the network; it is being actively intercepted and inspected by the Cilium-managed Envoy proxy.</p>
</li>
<li><p><strong>The "Secret" Headers:</strong> Notice the <code>x-envoy-internal: true</code> and <code>x-request-id</code> headers in the <code>/public</code> output. These are injected by the proxy and are great visual aids to show that "the network is now intelligent." Or more intelligent at least….</p>
</li>
<li><p><strong>The L7 Block:</strong> In the <code>/admin</code> request, we can see <code>Access denied</code> followed by the <code>403 Forbidden</code>. Because the "server" is still <code>envoy</code>, the block happened at the <strong>network policy level</strong>, not because the application itself rejected you.</p>
</li>
</ul>
<h2 id="heading-how-cilium-l7-network-policies-work">How Cilium L7 network policies work</h2>
<p>A standard Kubernetes NetworkPolicy is quite broad: ports 80 and 8080 are either open or closed to the labels we configure. With a Cilium network policy, we can still define which ports are available, but we can also allow only a particular operation on a URL path, here <code>GET /public</code>, and the implied deny takes care of requests to <code>/admin</code>.</p>
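<p>For comparison, here’s a rough sketch of what the L4-only equivalent would look like as a standard Kubernetes NetworkPolicy (the policy name is made up): it can open DNS plus ports 80 and 8080 from the tester pod, but it has no concept of HTTP methods or paths.</p>
<pre><code class="lang-yaml">apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: l4-only-path-blocker     # hypothetical name, for comparison only
spec:
  podSelector:
    matchLabels:
      run: tester
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}      # kube-dns in any namespace (normally kube-system)
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  - to:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 8080               # ports only - no way to say "only GET /public" at this layer
</code></pre>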
<p>Cilium uses eBPF to intercept the packet at the virtual Ethernet interface. Cilium then identifies the source as <code>run: tester</code> and the destination as <code>app: backend</code> based on security identities rather than just IP addresses.</p>
<p>Cilium’s eBPF sees that an L7 HTTP policy is configured and active for this destination. Instead of forwarding the packet to the network, eBPF redirects the traffic to a local Envoy proxy listener on the node. This is transparent injection, meaning we don’t have to make any changes to our client applications.</p>
<p>Envoy terminates the TCP connection, parses the HTTP headers and checks the requested path (e.g. <code>/public</code> and <code>/admin</code>, as we tried) against the configured and active <code>CiliumNetworkPolicy</code>.</p>
<p>What we then see is success with <code>/public</code>: Envoy sees <code>/public</code> is whitelisted, initiates a new connection to the backend pod on port 8080 and injects those "secret" headers like <code>x-request-id</code>.</p>
<p>And we see a block on <code>/admin</code>: Envoy sees <code>/admin</code> is NOT whitelisted. It immediately generates an HTTP 403 Forbidden response with the <code>server: envoy</code> header and sends it back to the <code>tester</code> pod.</p>
<p>By using eBPF to transparently redirect traffic to Envoy, Cilium gives us 'Service Mesh' capabilities, like fine-grained HTTP control, without the overhead of managing sidecar containers in every pod.</p>
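<p>If you enabled Hubble earlier, you can also watch these L7 verdicts live. A rough sketch, assuming the standalone Hubble CLI is installed (flags may vary by version):</p>
<pre><code class="lang-bash"># Forward the Hubble relay to localhost so the hubble CLI can reach it
cilium hubble port-forward --context kind-gcp-cluster &amp;

# Follow HTTP flows involving the tester pod, including Envoy's allow/deny decisions
hubble observe --protocol http --pod default/tester --follow
</code></pre>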
<h2 id="heading-i-promise-this-is-actually-the-end-now">I promise this is actually the end now…</h2>
<p>Again, thank you for reading and following along, if indeed, you still are!</p>
<p>I’d been meaning to do a deeper dive into Cilium, and I find trying something like a multi-cluster mesh far more interesting than “Pod 1 can or can’t talk to pods 2 and 3….“ Writing about topics like this really motivates me to learn it properly (what’s more motivating than not looking publicly stupid?).</p>
<p>That being said, I am open to any comments, questions or a heads-up if I have missed anything, gotten the wrong end of the stick on something, something doesn’t work or what I should look at next!</p>
<h3 id="heading-disclaimer-can-you-tell-i-was-a-consultant"><em>Disclaimer (Can you tell I was a consultant!)</em></h3>
<p>I’ve tried to be as accurate and factual as I can with Cilium and Kubernetes, etc. I am learning this, and writing guides, how-tos etc. helps me grasp the “why” on new topics!</p>
<p>To be transparent, I used AI to help me troubleshoot and peer review some parts. I have fact checked and actually tested all this code and config.</p>
<p>This is by no means production ready (it’s literally Kind on a micro PC!), so your mileage may vary! Don’t go exposing your servers, workloads etc. to the internet.</p>
<h2 id="heading-links-to-helpful-documentation">Links to helpful documentation</h2>
<p><a target="_blank" href="https://kind.sigs.k8s.io/docs/user/quick-start/">Kind quickstart</a></p>
<p><a target="_blank" href="https://kind.sigs.k8s.io/docs/user/known-issues/">Kind troubleshooting and common errors</a></p>
<p><a target="_blank" href="https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#k8s-install-quick">Cilium quick installation</a></p>
<p><a target="_blank" href="https://docs.cilium.io/en/stable/installation/k8s-install-helm/">Cilium Helm installation</a></p>
<p><a target="_blank" href="https://docs.cilium.io/en/stable/network/clustermesh/clustermesh/#clustermesh">Cilium clustermesh setup</a></p>
<p><a target="_blank" href="https://docs.cilium.io/en/stable/gettingstarted/demo/#getting-started-with-the-star-wars-demo">Cilium Getting Started with the Star Wars Demo</a> (A great demo for an introduction to Cilium network policies)</p>
<p><a target="_blank" href="https://docs.cilium.io/en/stable/observability/hubble/setup/#hubble-setup">Hubble set up</a></p>
<p><a target="_blank" href="https://ebpf.io/what-is-ebpf/">What is eBPF</a></p>
]]></content:encoded></item><item><title><![CDATA[Deeper dive into Argo CD]]></title><description><![CDATA[You may have read my post from earlier this year “Flux CD Vs Argo CD“, where I took a comparison look at both.
I like both; both do a great job at reconciliation of your Kubernetes deployments and both can be at the core of your GitOps journey (maybe...]]></description><link>https://ferrishall.dev/deeper-dive-into-argo-cd</link><guid isPermaLink="true">https://ferrishall.dev/deeper-dive-into-argo-cd</guid><category><![CDATA[ArgoCD]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[gitops]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[cloud native]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Mon, 20 Oct 2025 19:49:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760982005414/5b463717-0346-402e-a72b-16df7e5690ce.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You may have read my post from earlier this year “<a target="_blank" href="https://ferrishall.dev/flux-cd-vs-argo-cd">Flux CD Vs Argo CD</a>“, where I took a comparison look at both.</p>
<p>I like both; both do a great job at reconciliation of your Kubernetes deployments and both can be at the core of your GitOps journey (maybe not both together, but you get what I mean….pick one).</p>
<p>I'll be diving deeper into both over the next few months, starting now with this deep dive into Argo CD.</p>
<p>Currently, I have a Kubernetes cluster running on VMs running on a Proxmox server I use for my homelab.</p>
<p>I am running Argo CD on it as my deployment platform of choice, so I’ll run through using Argo CD in a bit more depth.</p>
<h2 id="heading-authentication-to-a-private-git-repository">Authentication to a private Git repository</h2>
<p>First, we won’t always be deploying from public-facing GitHub repos, nor will we want to expose our company or personal repos to the rest of the world, so let’s take a look at deploying from private GitHub repos.</p>
<p>I’ll be using the official <a target="_blank" href="https://argo-cd.readthedocs.io/en/latest/user-guide/private-repositories/">Argo CD docs</a> to help get set up with authenticating to my private GitHub repo.</p>
<p>To begin, I’ll set up a quick and easy application with some Kubernetes manifest files: just basic 2-tier frontend and backend deployments and some services, importantly in a private repo, just to test Argo CD deploying from one.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760328834841/96f1de55-d1c1-44c0-9c0e-977010cc9984.png" alt class="image--center mx-auto" /></p>
<p>There are a few methods we can use to authenticate to GitHub from Argo: Username and password/access token, TLS, SSH key and <a target="_blank" href="https://docs.github.com/en/apps/creating-github-apps/about-creating-github-apps/about-creating-github-apps#about-github-apps">GitHub app credentials</a>.</p>
<p>I’m not going to go through all of them right now; we’ll settle on creating an access token, as this is the simplest way to get started in my opinion. The GitHub App credential is a bit more involved and better suited to production, so a single access token for the repo will be fine for development for now.</p>
<p>Select ‘Developer settings‘ from your GitHub settings page, then ‘Personal access tokens’, and under that, “Fine-grained“ tokens.</p>
<p>Give your token a name and description, make sure “resource owner” is set to the owner of the repo, set your expiration, and select the repos Argo will have access to along with the permissions. I’ve set mine to “contents” read-only. Then generate your token.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760331252562/9a6dd3de-1012-45d7-a1cb-532d3c60202f.png" alt class="image--center mx-auto" /></p>
<p><strong>Security Disclaimer!</strong></p>
<p>This should go without saying…. DO NOT share this token or upload it to Git or anywhere public.</p>
<p>This GitHub personal access token is the key to your personal and private repo. Even if your permissions are read-only, that can change and whoever has your token will have a key to your repo.</p>
<p>If you suspect your key has been compromised or uploaded to anything public, like accidentally pushed to a public repo (it happens, just don’t let it happen to you!), revoke it straight away from the same settings page. PSA disclaimer done!</p>
<p>Create the repository resource in Argo CD either declaratively or using the admin interface. I’m using declarative YAML applied to the cluster here.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Secret</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">demo-repo</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">argocd.argoproj.io/secret-type:</span> <span class="hljs-string">repository</span>
<span class="hljs-attr">stringData:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">git</span>
  <span class="hljs-attr">url:</span> <span class="hljs-string">https://github.com/my-github/my-argoc-demo-repo</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Secret</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">private-repo-creds</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">argocd.argoproj.io/secret-type:</span> <span class="hljs-string">repo-creds</span>
<span class="hljs-attr">stringData:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">git</span>
  <span class="hljs-attr">url:</span> <span class="hljs-string">https://github.com/my-github/</span>
  <span class="hljs-attr">password:</span> <span class="hljs-string">github_pat_12345_key_here_do_not_upload_to_git!!!!</span>
  <span class="hljs-attr">username:</span> <span class="hljs-string">your_username</span>
</code></pre>
<p>The first secret, <code>demo-repo</code>, adds the <strong>specific repository</strong> we want Argo CD to track. The second secret is a template for how Argo CD can authenticate with my GitHub account using the Personal Access Token (the <code>password</code> field).</p>
<p>Crucially, remember that the namespace for these Argo CD objects has to be <code>argocd</code> (or whatever namespace Argo CD is deployed to).</p>
<p>Once applied, check your Argo CD UI to confirm it's connected successfully.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760333841434/e3e90f08-a5ba-44be-8fde-8d887c2a68c9.png" alt class="image--center mx-auto" /></p>
<p><strong>Credential Templates</strong></p>
<p>That may have seemed like a fair effort to get just one repo auth’d with Argo, chances are you’ll be working with and deploying from multiple repos; that’s where <a target="_blank" href="https://argo-cd.readthedocs.io/en/latest/user-guide/private-repositories/#credential-templates">credential templates</a> come in. These set the general authentication method for the repos in your account that you want Argo to work with.</p>
<p><strong>App deployment</strong></p>
<p>Let’s try it out: add a new application in Argo CD using the admin console or declarative YAML manifest files. I’ll show you what the console looks like, as it is pretty helpful and nice to look at while it’s applying!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760334052951/ea6e0992-ef52-47bc-a2a6-3ce199bea076.png" alt class="image--center mx-auto" /></p>
<p>Give it a name, add it to the default project and tick any of the options you might want.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760334137007/fa7433e9-4daa-4f2b-bb40-65d1123618a9.png" alt class="image--center mx-auto" /></p>
<p>Add your source repo, and point it at the correct branch, cluster (the local cluster Argo CD is running on, in my case) and the target namespace.</p>
<p>In no time at all, the application has auto-synced as I configured and deployed my Kubernetes manifest YAML files.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760334302363/0797fdad-9f12-4790-848f-d350389b19c9.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760334349441/e8ca82cb-4bb0-432c-99fa-f57bd3c77691.png" alt class="image--center mx-auto" /></p>
<p>Now, when any changes are made to my private repo, for example, the number of pods I have for the backend, Argo CD will reconcile.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760334472828/40067818-f82b-4b01-a5fc-f49764246575.png" alt class="image--center mx-auto" /></p>
<p>5 mins later…</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760334728776/8303bccb-00e2-4c40-9aa3-377a8fd5fabb.png" alt class="image--center mx-auto" /></p>
<p>GitOps at its finest from a private GitHub repo!</p>
<p><strong>Psst!</strong></p>
<p>You probably don’t want to be creating all your Argo CD applications with the GUI console, so here’s a declarative YAML manifest to do the same thing.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">demo-private-k8s</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">destination:</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argo-demo-app</span> <span class="hljs-comment">## Don't forget to add your target namespace!</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://kubernetes.default.svc</span>
  <span class="hljs-attr">project:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">source:</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">.</span>
    <span class="hljs-attr">repoURL:</span> <span class="hljs-string">https://github.com/my-github/your_repo_url_here</span>
    <span class="hljs-attr">targetRevision:</span> <span class="hljs-string">HEAD</span>
  <span class="hljs-attr">syncPolicy:</span>
    <span class="hljs-attr">automated:</span>
      <span class="hljs-attr">prune:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">selfHeal:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">syncOptions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">CreateNamespace=true</span>
</code></pre>
<h2 id="heading-app-of-apps">App of apps</h2>
<p>So we’re now applying our applications to Argo CD, but you might be thinking, “But I’m still manually creating the Argo CD application?“</p>
<p>It feels like we’re kicking the automation can down the road. This is where the <a target="_blank" href="https://argo-cd.readthedocs.io/en/latest/operator-manual/cluster-bootstrapping/#app-of-apps-pattern">App of Apps pattern</a> comes in, helping us automate the creation of our applications.</p>
<p>Wouldn’t it be nice to bootstrap those clusters with the core or common deployments to get them up and running? RBAC? Namespaces? Istio or Prometheus, let’s say, for example.</p>
<p>You can deploy an app in Argo CD that consists of all your …. Apps! <a target="_blank" href="https://argo-cd.readthedocs.io/en/latest/operator-manual/cluster-bootstrapping/">The app of apps approach.</a></p>
<p>I've created a new private repository called <code>argocd-app-of-apps</code> with an <code>apps</code> directory inside it. This <code>apps</code> directory will hold the YAML manifests for all my sub-applications.</p>
<p>I will also need to configure Argo CD to deploy from this new GitHub repo, like we did before with the demo-private-k8s repo. You can do this via the console or just add it to the repo manifest file and apply it again, as in the sketch below.</p>
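<p>Something like this, another repository secret that reuses the credential template from earlier (the secret name is my choice):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Secret
metadata:
  name: app-of-apps-repo          # any name works, the label is what matters
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/my-github/argocd-app-of-apps   # matched by the repo-creds template above
</code></pre>
<p>Then the parent "app of apps" Application itself:</p>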
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app-of-apps</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">destination:</span>
    <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://kubernetes.default.svc</span>
  <span class="hljs-attr">project:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">source:</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">apps</span>
    <span class="hljs-attr">repoURL:</span> <span class="hljs-string">https://github.com/my-github/argocd-app-of-apps</span>
    <span class="hljs-attr">targetRevision:</span> <span class="hljs-string">HEAD</span>
  <span class="hljs-attr">syncPolicy:</span>
    <span class="hljs-attr">automated:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">prune:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">selfHeal:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">syncOptions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">CreateNamespace=true</span>
</code></pre>
<p>And in that repo is an <code>apps</code> directory where the application YAML manifests live, which is what’s referenced by <code>path</code>.</p>
<p>I’ve been playing with Bitnami’s Sealed Secrets recently, so I want that on my cluster and potentially any other clusters I have too, so I’ll add a <code>sealed-secrets.yaml</code> to the apps directory.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">sealed-secrets</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
  <span class="hljs-attr">finalizers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">resources-finalizer.argocd.argoproj.io/foreground</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">project:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">source:</span>
    <span class="hljs-attr">repoURL:</span> <span class="hljs-string">https://bitnami-labs.github.io/sealed-secrets</span>
    <span class="hljs-attr">chart:</span> <span class="hljs-string">sealed-secrets</span>
    <span class="hljs-attr">targetRevision:</span> <span class="hljs-string">'2.17.7'</span>
    <span class="hljs-attr">helm:</span>
      <span class="hljs-attr">parameters:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">installCRDs</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">'true'</span>
  <span class="hljs-attr">destination:</span>
    <span class="hljs-attr">namespace:</span> <span class="hljs-string">kube-system</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://kubernetes.default.svc</span>
  <span class="hljs-attr">syncPolicy:</span>
    <span class="hljs-attr">automated:</span> {}
    <span class="hljs-attr">syncOptions:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">CreateNamespace=true</span>
</code></pre>
<p>I’ll add the Argo CD guestbook example app there too, just for demonstration.</p>
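<p>Something like this, pointing at the public Argo CD example apps repo (a minimal sketch of a <code>guestbook.yaml</code> in the <code>apps</code> directory):</p>
<pre><code class="lang-yaml">apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps
    targetRevision: HEAD
    path: guestbook                  # the plain-manifests guestbook example
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
</code></pre>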
<p>Push to the new app-of-apps repo and now apply the <code>app-of-apps.yaml</code> manifest file to the cluster; this should be a one-time thing that a cluster admin would do.</p>
<p><code>kubectl apply -f app-of-apps.yaml</code></p>
<p>Now the app-of-apps application is created in Argo CD, and it is itself deploying the YAML manifests from the <code>apps</code> directory in the GitHub repo.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760560857085/3c42d3d3-ff87-4d63-b240-9bbf134b170e.png" alt class="image--center mx-auto" /></p>
<p>Guestbook and sealed-secrets are now deployed onto the cluster. Let’s go one further and add another; after all, we don’t want to be adding new repos manually or imperatively.</p>
<p>I’ll add a demo-private-repo.yaml file to the apps directory:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">demo-private-k8s</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
  <span class="hljs-attr">finalizers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">resources-finalizer.argocd.argoproj.io/foreground</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">project:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">source:</span>
    <span class="hljs-attr">repoURL:</span> <span class="hljs-string">https://github.com/my-github/argo-private-application</span>
    <span class="hljs-attr">targetRevision:</span> <span class="hljs-string">HEAD</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">.</span>
  <span class="hljs-attr">destination:</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://kubernetes.default.svc</span>
    <span class="hljs-attr">namespace:</span> <span class="hljs-string">argo-demo-app</span>
  <span class="hljs-attr">syncPolicy:</span>
    <span class="hljs-attr">automated:</span>
      <span class="hljs-attr">prune:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">selfHeal:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">syncOptions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">CreateNamespace=true</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760561677294/97243823-52f4-4da1-821e-54ccd7815e4c.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760561443947/703ad6e7-8dff-48e4-b590-c99db61df7ee.png" alt class="image--center mx-auto" /></p>
<p>And there we have applications we can use to bootstrap and deploy to clusters in a GitOps way.</p>
<p>Hopefully, these are some helpful ideas and ways of getting the most out of your Argo CD deployments.</p>
<h2 id="heading-multi-cluster-deployment">Multi-cluster deployment</h2>
<p>With all the apps we’re deploying, it’s doubtful you’ll only be deploying to a single cluster.</p>
<p>Chances are you’ll also want a test, pre-prod, or staging cluster. Argo CD lets you deploy to other clusters too.</p>
<p>To <a target="_blank" href="https://argo-cd.readthedocs.io/en/latest/operator-manual/cluster-management/">add a cluster</a> to Argo CD using the <code>argocd</code> CLI, I grabbed the kube config file from my second cluster, test-cluster, and ran this: <code>argocd cluster add test-cluster --kubeconfig=.kube/test-config</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760612509050/76c9b6d0-04b6-4dca-bd99-c7cbe5722122.png" alt class="image--center mx-auto" /></p>
<p>You can also do this declaratively using Kubernetes <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#clusters">secrets</a>. Run <code>kubectl -n argocd get secrets</code> and you’ll see your cluster stored as a secret that Argo CD uses to connect and deploy to it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760613212049/5938e505-2a29-4ffb-9d28-34fa5e7b7846.png" alt class="image--center mx-auto" /></p>
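<p>For reference, the declarative equivalent is a Secret labelled as a cluster. This is a hedged sketch: the cluster name and server match my test-cluster, but the bearer token and CA data are placeholders that would come from the service account Argo CD creates on the target cluster.</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Secret
metadata:
  name: test-cluster-secret
  namespace: argocd
  labels:
    # Tells Argo CD this secret describes a cluster
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: test-cluster
  server: https://192.168.1.106:6443
  config: |
    {
      "bearerToken": "&lt;service-account-token&gt;",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "&lt;base64-encoded-ca-cert&gt;"
      }
    }
</code></pre>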
<p>As a quick test, I’ll deploy the demo-private-k8s application to the new test-cluster. I’ve done this via the UI, pointing the cluster drop-down at my test-cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760613352811/f8b564e4-17fc-40d1-a3ef-8454c896b93c.png" alt class="image--center mx-auto" /></p>
<p>Let’s check the cluster for the new pods:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760613419210/98d86417-d5a5-4504-b360-6845b6d3c5da.png" alt class="image--center mx-auto" /></p>
<p>Here’s the Argo application YAML manifest. You’ll need to be careful and make sure it is named differently from the already deployed application; otherwise, it will overwrite the existing demo-private-k8s application (that absolutely happened to me and it took me too long to realise…..)</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Application</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">test-demo-private-k8s</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">destination:</span>
    <span class="hljs-attr">namespace:</span> <span class="hljs-string">argo-demo-app</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.1.106:6443</span>
  <span class="hljs-attr">project:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">source:</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">.</span>
    <span class="hljs-attr">repoURL:</span> <span class="hljs-string">https://github.com/my-github/argo-private-application</span>
    <span class="hljs-attr">targetRevision:</span> <span class="hljs-string">HEAD</span>
  <span class="hljs-attr">syncPolicy:</span>
    <span class="hljs-attr">automated:</span>
      <span class="hljs-attr">prune:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">selfHeal:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">syncOptions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">CreateNamespace=true</span>
</code></pre>
<p>You could apply this with your app-of-apps repo, but it would be a fair bit of copy and paste if you wanted your manifest files deployed to two, three, or more clusters.</p>
<p>I’ve seen examples of using Kustomize, which helps template Kubernetes deployments so you can change values for deployments to multiple clusters and environments while keeping the same core deployment manifests.</p>
<p>It uses a base plus overlays, where the overlays specify the things that differ per cluster and/or environment, like labels and replica counts.</p>
<p>I won’t get into it properly now; it’s something I’m still learning, and this looks to be a great use case for Kustomize, so maybe I’ll do a write-up on it?!</p>
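<p>Just to give a flavour of the base-plus-overlays idea, here’s a rough sketch of what the layout might look like (the file names, namespace, and replica count are made up for illustration):</p>
<pre><code class="lang-yaml"># base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml

# overlays/test-cluster/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: argo-demo-app
replicas:
- name: demo-app
  count: 3
</code></pre>
<p>An Argo CD Application can then point its <code>path</code> at the overlay directory for each cluster, while the base manifests stay identical.</p>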
<h2 id="heading-best-practices-and-security">Best practices and security</h2>
<p>We’ve covered some really interesting concepts for advancing our Kubernetes deployments with Argo CD.</p>
<p>This section includes some best practices and some security thoughts and considerations.</p>
<h3 id="heading-projects-principle-of-least-privilege">Projects: Principle of Least Privilege</h3>
<p>You may have noticed the <strong>Default project</strong> when deploying. Argo CD <strong>Projects</strong> are logical groupings that create clear trust and security boundaries.</p>
<p>Not all applications should be available to everyone. By creating projects for different teams or environments (<code>staging</code>, <code>production</code>), you can restrict:</p>
<ul>
<li><p>Which <strong>source repos</strong> can be deployed.</p>
</li>
<li><p>Which <strong>destination clusters/namespaces</strong> can be deployed to.</p>
</li>
<li><p>What <strong>Kubernetes resource kinds</strong> can be created (e.g., deny the creation of ClusterRoles).</p>
</li>
</ul>
<p>This is key to implementing the <strong>Principle of Least Privilege</strong>.</p>
<p>Here is an example <code>AppProject</code> manifest:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">AppProject</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">exmaple-project</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">argocd</span>
  <span class="hljs-comment"># Finalizer that ensures that project is not deleted until it is not referenced by any application</span>
  <span class="hljs-attr">finalizers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">resources-finalizer.argocd.argoproj.io</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-comment"># Project description</span>
  <span class="hljs-attr">description:</span> <span class="hljs-string">Example</span> <span class="hljs-string">Project</span>

  <span class="hljs-comment"># Allow manifests to deploy from any Git repos</span>
  <span class="hljs-attr">sourceRepos:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">'https://github.com/my-github/argo-private-application'</span>

  <span class="hljs-comment"># Only permit applications to deploy to the 'guestbook' namespace or any namespace starting with 'guestbook-' in the same cluster</span>
  <span class="hljs-comment"># Destination clusters can be identified by 'server', 'name', or both.</span>
  <span class="hljs-attr">destinations:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">namespace:</span> <span class="hljs-string">testing-projects-scope</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.1.106:6443</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">test-cluster</span>

  <span class="hljs-comment"># Deny all cluster-scoped resources from being created, except for Namespace</span>
  <span class="hljs-attr">clusterResourceWhitelist:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">group:</span> <span class="hljs-string">''</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
</code></pre>
<p>Let’s try an application deployment. First, I’ll deploy to the correct cluster from the permitted Git repo, into the <code>testing-projects-scope</code> namespace I configured in the project manifest above:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760954182912/77570e70-53de-4f64-89d7-4ea9e9fdec10.png" alt class="image--center mx-auto" /></p>
<p>Now I’ll try deploying the Argo Guestbook application to that same cluster and the correct namespace using the Argo CD <code>example-project</code> project.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760955378598/921e606e-e9b8-4641-a8bb-6d424a684754.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760987498413/7d8103d4-dda5-4e9d-b8b5-bafab56b8f85.png" alt class="image--center mx-auto" /></p>
<p>The application was not deployed because the <code>example-project</code> project I created does not permit deployments from the Argo guestbook GitHub repo.</p>
<p>To wrap this up, here are a few other essential security and advanced topics to consider as you deepen your Argo CD usage:</p>
<ul>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/rbac/"><strong>Argo CD RBAC</strong></a><strong>:</strong> Add users/teams to your projects, limiting who can perform actions like syncing, deleting, or rolling back applications.</p>
</li>
<li><p><strong>Secret Management:</strong> Implement a solution like the <a target="_blank" href="https://github.com/bitnami-labs/sealed-secrets?tab=readme-ov-file#related-projects"><strong>Sealed Secrets</strong></a> I hinted at earlier, or use <strong>HashiCorp Vault</strong> to securely store the sensitive information (like the GitHub tokens) that Argo CD needs.</p>
</li>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/user-guide/sync-waves/"><strong>Sync Hooks</strong></a><strong>:</strong> Look into Argo CD's sync phases and hooks to define custom actions (e.g., running database migrations) that must occur <em>before</em> or <em>after</em> the main deployment is applied. There’s a small sketch of a PreSync hook after this list.</p>
</li>
</ul>
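<p>A PreSync hook is just a normal manifest with the hook annotations added. Here’s a rough sketch; the Job itself (image and command) is a made-up example:</p>
<pre><code class="lang-yaml">apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    # Run this Job before the rest of the sync is applied
    argocd.argoproj.io/hook: PreSync
    # Clean the Job up once it has completed successfully
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: my-registry/db-migrate:latest
        command: ["./migrate.sh"]
</code></pre>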
<h2 id="heading-final-thoughts-and-next-steps">Final Thoughts and Next Steps</h2>
<p>We've successfully moved from a basic Argo CD install to a scalable, automated, and secure deployment platform capable of:</p>
<ol>
<li><p>Handling <strong>Private Repositories</strong> securely.</p>
</li>
<li><p><strong>Bootstrapping</strong> new clusters using the <strong>App of Apps</strong> pattern.</p>
</li>
<li><p>Deploying to <strong>Multiple Clusters</strong>.</p>
</li>
<li><p>Enforcing organisational and security policies using <strong>Projects</strong>.</p>
</li>
</ol>
<p>Thanks for sticking with me, this was a bit of a long one! I was aiming for something people could follow along with, so I hope you enjoyed this journey!</p>
<p>Leave a comment if you think I missed something, got it wrong, or have anything to add or I should check out!</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes AI Agent - Kagent]]></title><description><![CDATA[*Clickbait warning, of course I will, but it’s cool you don’t have to!
Sorry, that’s quite misleading! However, I’ve been experimenting with Kagent on my local Kubernetes development cluster, and it’s really neat!
I could have done with something lik...]]></description><link>https://ferrishall.dev/kubernetes-ai-agent-kagent</link><guid isPermaLink="true">https://ferrishall.dev/kubernetes-ai-agent-kagent</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[AI]]></category><category><![CDATA[ai-agent]]></category><category><![CDATA[kagent]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Wed, 27 Aug 2025 19:56:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756325006847/4dfbc518-ea42-4464-80f8-81e70501ff4c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-clickbait-warning-of-course-i-will-but-its-cool-you-dont-have-to">*Clickbait warning, of course I will, but it’s cool you don’t have to!</h2>
<p>Sorry, that’s quite misleading! However, I’ve been experimenting with Kagent on my local Kubernetes development cluster, and it’s really neat!</p>
<p>I could have done with something like this during my on-call SRE days!</p>
<h2 id="heading-kubernetes-without-the-kubectl">Kubernetes without the kubectl!</h2>
<p>My old colleague and fellow Kubernetes enthusiast Chris Matcham suitably inspired me to give this a try. Colour me intrigued!</p>
<p>I followed Chris’ <a target="_blank" href="https://chrismatcham.dev/Deploying-a-K8S-ninja-using-kagent-MCP-with-ArgoCD-&amp;-Helm/">guide here</a>. I didn’t have any credit on OpenAI, though, so I wanted to try it using Google’s Gemini as my AI backend.</p>
<h2 id="heading-not-a-quick-start-ish">Not a quick start-ish</h2>
<p>So I’m not going to go over installing Kagent, Chris’ post already does a great job of that. Use Helm, use ArgoCD, however you like.</p>
<p>As I’m planning on using Google Gemini as my AI backend I will add what I did differently to help get you started.</p>
<p>First, I manually added a Kubernetes Secret in my kagent-agent namespace to hold my API key. Head over to Google AI Studio to get yourself an API key and associate it with a Google Cloud project; I had a personal Google Cloud project I was already using for some dev things, so I was ready to go.</p>
<p>Then create your secret.</p>
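<p>Something like this is all it needs (a sketch; the secret name and key are just what I’m assuming here, so check the kagent chart values for the names it actually expects):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Secret
metadata:
  name: kagent-google-api-key
  namespace: kagent-agent
type: Opaque
stringData:
  # The Gemini API key from Google AI Studio
  GOOGLE_API_KEY: your-gemini-api-key-here
</code></pre>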
<p>I then opted to deploy using ArgoCD, since I had it running anyway, and used the UI for testing purposes. Just make sure to install the Kagent CRDs first.</p>
<p>With the API key from aistudio.google.com in hand, add some values to create the Gemini provider as part of the Helm deployment.</p>
<p>We’ll set up ArgoCD like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750425381855/119852b7-699c-4361-b1af-234571ed34f8.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750425436754/f87164c1-3f7a-492b-a9bb-47ed83d3630f.png" alt class="image--center mx-auto" /></p>
<p>We’re telling Helm to include the Kubernetes secret I created manually that has the API key I just created.</p>
<p>To create the application in ArgoCD, I could have created a declarative yaml file (Chris does a great job of this in his blog post), but as I’m just tinkering with this, I’m opting for the UI.</p>
<p>The repo URL I’m using is <a target="_blank" href="http://ghcr.io/kagent-dev/kagent/helm">ghcr.io/kagent-dev/kagent/helm</a>. The CRDs chart is kagent-crds (version 0.3.15), and for the agent deployment I’m using the same Helm repo URL with the chart name kagent, also version 0.3.15.</p>
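<p>If you’d rather go the declarative route, the Application for the agent chart would look roughly like this. This is a hedged sketch: I used the UI myself, and the OCI registry may need registering as a Helm repo in Argo CD first, so treat the source block as an assumption rather than a verified config.</p>
<pre><code class="lang-yaml">apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kagent
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ghcr.io/kagent-dev/kagent/helm
    chart: kagent
    targetRevision: 0.3.15
  destination:
    server: https://kubernetes.default.svc
    namespace: kagent-agent
  syncPolicy:
    automated: {}
    syncOptions:
    - CreateNamespace=true
</code></pre>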
<p>Synchronise the ArgoCD applications and I should have two running applications: one for the CRDs and another for the actual agent.</p>
<p>I’m using Rancher Desktop for my local cluster, so with a quick port forward of the agent service resource I can now access my Kagent deployment. Let’s head over to the models page to make sure my changes to configure the model to Gemini worked:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750425614569/85d5bcc5-7c66-4cba-98a1-b2fb69b0732f.png" alt class="image--center mx-auto" /></p>
<p>Let’s open a chat and take it for a spin!</p>
<p>I created a deployment manually using the terminal, so let’s test if Kagent can see what’s running on the cluster:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750426043915/5242aa71-4085-4ef9-9c14-0ab7dee73d7d.png" alt class="image--center mx-auto" /></p>
<p>Nice! I can interact with the cluster without using kubectl!</p>
<p>Let’s try something else and create a pod:</p>
<pre><code class="lang-bash">kubectl run something-wrong --image ngiinx:latest
</code></pre>
<p>Let’s see if it spots it…</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750433331783/3d107d52-cd7e-4a21-913d-a2d5f758deeb.png" alt class="image--center mx-auto" /></p>
<p>It’s noticed that there is an incorrectly configured pod!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750433399947/f3f98d46-50b2-4617-b336-039dfc71ff29.png" alt class="image--center mx-auto" /></p>
<p>Before I can even ask, it’s fixed the incorrect image.</p>
<p>What’s interesting is that I’ve had different outcomes when I’ve asked it to look into this type of issue.</p>
<p>Sometimes it’s given me an “I’ve found you’ve configured the image Nginx, which doesn’t look right, you should correct it” type response; other times it corrects it without telling me, and then when I ask “What’s wrong with my pod?” it starts diving into kubelet and networking possibilities, reasoning that because another pod is working it’s probably not that.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750438614829/f451b787-bcec-4f42-8d08-8122d67e99f2.jpeg" alt class="image--center mx-auto" /></p>
<p><em>In case no one’s noticed, I’m a big Simpsons and Futurama fan…</em></p>
<p>I have tried this by creating an agent that only has read access on the cluster, so it can’t make changes.</p>
<p>If you're implementing GitOps with ArgoCD etc., then that might be for the best, using Kagent purely as a troubleshooting tool.</p>
<p>Imagine waking up at 3 am thanks to PagerDuty and having an AI assistant to summarise logs and spot things that don’t look right. Game changer! What I would have given for this type of thing when I was an on-call SRE getting alerts at 2 am!</p>
<p>AI agents are the new hot jam. I’m really keen to investigate and try this out a bit more as it’s quite new to me, and potentially have something like this running in my homelab to help troubleshoot.</p>
<p>My good pal Chris does a great job getting you started with his walkthrough <a target="_blank" href="https://chrismatcham.dev/Deploying-a-K8S-ninja-using-kagent-MCP-with-ArgoCD-&amp;-Helm/">here</a>. I just tried it out with a Google Cloud Gemini twist.</p>
<p>You can find out more about Kagent in more detail and its thriving community at their <a target="_blank" href="https://kagent.dev/docs/kagent/introduction/installation">website.</a></p>
]]></content:encoded></item><item><title><![CDATA[Passing the KCSA & CKS to become a Kubestronaut]]></title><description><![CDATA[I can’t figure out image sizes…
For those of you who have been reading my blog posts and following along, you might have guessed a bit of a theme... If you haven't guessed, I've been on a real learning journey to learn and get my hands on as much Kub...]]></description><link>https://ferrishall.dev/passing-the-kcsa-and-cks-to-become-a-kubestronaut</link><guid isPermaLink="true">https://ferrishall.dev/passing-the-kcsa-and-cks-to-become-a-kubestronaut</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[#kubernetes #container ]]></category><category><![CDATA[containers]]></category><category><![CDATA[kubestronaut]]></category><category><![CDATA[Security]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[cloud security]]></category><category><![CDATA[development]]></category><category><![CDATA[learning]]></category><category><![CDATA[Certification]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Tue, 17 Jun 2025 17:24:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750181718416/f90db834-a29b-4b24-828d-94a2ea72d2a3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>I can’t figure out image sizes…</em></p>
<p>For those of you who have been reading my blog posts and following along, you might have guessed a bit of a theme... If you haven't guessed, I've been on a real learning journey to learn and get my hands on as much Kubernetes as I can.</p>
<p>This is a write-up of my approach to the final 2 exams for me, the KCSA and the CKS.</p>
<h2 id="heading-opening-thoughts-on-certifications-as-part-of-development">Opening thoughts on certifications as part of development</h2>
<p>As a goal driven individual, I learn and develop best when there is an end goal to reach for and aspire to (There is no real end to this but don’t tell my brain that…)</p>
<p>In my case, the <a target="_blank" href="https://www.cncf.io/training/kubestronaut/">Kubestronaut Certification</a>, one blue jacket to rule them all!</p>
<p>Which is why certifications matter to me. There is a bit of a debate with certs: “certifications don’t maketh the engineer”, which is right, they don’t, but they are a great way to validate skills and experience.</p>
<p>So I get both sides of the coin.</p>
<p>I’ve met and worked with some amazing DevOps, Data engineers, and developers who don’t have a cert to their name and never will; they don’t see the point in it.</p>
<p>I’ve also interviewed some people who had 5 Google Cloud certs but couldn’t explain a simple architecture from a design they clearly “liberated” from the internet and had clearly exam dumped to pass certs without the experience to back it up.</p>
<p>So, I get it.</p>
<p>For me, I want to learn the real fundamentals of a thing, so with Kubernetes Security, I refreshed my Docker skills and went back to basics, breaking and fixing clusters before diving into the security topics of the Certified Kubernetes Security Specialist.</p>
<p>Sorry for the long ramble of an intro, I was a cloud trainer for 2 and a half years, and it’s questions and thoughts I got asked about a lot, so I thought I’d add my 2 pence worth.</p>
<p>TL;DR certs are good, but not the be-all and end-all; experience and fundamental learning absolutely matter.</p>
<h2 id="heading-the-kubernetes-and-cloud-security-associate-kcsa">The Kubernetes and Cloud Security Associate (KCSA)</h2>
<p>While I was preparing and learning the technical security skills for the CKS (I’ll add my thoughts on that after this), I wanted to get the <a target="_blank" href="https://www.cncf.io/training/certification/kcsa/">KCSA</a> out of the way, so here’s my take and how I approached it.</p>
<p>It's a multiple-choice exam and focuses more on fundamental and overview understanding of cloud-native security with Kubernetes and container-based design at the forefront. Ergo, it's a nice introductory cert into Kubernetes Cloud Native Security.</p>
<p>This exam is an online, proctored, multiple-choice exam.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Domain</td><td>Weight</td></tr>
</thead>
<tbody>
<tr>
<td>Overview of Cloud Native Security</td><td>14%</td></tr>
<tr>
<td>Kubernetes Cluster Component Security</td><td>22%</td></tr>
<tr>
<td>Kubernetes Security Fundamentals</td><td>22%</td></tr>
<tr>
<td>Kubernetes Threat Model</td><td>16%</td></tr>
<tr>
<td>Platform Security</td><td>16%</td></tr>
<tr>
<td>Compliance and Security Frameworks</td><td>10%</td></tr>
</tbody>
</table>
</div><p>If you’ve been in the industry long enough and have some experience with a cloud-native culture and tools, chances are you probably have some fundamental knowledge already in the topic areas of Overview of Cloud Native Security, Platform Security, and Compliance and Security Frameworks.</p>
<h2 id="heading-compliance-and-threat-modelling">Compliance and Threat Modelling</h2>
<p>I did find myself brushing up on some of the compliance frameworks, as it had been a while since I had read about any of them. I read up on what they are and what they cover: NIST, CIS, GDPR, and PCI DSS. Google is your friend for that.</p>
<p>I also read up on some threat modelling frameworks, STRIDE and DREAD. These were also mentioned in the exam I took.</p>
<h2 id="heading-cluster-component-security">Cluster component security</h2>
<p>The slightly trickier areas I found I had to think about a bit more were Kubernetes Cluster Component Security, Kubernetes Security Fundamentals, and Kubernetes Threat Model.</p>
<p>The cluster component security and security fundamentals question areas focused on the components themselves and the best practices you apply to them, for example protecting the secrets stored in etcd.</p>
<p>In my exam experience, there were definitely some questions I had to flag and return to, some real head-scratchers. It’s not an “easy” exam (there aren’t any), but it is an “easier” exam. I passed the first time with a decent score in the low 90s, I think. I wasn’t expecting a score that high!</p>
<p>Links I used for studying:</p>
<p><a target="_blank" href="https://learn.kodekloud.com/user/courses/kubernetes-and-cloud-native-security-associate-kcsa">Kode Kloud KCSA course</a></p>
<p><a target="_blank" href="https://cloudsecdocs.com/containers/theory/threats/k8s_threat_model/">Kubernetes Threat Modelling</a></p>
<p><a target="_blank" href="https://www.softwaresecured.com/post/comparison-of-stride-dread-pasta">Kubernetes STRIDE, DREAD &amp; PASTA threat models</a></p>
<h2 id="heading-now-for-the-main-event-certified-kubernetes-security-specialist-cks">Now for the main event! Certified Kubernetes Security Specialist (CKS)</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750164867823/507f9331-65f3-4ccc-af21-c4485c5fe962.png" alt class="image--center mx-auto" /></p>
<p><em>Chat GPT is getting better at drawing me pictures for my blogs</em></p>
<p>I had already passed the CKA and the CKAD. I wrote about <a target="_blank" href="https://ferrishall.dev/certified-kubernetes-administrator-preparation-and-certification">how I prepped for the CKA</a> and <a target="_blank" href="https://ferrishall.dev/how-to-set-up-a-kubernetes-cluster-for-studying-and-exam-preparation">creating a cluster to practice</a>.</p>
<p>In a previous life, I was a Linux SysAdmin, and I use Linux as part of my day job; it’s a fundamental skill. As I tell people who are just getting started, make sure they have grasped basic Linux skills, networking, and containers before getting into Kubernetes.</p>
<p>I found the jump felt smaller when I first started learning about Kubernetes years ago because I had the fundamental skills mentioned. Learning about the abstraction, context, and importantly, the “why” was much easier with those fundamental skill sets and experience in my toolbelt.</p>
<p>That certainly helped with the CKA and CKAD, but it is not a guaranteed pass for the CKS. This is a very tough exam, and I did not pass on my first try!</p>
<p>Which, at the time, stung, but it helped me focus on what I was good at and what I needed to improve on #FailureisnotFinal</p>
<h2 id="heading-cks-exam-overview">CKS Exam overview</h2>
<p>Some information about the exam:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Domain</td><td>Weight</td></tr>
</thead>
<tbody>
<tr>
<td>Cluster Setup</td><td>10%</td></tr>
<tr>
<td>Cluster Hardening</td><td>15%</td></tr>
<tr>
<td>System Hardening</td><td>15%</td></tr>
<tr>
<td>Minimize Microservice Vulnerabilities</td><td>20%</td></tr>
<tr>
<td>Supply Chain Security</td><td>20%</td></tr>
<tr>
<td>Monitoring, Logging, and Runtime Security</td><td>20%</td></tr>
</tbody>
</table>
</div><p>Make sure you check out the full information on the domains and competencies on the <a target="_blank" href="https://training.linuxfoundation.org/certification/certified-kubernetes-security-specialist/">official CKS exam page</a>.</p>
<p>It’s a performance-based test that requires solving multiple tasks from a command line in a live Kubernetes environment. You have 2 hours to complete the tasks.</p>
<p>I’m not aiming to give any direct, exact questions I got in my exam, that’s not the point of this post and it’s pretty much NDA’d and frowned upon for me to disclose that.</p>
<h2 id="heading-my-focus-areas-yours-may-differ">My focus areas. Yours may differ….</h2>
<p>These are some areas that were new to me, needed a refresh and needed my extra focus:</p>
<ul>
<li><p>Falco (<a target="_blank" href="https://ferrishall.dev/getting-started-with-falco-security-tool-on-gke">I wrote about it here</a>) Know where to look for logs, how to update rules and find pods based on alerts firing.</p>
</li>
<li><p>A real cheeky one, I thought at the time. I had an Istio task which does appear under <strong>Minimize Microservice Vulnerabilities</strong> domain, I’m just really glad I did a deep dive (<a target="_blank" href="https://ferrishall.dev/istio-service-mesh-deepish-dive-architecture-traffic-control-security-and-observability">shameless plug of blog post here</a>)</p>
</li>
<li><p>Bom/SBom - Tools like <a target="_blank" href="https://github.com/aquasecurity/trivy">Trivy</a> and the <a target="_blank" href="https://github.com/kubernetes-sigs/bom">Bom utility by the Kubernetes project</a>. Learn how to use them, you will get tasks related to and using these tools, which covers the Supply Chain Security Domain part of the exam.</p>
</li>
<li><p>Remember updating a Cluster from your CKA? You should still remember how to do that…..</p>
</li>
<li><p>Know about etcd and how to make configuration changes to it regarding secrets and security best practices.</p>
</li>
<li><p>Get well practised at making changes to the kube-api-server (implementing logging, plugins and security changes) and practice getting fast and accurate at it. This helped me a lot during my re-take. I was too slow the first time around.</p>
</li>
<li><p>Know about <a target="_blank" href="https://apparmor.net/">AppArmor</a> and how to implement profiles on nodes and pods.</p>
</li>
<li><p>Know about <a target="_blank" href="https://kubernetes.io/docs/tutorials/security/seccomp/">Seccomp</a> and how it restricts which system calls are allowed.</p>
</li>
<li><p>Practice Network Policies (there’s a small example after this list). While you're at it, go practice Cilium Network policies too. <a target="_blank" href="https://editor.networkpolicy.io/">Network policy editor</a> is a great tool; try it out.</p>
</li>
<li><p>Speaking of speed, the 2 hours pass by incredibly quickly! In both attempts, there were 3 of the 16 tasks that I flagged and never even got to. I simply ran out of time.</p>
</li>
</ul>
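<p>For the network policy practice, a default-deny-ingress policy like this one is a good starting point to build up from. This is a standard Kubernetes example, not anything exam-specific; the labels and port in the second policy are made up:</p>
<pre><code class="lang-yaml"># Deny all ingress to pods in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Then selectively allow traffic from frontend pods to backend pods on one port
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
</code></pre>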
<h2 id="heading-time-is-not-on-your-side">Time is not on your side</h2>
<p>That said, manage your time and tasks accordingly. Flag the questions you won’t answer right away or those you anticipate requiring more time and effort.</p>
<p>I tried to answer the smaller/easier/preferred tasks first (there were no easy questions on the CKS, only easier and personally preferred).</p>
<p>Then I returned to the bigger and harder tasks. I was working right up to the last second. Time is the real test of this exam.</p>
<p>I passed on my second attempt after failing the first time with a score of 57. I had a good understanding of what I needed to improve, which was generally completing tasks more quickly and focusing on areas like Falco, making changes to the etcd configuration, refreshing my knowledge of working with Service Accounts and Tokens, and refactoring deployments to run in a restricted namespace with Pod Security Admission.</p>
<h2 id="heading-my-closing-thoughts">My closing thoughts</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750166232708/d52b4b9b-a3c8-41df-9ff1-00c0867ead62.png" alt class="image--center mx-auto" /></p>
<p>I didn’t feel hard done by failing my first attempt, it felt hard and I didn’t feel like I knew it all enough, a bit like when you die in Dark Souls, you didn’t die because the game is unfair, just like I didn’t fail my first exam because the exam was unfair.</p>
<p>It’s because I wasn’t good enough (This is why I don’t play Souls games….)</p>
<p>I passed 2nd time around on the nose, with a very efficient 67. My failures helped me get to the point that I passed.</p>
<p>My advice….. Practice. Just practice for speed and accuracy. There are some great scenarios over on <a target="_blank" href="https://killercoda.com/killer-shell-cks">killercoda’s Killer Shell</a>. I went through these a lot until I could get through them with minimal documentation help (I spent too much time searching the docs on my first attempt).</p>
<p>I worked and studied every day leading up to the CKS, just practising scenarios and reading. I learn best in small, frequent sessions rather than once per week for a long time. But we’re all different. Little and often is a real-time commitment on top of full-time project work and being a dad of two (with a very supportive and understanding wife!).</p>
<p>Sorry, this one turned into a bit of a long one. This was a real journey, and I'm super happy to be a part of the Kubestronaut program. Passing all 5 certification exams was a real test of skill and experience, and I really enjoyed the learning process.</p>
<p>#FailureisnotFinal #PracticeMakesPermanent</p>
<p>Study resources I found useful:</p>
<ul>
<li><p>I shouldn’t have to tell you, but <a target="_blank" href="https://kubernetes.io/">kubernetes.io</a>; get good at searching for what you need quickly!</p>
</li>
<li><p>It might be obvious, but check the official <a target="_blank" href="https://training.linuxfoundation.org/certification/certified-kubernetes-security-specialist/">CKS exam page</a> for all the domains and competencies</p>
</li>
<li><p><a target="_blank" href="https://learn.kodekloud.com/user/courses/certified-kubernetes-security-specialist-cks">Kode Kloud - The CKS course</a> is quite comprehensive and has some great practice labs</p>
</li>
<li><p><a target="_blank" href="https://learn.kodekloud.com/user/courses/cks-challenges">Kode Kloud - Challenge labs</a></p>
</li>
<li><p><a target="_blank" href="https://killercoda.com/killer-shell-cks">Killer Coda - Killer Shell CKS scenarios</a></p>
</li>
<li><p><a target="_blank" href="https://youtu.be/d9xfB5qaOfg?si=xNoSWSupUaZ3bDpk">Kubernetes CKS Full course on YouTube</a></p>
</li>
<li><p><a target="_blank" href="https://editor.networkpolicy.io/">Network Policy Editor online tool</a></p>
</li>
<li><p>Make use of the exam simulator included with the exam purchase; you get two 36-hour sessions. Use them!</p>
</li>
</ul>
<p>Let me know what you think of certifications, Kubernetes or the Kubestronaut programme!</p>
<p>I’ve had a couple of people message me via <a target="_blank" href="https://www.linkedin.com/in/ferrishall/">LinkedIn</a> asking about learning paths and development, so don’t be a stranger!</p>
]]></content:encoded></item><item><title><![CDATA[Getting Started with Falco Security Tool  on GKE]]></title><description><![CDATA[(Not sure why I only got half an image :shrug AI image creation continues to amaze me….)
As you can probably tell from a lot of my previous posts, I’ve been having a lot of fun with Kubernetes, I’m currently trying my hand at the Certified Kubernetes...]]></description><link>https://ferrishall.dev/getting-started-with-falco-security-tool-on-gke</link><guid isPermaLink="true">https://ferrishall.dev/getting-started-with-falco-security-tool-on-gke</guid><category><![CDATA[falco]]></category><category><![CDATA[Security]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[containers]]></category><category><![CDATA[gke]]></category><category><![CDATA[Kubernetes Security]]></category><category><![CDATA[DevSecOps]]></category><category><![CDATA[Devops]]></category><category><![CDATA[CKS]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Mon, 12 May 2025 15:19:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747410270465/76445ab4-0b35-421e-8504-dafa201934fa.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>(Not sure why I only got half an image :shrug AI image creation continues to amaze me….)</p>
<p>As you can probably tell from a lot of my previous posts, I’ve been having a lot of fun with Kubernetes, I’m currently trying my hand at the Certified Kubernetes Security Specialist, known as the most challenging Kubernetes exam of them all.</p>
<p>Which is why I’m going to do a bit of a quick-start write-up on installing and using Falco on GKE.</p>
<h2 id="heading-what-is-falco">What is Falco?</h2>
<p>Falco is a cloud native security tool that provides runtime security across hosts, containers, Kubernetes, and cloud environments. It is designed to detect and alert on abnormal behaviour and potential security threats in real-time.</p>
<p>It’s essentially a real time monitoring tool that alerts against preconfigured rules and custom rules configured by us administrators.</p>
<p>Falco deploys with some preconfigured rules that check the Linux kernel for any unusual behaviour, including, to name a few:</p>
<ul>
<li><p>Privilege escalation using privileged containers</p>
</li>
<li><p>Executing shell binaries such as <code>sh</code>, <code>bash</code>, <code>csh</code>, <code>zsh</code>, etc</p>
</li>
<li><p>Executing SSH binaries such as <code>ssh</code>, <code>scp</code>, <code>sftp</code>, etc</p>
</li>
<li><p>Read/Writes to well-known directories such as <code>/etc</code>, <code>/usr/bin</code>, <code>/usr/sbin</code>, etc</p>
</li>
</ul>
<p>This is a brief overview. You can find more info on what Falco is and why at the <a target="_blank" href="https://falco.org/docs/">Falco docs site</a>.</p>
<h2 id="heading-installing-falco-on-gke">Installing Falco on GKE</h2>
<p>Falco is fairly easy to get started with, you can install it on a VM, compute instance or Kubernetes Cluster. You’ll find a quick start style tutorial on the <a target="_blank" href="https://falco.org/docs/getting-started/falco-kubernetes-quickstart/">Falco Getting Started</a> page.</p>
<p>I’ve gone for a GKE cluster. I had one already up and running, so I’ve opted to install Falco on the cluster using Helm. You can use a Linux VM or any other type of cloud managed Kubernetes provider.</p>
<pre><code class="lang-bash">kubectl create ns falco

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

helm install falco \
-n falco \
--<span class="hljs-built_in">set</span> tty=<span class="hljs-literal">true</span> \
--<span class="hljs-built_in">set</span> driver.kind=ebpf \
falcosecurity/falco
</code></pre>
<p>I’ve set <code>--set driver.kind</code> to <code>ebpf</code>, as I’m using a GKE cluster and I’m not able to load a kernel module on the GKE nodes.</p>
<p>The Helm chart will then deploy Falco as a DaemonSet, meaning that a pod running Falco will run on each node of the cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746785429747/14321543-8797-4baf-ae66-df514f1a4c4c.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-testing-alerting">Testing Alerting</h2>
<p>Let’s try out Falco and see the pre-configured rules in action!</p>
<p>Let’s spin up a pod and have it do something to trigger a rule.</p>
<pre><code class="lang-bash">kubectl run pod <span class="hljs-built_in">test</span> --image nginx
</code></pre>
<p>Let’s have the test pod do something “unusual”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746785756792/ea2d03d1-c421-4753-9b41-5d53757c08c7.png" alt class="image--center mx-auto" /></p>
<p>We can check the logs on the Falco pods.</p>
<pre><code class="lang-bash">kubectl -n falco logs -c falco -l app.kubernetes.io/name=falco
</code></pre>
<p>I’ve cut off the top of the logs so you can see the logs from the commands I just ran:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746785864436/93e2aa00-0669-4404-9b90-10b36aba2403.png" alt class="image--center mx-auto" /></p>
<p>There’s a lot of info there, but what we’re looking for are the timestamps, the pod that was alerted on, and what the pod did to trigger the alert.</p>
<pre><code class="lang-bash">10:14:56.930581844: Notice A shell was spawned <span class="hljs-keyword">in</span> a container with an attached terminal (evt_type=execve user=root user_uid=0 user_loginuid=-1 process=sh proc_exepath=/usr/bin/dash parent=containerd-shim <span class="hljs-built_in">command</span>=sh -c ls -la terminal=34816 exe_flags=EXE_WRITABLE|EXE_LOWER_LAYER container_id=9f03a0d2a1ed container_image=docker.io/library/nginx container_image_tag=latest container_name=<span class="hljs-built_in">test</span> k8s_ns=default k8s_pod_name=<span class="hljs-built_in">test</span>)
10:15:10.218050374: Notice A shell was spawned <span class="hljs-keyword">in</span> a container with an attached terminal (evt_type=execve user=root user_uid=0 user_loginuid=-1 process=sh proc_exepath=/usr/bin/dash parent=containerd-shim <span class="hljs-built_in">command</span>=sh -c ls -la /root terminal=34816 exe_flags=EXE_WRITABLE|EXE_LOWER_LAYER container_id=9f03a0d2a1ed container_image=docker.io/library/nginx container_image_tag=latest container_name=<span class="hljs-built_in">test</span> k8s_ns=default k8s_pod_name=<span class="hljs-built_in">test</span>)
</code></pre>
<p>For example, in the logs above, the very last part tells us the container and pod:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746786012435/9e4f6463-3ec5-4d2e-99bc-3e34aaa5a2a9.png" alt class="image--center mx-auto" /></p>
<p>The message starts with “Notice A shell was spawned in a container with an attached terminal” (we’ll use this later!).</p>
<p>But look into the log message more, and we can see what actually triggered the alert:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746786066573/96d84af0-5ac7-46f0-a457-6edd16423e44.png" alt class="image--center mx-auto" /></p>
<p>Essentially, the container “test” in the pod named “test” ran the “ls -la” and in another instance “ls -la /root“.</p>
<p>It’s not ideal! That could signify a bad actor trying to work their way around a pod and potentially gaining access to the underlying node and our wider infrastructure. So it’s a good thing we have a rule triggering on this event!</p>
<h2 id="heading-logging-and-alerting-falco-in-google-cloud-monitoring">Logging and alerting Falco in Google Cloud Monitoring</h2>
<p>Now, with this running in GKE, wouldn’t it be nice not to have to remember to go trawling through the logs every now and then just to know what’s happening in our cluster? The answer is yes!</p>
<p>Google Cloud comes ready with a very comprehensive logging and monitoring suite of tools that we can use to alert on the content of a log message. GKE integrates with Google Cloud Logging out of the box, so we should definitely make use of it. Let’s take a look at how.</p>
<p>Falco logs are written to stdout, so with the logging capabilities of GKE they appear in Google Cloud Logging. From the “Workloads” page in the GKE cluster, I can choose “Falco” and look at the logs produced by the pods:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746786573520/95a90850-db36-4ddd-85d9-ac7cbe64e021.png" alt class="image--center mx-auto" /></p>
<p>What’s neat is that you can fine-tune your Logging query to find the logs you are interested in by clicking “View in Logs Explorer”.</p>
<p>Now in Logs Explorer, we can see the LQL query and make some changes to find the log content we’re interested in:</p>
<pre><code class="lang-bash">resource.type=<span class="hljs-string">"k8s_container"</span>
resource.labels.project_id=<span class="hljs-string">"gcp-project-id"</span>
resource.labels.location=<span class="hljs-string">"europe-west2-b"</span>
resource.labels.cluster_name=<span class="hljs-string">"cks-cluster"</span>
resource.labels.namespace_name=<span class="hljs-string">"falco"</span>
labels.k8s-pod/app_kubernetes_io/instance=<span class="hljs-string">"falco"</span>
labels.k8s-pod/app_kubernetes_io/name=<span class="hljs-string">"falco"</span> severity&gt;=DEFAULT
textPayload:<span class="hljs-string">"Notice A shell was spawned in a container with an attached terminal"</span>
</code></pre>
<p>In the last hour, we can see the logs (plus another I generated!) that Falco alerted on from shell sessions spawned in pods on the cluster. The noise is reduced by specifying the textPayload to search for: “Notice A shell was spawned in a container with an attached terminal”.</p>
<p>Much easier to find what we’re looking for!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747052722341/22bb3eb7-82ba-4a77-85c1-abd234a9037f.png" alt class="image--center mx-auto" /></p>
<p>With this log query, we can also create a log based alert, super simple! Click the “Actions” button and then “Create log alert“.</p>
<p>Give the alert a name, it will grab our log query and we set the frequency and the channel to notify.</p>
<p>Let’s fire off some more kubectl exec commands and create some alerts……</p>
<p>That was quick!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746805257240/36e637e0-f865-432a-becb-a831c9c34e88.png" alt class="image--center mx-auto" /></p>
<p>Let’s look at the actual incident that’s been created.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746805465331/68c6b860-ed16-4d0a-8afb-b4190228ae7f.png" alt class="image--center mx-auto" /></p>
<p>In the incident, you can view the log/s that caused the alert and you can pop out to the Google Logs Explorer page again.</p>
<p>And the email alert got sent, so if I wasn’t staring at the monitoring dashboards, I certainly know about it now!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747052904256/e55d2d97-38c9-419e-ab34-26024706a4bd.png" alt class="image--center mx-auto" /></p>
<p>The alert also doesn’t have to be an email; Google Cloud Monitoring has a choice of notification channels. It could be Google Chat, PagerDuty for the really important alerts, or Pub/Sub for something completely different.</p>
<p>As a former oncall SRE, all I ask is that you alert and wake up engineers for something serious that they need to be present for!</p>
<h2 id="heading-creating-custom-rules">Creating custom rules</h2>
<p>Now, let’s say we have some scenarios which are not covered by the rules that come configured with Falco. That’s where custom rules come in: we can create our own rules for Falco to alert on.</p>
<p>The default Falco configuration will load rules from <code>/etc/falco/falco_rules.yaml</code>, <code>/etc/falco/falco_rules.local.yaml</code> and <code>/etc/falco/rules.d</code>.</p>
<p>My current deployment of Falco (and yours if you’re following along…) was done via Helm, so it’s a case of creating a custom rules YAML file and updating the Helm deployment.</p>
<p>I borrowed this from the Falco quick start to get going with some minor changes:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">customRules:</span>
  <span class="hljs-attr">custom-rules.yaml:</span> <span class="hljs-string">|-
    - rule: Write into etc
      desc: An attempt to write to the /etc directory
      condition: &gt;
        (evt.type in (open,openat,openat2) and evt.is_open_write=true and fd.typechar='f' and fd.num&gt;=0)
        and fd.name startswith /etc
      output: "Stop what your doing and look at this!! File below /etc opened for writing (file=%fd.name pcmdline=%proc.pcmdline gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4] evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty %container.info)"
      priority: WARNING
      tags: [filesystem, mitre_persistence]</span>
</code></pre>
<p>As a brief overview of the rule we’re creating here: the condition matches the open, openat, or openat2 syscalls where the file is <a target="_blank" href="https://falco.org/docs/reference/rules/default-macros/#file-opened-for-writing">opened for writing</a> (evt.is_open_write equals true) and the file name starts with <code>/etc</code>.</p>
<p>The output section then determines what is logged, which is what we see in the Falco pod logs and, in this case, Google Cloud Logging too. You can use event fields in the output so values are interpolated rather than hardcoded; the <a target="_blank" href="https://falco.org/docs/reference/rules/supported-fields/">documentation</a> of supported fields is quite comprehensive.</p>
<p>Finally, tags. These are optional but handy for organising rules into categories; for example, you could tell Falco to skip all rules with a particular tag in a dev environment. <a target="_blank" href="https://falco.org/docs/concepts/rules/controlling-rules/#tags">Tags</a> are a handy way of controlling the rules in use.</p>
<p>Now to update the Helm deployment to include the custom rules yaml file with the Falco deployment.</p>
<pre><code class="lang-bash">helm upgrade --namespace falco falco falcosecurity/falco --<span class="hljs-built_in">set</span> tty=<span class="hljs-literal">true</span> -f custom_rules.yaml
</code></pre>
<p>Wait for the pods to restart.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747044813555/c8640bb3-b723-408f-bdd5-4a0ac4f7b836.png" alt class="image--center mx-auto" /></p>
<p>Let’s trigger an alert by exec’ing into the test pod and trying to write to /etc.</p>
<p>Checking the logs in the Falco pod or in Google Cloud Logs Explorer shows the attempt:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747045019589/282427de-b3c6-4234-bcd8-7c98f170b315.png" alt class="image--center mx-auto" /></p>
<p>Screenshots are hard to read….. Here’s a copy paste of the log that was output to Google Cloud Logging.</p>
<p><code>2025-05-12 11:14:47.152 BST</code></p>
<p><code>10:14:47.147029349: Warning Stop what your doing and look at this!! File below /etc opened for writing (file=/etc/test.txt pcmdline=sh -c touch /etc/test.txt gparent=containerd-shim ggparent=systemd gggparent=&lt;NA&gt; evt_type=openat user=root user_uid=0 user_loginuid=-1 process=touch proc_exepath=/usr/bin/touch parent=sh command=touch /etc/test.txt terminal=34817 container_id=71ee5d3c8123 container_image=</code><a target="_blank" href="http://docker.io/library/nginx"><code>docker.io/library/nginx</code></a> <code>container_image_tag=latest container_name=test k8s_ns=default k8s_pod_name=test)</code></p>
<p><img src="https://media.tenor.com/gCN3W3TQErkAAAAe/anchorman-stop-what-youre-doing.png" alt="Anchorman Stop What Youre Doing GIF - Anchorman Stop What Youre Doing  Listen - Discover &amp; Share GIFs" /></p>
<p>You can see the custom message I added to output “Stop what your doing and look at this!!“</p>
<p>More info on creating custom rules <a target="_blank" href="https://falco.org/docs/concepts/rules/default-custom/">here</a>.</p>
<p>We’re just scratching the surface of making Falco rules work for us; there are more advanced things you can do that the documentation covers in more depth <a target="_blank" href="https://falco.org/docs/concepts/rules/">here</a>, including how to override rules, add exceptions, write your own rules, and so on.</p>
<h2 id="heading-summary">Summary</h2>
<p>That’s it for this quick intro to getting Falco up and running in a GKE cluster.</p>
<p>To summarise, we had a GKE cluster up and running and we deployed Falco using Helm. Any Kubernetes cluster or even a Linux VM should be fine to use; your install/deployment method will vary.</p>
<p>Once it was running, we tested the default built-in rules by running a simple nginx pod and exec’ing shell commands that read from and wrote to root-owned directories.</p>
<p>We then checked the Falco logs for alerts and created a log-based alert in Google Cloud Monitoring to notify us when a rule had been triggered.</p>
<p>We then checked out custom rules.</p>
<p>I hope you found this useful, feel free to comment with any of your findings or thoughts!</p>
<h2 id="heading-useful-links">Useful Links</h2>
<p><a target="_blank" href="https://falco.org/docs/getting-started/falco-kubernetes-quickstart/">Falco Quick start</a></p>
<p><a target="_blank" href="https://falco.org/docs/concepts/rules/basic-elements/">Basic elements of Falco rules</a></p>
<p><a target="_blank" href="https://falco.org/docs/concepts/rules/custom-ruleset/">Write your first custom rule</a></p>
<p><a target="_blank" href="https://falco.org/docs/reference/rules/default-macros/">Default Macros</a></p>
]]></content:encoded></item><item><title><![CDATA[Istio Service Mesh Deep(ish) Dive: Architecture, Traffic Control, Security and Observability]]></title><description><![CDATA[*I’m quite enjoying this “playful and whimsical“ image created by ChatGPT……
This continues from my previous blog post “Getting Started with Istio Service Mesh“.
We’ll be diving a bit deeper, specifically into Istio's architecture, traffic management,...]]></description><link>https://ferrishall.dev/istio-service-mesh-deepish-dive-architecture-traffic-control-security-and-observability</link><guid isPermaLink="true">https://ferrishall.dev/istio-service-mesh-deepish-dive-architecture-traffic-control-security-and-observability</guid><category><![CDATA[#istio]]></category><category><![CDATA[#ServiceMesh]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Fri, 21 Feb 2025 11:01:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1739892663973/c106bc63-cdc7-418a-991d-463cc1306942.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>*I’m quite enjoying this “playful and whimsical“ image created by ChatGPT……</p>
<p>This continues from my previous blog post <a target="_blank" href="https://ferrishall.dev/getting-started-with-istio-service-mesh">“Getting Started with Istio Service Mesh“</a>.</p>
<p>We’ll be diving a bit deeper, specifically into Istio's architecture, traffic management, security, and observability features.</p>
<h2 id="heading-recap">Recap</h2>
<p>Let’s recap what a service mesh is and what Istio can help us with.</p>
<p>Istio is a service mesh that acts as an infrastructure layer for managing networking, security, and traffic control across microservices. It enables these capabilities without requiring any changes to the application code. Instead, Istio abstracts this functionality into a dedicated control plane and data plane, allowing teams to implement repeatable, complex configurations such as zero-trust security and advanced traffic management, independent of the applications running in their Kubernetes cluster or clusters.</p>
<h2 id="heading-architecture">Architecture</h2>
<p>So what makes up Istio?</p>
<h3 id="heading-the-data-plane">The Data plane</h3>
<p>The Envoy proxy is typically injected into pods as a sidecar container. This sidecar essentially takes over all the networking for the pod: every network call in and out goes via the Istio-injected sidecar, which works very closely with the other containers in the pod. This is known as the Data Plane.</p>
<h3 id="heading-the-control-plane">The Control Plane</h3>
<p>The Control Plane part of this architecture is powered by the <code>Istiod</code> deployment in the <code>istio-system</code> namespace, which provides the service discovery, the configuration management and the certificate management.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739543384246/456a544e-ff48-428a-a2a5-0792fabe79a2.png" alt class="image--center mx-auto" /></p>
<p>Hopefully, this diagram helps</p>
<ul>
<li><p><strong>Istiod</strong> (shown with the Istio logo at the bottom) is the control centre, storing and distributing configurations.</p>
</li>
<li><p><strong>Pods</strong> (e.g., App/Pod 1 and 2) contain both the <strong>application container</strong> and the <strong>Envoy sidecar proxy</strong>, injected at pod creation.</p>
</li>
<li><p>All <strong>ingress and egress traffic flows through the Envoy proxies</strong>, where traffic management, authentication, and security policies are enforced based on configurations propagated from the <strong>Control Plane</strong> (Istiod).</p>
</li>
</ul>
<p>With these core components running and working together, we now have a proxy layer: the Envoy proxy containers running in the pods, injected at deployment time. They can then handle traffic control features such as failover and fault injection, as well as security and authentication features like enforcing security policies and access control.</p>
<p>You can start to imagine if we wanted to introduce this and add these capabilities to our application code it would start to look very different and get more complex very quickly.</p>
<p>Instead, Istio <strong>decouples</strong> these concerns from the application itself, allowing teams to focus on business logic while Istio handles network, security, and observability at scale.</p>
<h2 id="heading-traffic-management">Traffic Management</h2>
<p>Now we’ve had a bit of a recap of what Istio is and why it might be a good idea to introduce a decoupled network layer to configure, intercept and mediate our mesh network, let’s deep dive into just some of the traffic management capabilities it offers.</p>
<p>Let’s start at Gateways, after all, we probably want to learn how we get traffic and requests from outside the Kubernetes cluster to our applications and how we can control and configure them to behave how we want.</p>
<p>When you first install and run Istio on a cluster you can start with a demo profile which will create an Ingress Gateway and an Egress Gateway. They are deployed as Kubernetes objects and essentially act as load balancers (I’m working with Google’s GKE, so that will create a Google Cloud Load Balancer) for incoming and outgoing network requests at the outer edges of the cluster.</p>
<p>Here’s mine running on a GKE cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740130430771/62ed9bf4-04b5-4ef5-ab14-65713ba23e47.png" alt class="image--center mx-auto" /></p>
<p>The load balancer service it creates in GKE is a network load balancer in Google Cloud, which forwards requests to the nodes in the GKE cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740130132663/60e56fe1-cfb1-4dce-aa2e-c66cd691ee20.png" alt class="image--center mx-auto" /></p>
<p>This is created when Istio is installed with the default profile (I don’t have an <code>istio-egressgateway</code>). Using <code>kubectl describe</code> on the service in the <code>istio-system</code> namespace gives us some more info about the load balancer. The <code>istio=ingressgateway</code> label is useful for when we create our Gateway resource, which acts as a load balancer at the edge of our cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739545783400/408d660f-4782-4a5a-a93f-adb196fa6831.png" alt class="image--center mx-auto" /></p>
<p>I’ll create a Gateway object. This will configure and point to the Istio <code>ingressgateway</code> which we have above and sets up a proxy to configure the Istio Ingress Gateway and tell it where to send the traffic request.</p>
<p>We’ll configure the hosts field as <code>*</code>. I added this as a wildcard so I can access it without a domain name, because I haven’t gotten around to setting up the DNS yet (It’s Friday, what are you gonna do?! :shrug)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739546366255/52b42839-bea1-4639-bafd-74bad2b45319.png" alt class="image--center mx-auto" /></p>
<p>But if we wanted to access the ingress based on a domain, for example <code>ferrishall.dev</code>. I would add that to the <code>hosts</code> field in the yaml file and add the <code>istio-ingressgateway</code> external IP address as an A record in a domain's DNS settings.</p>
<p>This is a super simple gateway code snippet I used to create the Gateway object. I’ll list all the sources that helped me learn and write this up at the bottom of this post.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.istio.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Gateway</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gateway</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">istio:</span> <span class="hljs-string">ingressgateway</span>
  <span class="hljs-attr">servers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span>
        <span class="hljs-attr">number:</span> <span class="hljs-number">80</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">http</span>
        <span class="hljs-attr">protocol:</span> <span class="hljs-string">HTTP</span>
      <span class="hljs-attr">hosts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">'*'</span>
</code></pre>
<p>I was reading that there is support for both Istio Gateway and Kubernetes Gateway APIs and the Kubernetes Gateway API will soon be the default so I’ll probably have to re-learn and re-write this…. <a target="_blank" href="https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/">Documentation</a></p>
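<p>For reference, the rough Kubernetes Gateway API equivalent of the Gateway above would look something like this (a sketch I haven’t run on this cluster; routing would then be handled by an HTTPRoute instead of a VirtualService):</p>
<pre><code class="lang-yaml">apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: default
spec:
  gatewayClassName: istio   # Istio provides this GatewayClass
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
</code></pre>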
<p>I’ll spin up a <code>hello-world</code> application first so I have something to manage the traffic to.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739547335818/a0d9e917-5c56-4259-ae5c-545d2ca3201d.png" alt class="image--center mx-auto" /></p>
<p>The <code>hello-world</code> pod has 2 containers, 1 for the <code>hello-world</code> application and the 2nd for the Istio Envoy proxy sidecar.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739547447282/e5d97571-f6b7-44ef-968a-0f1e297e9abe.png" alt class="image--center mx-auto" /></p>
<p>The <code>hello-world</code> container isn’t shown, but you can see some of the <strong>istio-proxy container</strong> info when you run <code>kubectl describe pod hello-world-xxxx</code>.</p>
<p>So I’ve got an Ingress load balancer, a gateway and an application…. how do we tell Istio when it gets traffic from the ingress load balancer to route that request to hello-world? We need a Virtual Service.</p>
<h3 id="heading-virtual-service">Virtual Service</h3>
<p>A <a target="_blank" href="https://istio.io/latest/docs/reference/config/networking/virtual-service/">Virtual Service</a> is used to configure the actual routing rules to the backend services we want to send the traffic to, the hello-world service we just created for example.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.istio.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">VirtualService</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">hello-world-vs</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">hosts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">"*"</span>
  <span class="hljs-attr">gateways:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">gateway</span>
  <span class="hljs-attr">http:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">route:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">destination:</span>
            <span class="hljs-attr">host:</span> <span class="hljs-string">hello-world.default.svc.cluster.local</span>
            <span class="hljs-attr">port:</span>
              <span class="hljs-attr">number:</span> <span class="hljs-number">80</span>
</code></pre>
<p>Again, I’m using the <strong>*</strong> in the <code>hosts</code> field because I haven’t set up DNS yet. We’re telling the virtual service that it is bound to the Gateway object called….. gateway, and to route requests to the backend hello-world service, for which I’m using the fully qualified domain name within the cluster to avoid any potential confusion with namespaces.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739548646103/9fae470b-2c78-4335-8d45-7d2850929fc8.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-destination-rules-amp-virtual-services">Destination Rules &amp; Virtual Services</h3>
<p>Let’s try something else because that seems like a lot of work to get a hello-world application working!</p>
<p>I’ve deployed a web frontend and a customers backend application onto my cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739549197057/9d1d21ba-61aa-4534-94c7-72c945c03838.png" alt class="image--center mx-auto" /></p>
<p>Now say there is a <code>customers</code> deployment version <code>v2</code> which we want to test. We want to send a percentage of the traffic to the <code>v2</code>-labelled pods to test some new features, i.e. we want to perform some canary testing.</p>
<p>We can create a <strong>Destination Rule</strong> for our service and define two subsets representing the <code>v1</code> and <code>v2</code> versions. <a target="_blank" href="https://istio.io/latest/docs/reference/config/networking/destination-rule/">A Destination Rule</a> defines a policy that is applied to traffic intended for a service after routing has taken place; rules can also specify load balancing configuration like ROUND_ROBIN, connection pool settings, etc.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.istio.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">DestinationRule</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">customers-dr</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">host:</span> <span class="hljs-string">customers.default.svc.cluster.local</span>
  <span class="hljs-attr">subsets:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">v1</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">version:</span> <span class="hljs-string">v1</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">v2</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">version:</span> <span class="hljs-string">v2</span>
</code></pre>
<p>This Destination Rule is telling Istio there are 2 versions of the <code>customers</code> service, which are labelled <code>version: v1</code> and <code>version: v2</code>, known as subsets. Both pod versions sit behind the same <code>customers</code> ClusterIP service.</p>
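<p>As mentioned above, a Destination Rule can also carry traffic policy settings. As a hedged sketch (not something I’ve applied here), adding a load balancer and connection pool config to the same resource would look roughly like this:</p>
<pre><code class="lang-yaml">apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: customers-dr
spec:
  host: customers.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN     # spread requests evenly across endpoints
    connectionPool:
      tcp:
        maxConnections: 100   # cap concurrent TCP connections to the service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
</code></pre>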
<p>So when I create my Virtual Service to direct the traffic to the <code>customers</code> service, I can configure it with some route weighting options.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.istio.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">VirtualService</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">customers-vs</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">hosts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">'customers.default.svc.cluster.local'</span>
  <span class="hljs-attr">http:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">route:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">destination:</span>
            <span class="hljs-attr">host:</span> <span class="hljs-string">customers.default.svc.cluster.local</span>
            <span class="hljs-attr">port:</span>
              <span class="hljs-attr">number:</span> <span class="hljs-number">80</span>
            <span class="hljs-attr">subset:</span> <span class="hljs-string">v1</span>
          <span class="hljs-attr">weight:</span> <span class="hljs-number">50</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">destination:</span>
            <span class="hljs-attr">host:</span> <span class="hljs-string">customers.default.svc.cluster.local</span>
            <span class="hljs-attr">port:</span>
              <span class="hljs-attr">number:</span> <span class="hljs-number">80</span>
            <span class="hljs-attr">subset:</span> <span class="hljs-string">v2</span>
          <span class="hljs-attr">weight:</span> <span class="hljs-number">50</span>
</code></pre>
<p>So now Istio knows to send 50% of traffic to <code>version: v1</code> and the other 50% to <code>version: v2</code> of the customer’s pods in the <code>customers</code> service.</p>
<p>We didn’t have to change any of the application code or Kubernetes deployment manifests; it’s all configured and deployed with these Istio YAML manifests, fully decoupled from the application itself.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739549598183/4140abec-e34f-4f8c-ac13-2b973ddd3ac3.png" alt class="image--center mx-auto" /></p>
<p>I’m just scratching the surface; there is a lot more that can be done, like matching traffic requests to a service based on request header content, traffic mirroring, etc.</p>
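<p>Just to give a flavour of those, here’s a sketch I haven’t run against this cluster: a Virtual Service that sends requests carrying a (made-up) <code>x-beta-tester</code> header to <code>v2</code>, and mirrors the remaining traffic to <code>v2</code> while still serving it from <code>v1</code>:</p>
<pre><code class="lang-yaml">apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: customers-vs
spec:
  hosts:
    - customers.default.svc.cluster.local
  http:
    - match:
        - headers:
            x-beta-tester:        # illustrative header name
              exact: "true"
      route:
        - destination:
            host: customers.default.svc.cluster.local
            subset: v2
    - route:
        - destination:
            host: customers.default.svc.cluster.local
            subset: v1
      mirror:
        host: customers.default.svc.cluster.local
        subset: v2
      mirrorPercentage:
        value: 100.0              # mirror all remaining traffic, fire-and-forget
</code></pre>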
<p>I’m just trying to keep it simple and in a context that, hopefully, the people reading this find helpful and relatable. When you get playing with it you’ll find your own advanced use cases and content!</p>
<h2 id="heading-security">Security</h2>
<p>Istio also helps us apply security to our distributed applications using mutual TLS.</p>
<p>Here is an image taken from my Kiali dashboard, to demonstrate that I can access the <code>customers</code> service directly and also via the gateway IP address, which is managed by Istio.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739793675669/021c8a43-3efc-4b40-a3fb-fee2f99888a9.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-peer-authentication">Peer Authentication</h3>
<p>What I’ve done here is reset some things: I deployed the <code>web-frontend</code> application without the Istio proxy injection and deployed the <code>customers</code> application, which does have the Istio proxy injected. I also updated the virtual service to send traffic to the <code>customers</code> application via the <code>istio-ingressgateway</code> load balancer IP.</p>
<p>The <code>unknown</code> entry in the dashboard is the <code>web-frontend</code> application. Because it doesn’t have the Istio proxy injected, Istio doesn’t know what or where it is and, importantly, its traffic doesn’t have the security lock symbol: it’s not protected with <strong>mTLS</strong>.</p>
<p>Proxies send plain text traffic between services that do not have the sidecar injected.</p>
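<p>(As an aside, if you ever want a workload to skip injection even in a namespace labelled for it, the <code>sidecar.istio.io/inject</code> pod annotation is one way to do it. A hedged sketch, where the image is just a placeholder rather than the real web-frontend:)</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
      annotations:
        sidecar.istio.io/inject: "false"   # opt this pod out of sidecar injection
    spec:
      containers:
        - name: web-frontend
          image: nginx:1.27                # placeholder image for illustration
          ports:
            - containerPort: 80
</code></pre>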
<p>I updated the virtual service for customers to point to the gateway and sent a curl request with the <code>customers</code> service URI in the header. The gateway has an Istio proxy sidecar injected, so it will send the request traffic using <strong>mTLS</strong>.</p>
<p>(In a nutshell, <strong>Mutual TLS</strong> is where both the client and the server have to authenticate and verify their identity; with TLS it’s one-way, where only the server’s identity is authenticated by the client. This <a target="_blank" href="https://www.f5.com/labs/learning-center/what-is-mtls">article on mTLS</a> explains the differences nicely.)</p>
<p>So how can we enforce <strong>mTLS</strong> security? We can apply <a target="_blank" href="https://istio.io/latest/docs/reference/config/security/peer_authentication/">peer authentication</a> configuration.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">security.istio.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PeerAuthentication</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">default</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">mtls:</span>
    <span class="hljs-attr">mode:</span> <span class="hljs-string">STRICT</span>
</code></pre>
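<p>The policy above is scoped to the <code>default</code> namespace. As I understand it, applying the same resource in the <code>istio-system</code> root namespace makes the policy mesh-wide (treat this as a sketch, I’ve only applied the namespaced version myself):</p>
<pre><code class="lang-yaml">apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: STRICT
</code></pre>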
<p>Now our applications, via the Istio Envoy proxy, will only accept and transmit requests over <strong>mTLS</strong> connections. But what does this mean for the <code>web-frontend</code> we deployed without the Istio proxy sidecar?….</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739795179232/478703f8-f056-4e47-8fd2-ef0ca69ca053.png" alt class="image--center mx-auto" /></p>
<p>Nope, not happening.</p>
<p>The current <code>web-frontend</code> hasn’t been deployed with the Istio proxy sidecar container injected into the pod, so it’s trying to send requests to the <code>customers</code> application pod in plain text. With <code>PeerAuthentication</code> set to strict, Istio is saying no thanks. Let’s redeploy the <code>web-frontend</code> with Istio injection enabled….</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739800348428/fd75b81c-3cfc-4a6b-bdc9-a2002b2028c1.png" alt class="image--center mx-auto" /></p>
<p>That looks better!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739800482043/4fc130c6-b539-425c-99e4-ad01a2d07a66.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-authorisation-policies">Authorisation Policies</h3>
<p>We can also control the flow of requests. Just because all the pods in our cluster are deployed and managed by us, does that mean they should have access to everything running in the cluster? Does everything need access to that database? Or our <code>customers</code> application?</p>
<p>No, of course not. We want to be specific and apply the principle of least privilege not just to users, service accounts and the permissions they have, but also to the network connections and requests of the workloads running in the cluster.</p>
<p>This is where <a target="_blank" href="https://istio.io/latest/docs/reference/config/security/authorization-policy/">Authorisation Policies</a> can help (I’m not writing Authorization, you can’t make me….)</p>
<p>We can configure which pods in a particular namespace, and which principal (e.g. service account), are allowed or denied access to a pod or to anything in a whole namespace, right down to which operations they are allowed to perform on a particular path.</p>
<p>In our case, say we want to lock down access to the <code>customers</code> application to only accept requests from the <code>web-frontend</code> application and that is only allowed to come via the <code>istio-ingressgateway</code> load balancer.</p>
<p>First, we’ll flat out deny everything, giving us a bit of a clean slate.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">security.istio.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">AuthorizationPolicy</span>
<span class="hljs-attr">metadata:</span>
 <span class="hljs-attr">name:</span> <span class="hljs-string">deny-all</span>
 <span class="hljs-attr">namespace:</span> <span class="hljs-string">default</span>
<span class="hljs-attr">spec:</span>
  {}
</code></pre>
<p>Then we can apply the first authorisation policy.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">security.istio.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">AuthorizationPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-ingress-frontend</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">default</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">web-frontend</span>
  <span class="hljs-attr">action:</span> <span class="hljs-string">ALLOW</span>
  <span class="hljs-attr">rules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">from:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">source:</span>
            <span class="hljs-attr">namespaces:</span> [<span class="hljs-string">"istio-system"</span>]
        <span class="hljs-bullet">-</span> <span class="hljs-attr">source:</span>
            <span class="hljs-attr">principals:</span> [<span class="hljs-string">"cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"</span>]
</code></pre>
<p>That covers the <code>istio-ingressgateway</code> being allowed to access the pods labelled <code>app: web-frontend</code>; next we need the <code>web-frontend</code> to be allowed to access the <code>customers</code> application.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">security.istio.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">AuthorizationPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-web-frontend-customers</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">default</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">customers</span>
        <span class="hljs-attr">version:</span> <span class="hljs-string">v1</span>
  <span class="hljs-attr">action:</span> <span class="hljs-string">ALLOW</span>
  <span class="hljs-attr">rules:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">from:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">source:</span>
        <span class="hljs-attr">namespaces:</span> [<span class="hljs-string">"default"</span>]
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">principals:</span> [<span class="hljs-string">"cluster.local/ns/default/sa/web-frontend"</span>]
</code></pre>
<p>This allows requests from <code>web-frontend</code> to reach the <code>customers</code> pods (labelled <code>app: customers</code>), as long as those requests come from the <code>default</code> namespace and from pods running as the <code>web-frontend</code> service account.</p>
<p>Let’s test that the deny rule is still working. I’ve just created a pod running a <code>busybox</code> curl image.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739872991739/e0ab901a-49ca-41b7-a57f-95c9598c2459.png" alt class="image--center mx-auto" /></p>
<p>This pod doesn’t have the expected principal or labels we specified in the Authorisation Policies so it’s getting <code>RBAC: denied</code>. Just as expected!</p>
<p>You might be thinking “Why not just use the Kubernetes network policy object?“ You absolutely can, but the key difference is that network policies are layer 3/4, IP-based policies, while Istio authorisation policies work at layer 7, so you can add HTTP header checks and other attributes. You just get more granular options with Istio authorisation policies than with network policies; it all depends on your use case at the end of the day.</p>
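<p>For example, here’s a hedged sketch (the path and header name are made up for illustration) of a layer 7 rule that only allows the <code>web-frontend</code> service account to perform GET requests on a specific path, and only when a particular header is present:</p>
<pre><code class="lang-yaml">apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-get-customers-api
  namespace: default
spec:
  selector:
    matchLabels:
      app: customers
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/default/sa/web-frontend"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/api/customers*"]   # illustrative path
      when:
        - key: request.headers[x-request-source]   # illustrative header
          values: ["web-frontend"]
</code></pre>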
<p>Again, I’m just scratching the surface here. There are a lot more security and Authorisation Policies you can adopt and deploy with Istio. More info you can <a target="_blank" href="https://istio.io/latest/docs/tasks/security/">find here</a></p>
<h2 id="heading-observability">Observability</h2>
<p>Finally, we come to observability. We’ve done some cool things with Istio, but another feature it adds is insight: logs and metrics showing us what is actually happening within our distributed network.</p>
<p>Also, wouldn’t it be nice to measure the performance and alert us when performance starts degrading before it gets unacceptable to our users? That’s what we’ll take a look at in this final section of our deep dive.</p>
<h3 id="heading-proxy-metrics">Proxy Metrics</h3>
<p>The Istio Envoy proxies produce metrics and logs, collecting valuable insight about the traffic passing in and out of the proxies. <a target="_blank" href="https://istio.io/latest/docs/concepts/observability/#proxy-level-metrics">Documentation here.</a></p>
<h3 id="heading-service-metrics">Service Metrics</h3>
<p>Istio also provides <a target="_blank" href="https://istio.io/latest/docs/concepts/observability/#service-level-metrics">metrics at the service level</a>, the Istio addons via the GitHub repo supply some very handy default dashboards for visualising in Grafana.</p>
<h3 id="heading-control-plane-metrics">Control Plane Metrics</h3>
<p>Istio itself should also be monitored, so you can keep an eye on the performance and that it is behaving as expected, especially as you scale.</p>
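<p>If you happen to be running the Prometheus Operator, something like the following ServiceMonitor is one way to scrape istiod’s metrics. The port name and label here are from memory, so double-check them against your own install:</p>
<pre><code class="lang-yaml">apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istiod-metrics
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: istiod            # label on the istiod Service (verify on your install)
  endpoints:
    - port: http-monitoring  # istiod's monitoring port, 15014 by default
      interval: 30s
</code></pre>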
<h3 id="heading-kiali-dashboard">Kiali dashboard</h3>
<p>Now who doesn’t love a good dashboard?! The Kiali dashboard is a nice way to get started and get some visual insight into your applications and the Istio proxies.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739785190901/7f5d9c47-ffdd-4b58-83b6-c2d8fc83c1be.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-grafana">Grafana</h3>
<p>You might be acquainted with Prometheus and Grafana so I won’t spend any time explaining what and why. You can find some default dashboards in the <code>samples/addons</code> directory of the Istio GitHub repo that provide visibility using the metrics collected by Prometheus.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739879040604/1f0f28f9-30cc-4aa9-a0a3-99fca819aa00.png" alt class="image--center mx-auto" /></p>
<p>We can visualise the metrics for the services we have the Envoy proxies injected into; here are the <code>customers</code> application metrics.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739879386163/76d6acba-7407-4cd7-b06f-63efe414ea6f.png" alt class="image--center mx-auto" /></p>
<p>You’ll find a really nice set of addons in the <a target="_blank" href="https://github.com/istio/istio/tree/master/samples/addons">Istio GitHub repo</a> to help you get started with observability in Istio and the applications or services we are using and proxying with Istio.</p>
<p>I’ll leave observability there. There’s a lot more to discover, but the important takeaway is that metrics, logs, etc. are all things Istio adds. This is super handy as our applications and architecture get more complex, more distributed and start scaling: we have much more insight into performance, which helps us troubleshoot any issues that might arise.</p>
<p>The more data we have, the more informed technical and business decisions we can make!</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Sorry, that was a bit longer than I was aiming for but you know how it is when you get into it.</p>
<p>In conclusion, Istio provides a robust and flexible service mesh solution that enhances the management of microservices in Kubernetes environments. By decoupling network, security, and observability concerns from application code, Istio allows development teams to focus on business logic while ensuring efficient traffic management, robust security through mutual TLS, and comprehensive observability.</p>
<p>The integration of tools like Kiali and Grafana further enriches the user experience and observability by providing valuable insights into service performance and network behaviour.</p>
<p>As you continue to explore Istio, especially in multi-cluster environments, you'll discover even more advanced capabilities that can further optimize your microservices architecture.</p>
<h2 id="heading-thats-all-folks">That’s all folks!</h2>
<p>Hopefully, this was a handy deep dive into Istio. I’ve been learning and tinkering with Istio for the last few months and really having some fun with it (proper nerdy, I know…); writing about it helps me understand and commit things to memory!</p>
<p>Hopefully, my words help explain some of the parts, and their uses, to beginners starting with Istio, or just why you would use it in the first place.</p>
<p>I’m going to be taking my learning to multi-cluster Istio service mesh, so hopefully I’ll write something up about that in the near future!</p>
<p>As always, really interested to hear what people’s thoughts are and what alternatives to try out. (Linkerd is next on my list) and if I got anything wrong please let me know!</p>
<h2 id="heading-helpful-links">Helpful links</h2>
<p><a target="_blank" href="https://istio.io/latest/docs/">Istio documentation</a></p>
<p><a target="_blank" href="https://istio.io/latest/docs/setup/getting-started/">Getting started with Istio</a></p>
<p><a target="_blank" href="https://github.com/istio/istio/tree/1.24.3">Istio GitHub repository</a></p>
<p><a target="_blank" href="https://training.linuxfoundation.org/training/introduction-to-istio-lfs144/">Introduction to Istio Linux Foundation course (Free!)</a></p>
<p><a target="_blank" href="https://github.com/lftraining/LFS144x">Linux Foundation Intro to istio GitHub repository</a></p>
<p><em>Disclaimer: I have no affiliation with the Linux Foundation, this course or its authors, or claim to have created any of this code, but it is very helpful to use for learning this topic and recommend it very highly!</em></p>
]]></content:encoded></item><item><title><![CDATA[Getting started with Istio Service Mesh]]></title><description><![CDATA[(Aren't AI generated images just…… interesting)
To carry on with my current learning and discovery with Kubernetes, I’ve written about creating a cluster the same guide that I used to get my homelab cluster running for practising with passing the CKS...]]></description><link>https://ferrishall.dev/getting-started-with-istio-service-mesh</link><guid isPermaLink="true">https://ferrishall.dev/getting-started-with-istio-service-mesh</guid><category><![CDATA[#istio]]></category><category><![CDATA[#ServiceMesh]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[networking]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Fri, 03 Jan 2025 14:05:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735912560764/8ca70997-5ae4-4dc8-83ba-07fba4c40b8b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>(Aren't AI generated images just…… interesting)</p>
<p>To carry on with my current learning and discovery with Kubernetes: I’ve written about <a target="_blank" href="https://ferrishall.dev/how-to-set-up-a-kubernetes-cluster-for-studying-and-exam-preparation">creating a cluster</a>, the same guide that I used to get my homelab cluster running while practising for the CKS and CKAD certifications. I’ve also written and talked about <a target="_blank" href="https://ferrishall.dev/gitops-with-github-actions-and-argo-cd">GitOps with Argo CD</a>. I even wrote about my trials and tribulations with <a target="_blank" href="https://ferrishall.dev/flux-cd-vs-argo-cd">Flux CD and Argo CD</a> (Shameless plugs to previous articles over…).</p>
<p>So let’s check out networking! Everyone loves networking… when it’s working.</p>
<p>So why would networking with Kubernetes be a “thing” then? Surely the cluster sits on a network and just works?! Well, kind of…..</p>
<p>With Kubernetes, we have potentially broken down a monolith application where everything lived on a single box or VM and just contacted localhost:some_port, or maybe another VM on the same network with a few firewall rules. We’ve now split this into different applications, which are now pods that are part of deployments, potentially with multiple versions for testing, maybe even on different clusters.</p>
<p>What I’m trying to say is that we have introduced some added complexity for some benefits, which I won't go into as I’m sure if you're reading this, you know what they are. But we have to think about how all these distributed pods and applications interact or don’t interact with each other.</p>
<h2 id="heading-so-what-exactly-is-istio">So what exactly is Istio?</h2>
<p>Istio uses a proxy to intercept all your network traffic, allowing a broad set of application-aware features based on the configuration you set.</p>
<p>The <strong>control plane</strong> takes your desired configuration, and its view of the services, and dynamically programs the proxy servers, updating them as the rules or the environment changes.</p>
<p>The <strong>data plane</strong> is the communication between services. Without a service mesh, the network doesn’t understand the traffic being sent over, and can’t make any decisions based on what type of traffic it is, or who it is from or to. The data plane comprises Envoy sidecar proxies injected into application pods, handling actual traffic routing, security, and observability.</p>
<h2 id="heading-seriously-though-why">Seriously though, why?</h2>
<p>As our applications scale and potentially grow into additional deployments, stateful sets or just pods (please don’t just run pods…), they all need some networking and interaction with other deployments, pods, etc.</p>
<p>If you're starting with a hello-world app that has a frontend and backend, maybe a database then of course a service mesh will be overkill, but think about if we introduce multiple versions of these deployments. Or maybe we add more services. Like some reviews for our website? Or an ad service? Does a shopping cart get added too? It starts to add up.</p>
<p>With a service mesh, we can abstract the networking configuration layer and decouple the networking from the application. This keeps the application code more reusable, and we can manage the networking without having to redeploy parts of the application.</p>
<p>With a service mesh, networking is just part of it; there’s security too. Just because all the pods live on the same cluster, should they all be able to contact and have access to every pod on the cluster? No! Would you have every VM on your network have access to every other VM? I hope not, and the same should apply to your Kubernetes deployments.</p>
<h2 id="heading-additional-benefits-to-using-a-service-mesh-and-decoupling-network-configuration">Additional benefits to using a service mesh and decoupling network configuration</h2>
<ul>
<li><p><strong>Traffic management</strong></p>
<p>  Istio allows users to control traffic flows and API calls between services by configuring rules and routing traffic. We can be quite granular in how and what our pods and applications can interact with on the network.</p>
</li>
<li><p><strong>Security</strong></p>
<p>  Istio provides a backbone for communications and manages security controls. Adding an additional layer of security and adding to the principle of least privilege.</p>
</li>
<li><p><strong>Observability</strong></p>
<p>  Istio can extract telemetry data from the proxy containers and send it to a monitoring dashboard. As we can see with a Grafana dashboard, we get a huge amount of observability into how our network is performing; more data gives us more insight, allowing us to make better technical decisions, spot trends, and inform business decisions too.</p>
</li>
</ul>
<p>You can do some other neat things like fault injection (for example adding delays to test the resiliency of your configuration) and traffic shaping.</p>
<p>These are more advanced features I’ll be looking at in the near future but they really add to the feature set to why you would look to add Istio service mesh to your infrastructure.</p>
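<p>Just to give a flavour (I haven’t run this one yet, so treat it as a sketch), fault injection in a Virtual Service looks roughly like this, injecting a 5 second delay into half of the requests to the sample app’s <code>reviews</code> service:</p>
<pre><code class="lang-yaml">apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-delay
spec:
  hosts:
    - reviews          # assumes the 'reviews' service from the Istio sample app
  http:
    - fault:
        delay:
          percentage:
            value: 50.0   # inject the delay into 50% of requests
          fixedDelay: 5s
      route:
        - destination:
            host: reviews
</code></pre>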
<p>Hopefully, you can start to see why you introduce a service mesh. Having the network configuration decoupled from application logic, enabling fine-grained control and observability.</p>
<h2 id="heading-sounds-good-how-do-i-get-started">Sounds good! How do I get started?</h2>
<p>Now we have an idea of why we might want to get started with a service mesh, let’s take a look at Istio. You’ll need a cluster to install it on, the rest is fairly straightforward.</p>
<p>I’m not going to regurgitate the Istio quickstart guide <a target="_blank" href="https://istio.io/latest/docs/setup/getting-started/">found here</a> you can go there and go through the guides, it’s a good showcase on getting started and the sample application gives a good account of a distributed application where having a service mesh would be worth the time, effort and added complexity.</p>
<p>You install the <code>istioctl</code> command line tool, then install Istio itself on the cluster, deploy the sample application and then enable Istio injection on the namespace you are working with.</p>
<p>The Kiali dashboard helps visualise Istio and the configured applications, you can find it in the Istio <a target="_blank" href="https://github.com/istio/istio/tree/master/samples/addons">repo in the samples/addons</a>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735835340278/1ea990f4-b134-4e20-9a0f-d79a990ee32a.png" alt class="image--center mx-auto" /></p>
<p>Here’s the Grafana dashboard for the Istio services we are monitoring via Prometheus; we now get a ton of insight and observability into how our services are performing. The Grafana and Prometheus deployments and config can be found in the samples/addons part of the repo:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735838137499/5d9ca5d3-24c7-4486-9c3d-39984f609b3b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-so-i-followed-the-guide-what-did-i-just-do-how-does-istio-istio">So I followed the guide, what did I just do? How does Istio…. Istio?</h2>
<p>What’s neat is how Istio works. It’s essentially a proxy layer where the network configuration is applied to the workloads/applications by injecting a sidecar container into the pods. Here’s the <code>productpage</code> pod, which shows the <code>productpage</code> container and the <code>istio-proxy</code> container as a sidecar:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735903663786/64f4e953-7835-41f7-84d0-43e5d0ec1deb.png" alt class="image--center mx-auto" /></p>
<p>Istio operates by injecting a sidecar proxy (Envoy) into each pod in your application. This proxy handles all incoming and outgoing traffic for the pod, allowing Istio to control and observe traffic without requiring changes to your application code. These sidecars work together under the direction of the Istio <strong>control plane</strong>, which manages configuration, policy enforcement, and telemetry.</p>
<p>Add a namespace label to instruct Istio to automatically inject Envoy sidecar proxies when you deploy your application later:</p>
<pre><code class="lang-bash">kubectl label namespace default istio-injection=enabled
</code></pre>
<p>For example, when you deploy the <code>productpage</code> Pod, Istio injects an Envoy proxy alongside the application container. This proxy intercepts traffic, applying Istio’s routing rules, security policies, and telemetry collection:</p>
<pre><code class="lang-bash">Pod: productpage
|-- Container: productpage
|-- Container: istio-proxy
</code></pre>
<p>This architecture ensures that network configuration remains decoupled from application logic, enabling fine-grained control and observability.</p>
<p>The first time I tried to work with Istio, nothing was happening…. I forgot to label the namespace I was working in! :face_palm</p>
<p>Also, keep an eye out for any existing resource quotas and/or network policies in place that might stop Istio from working or behaving as expected.</p>
<pre><code class="lang-bash">k label namespace dev-three-tier istio-injection=enabled
k -n dev-three-tier scale deployments backend --replicas 0
k -n dev-three-tier scale deployments frontend --replicas 0
k -n dev-three-tier get deployments.apps
k -n dev-three-tier scale deployments backend --replicas 2
k -n dev-three-tier scale deployments frontend --replicas 1
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735905524903/98cd7ff8-9a75-4b86-8afa-6aec88ff280f.png" alt class="image--center mx-auto" /></p>
<p>Here I’ve just added the label to another namespace in my cluster, dev-three-tier, which is a very simple frontend and backend (and soon, a DB, haven’t added it yet!). I then scaled the deployments down and back up, and Istio has now injected the sidecar proxy container into each of the pods.</p>
<p>Send some traffic to the frontend: <code>while :; do curl -s http://192.168.1.43; sleep 1; done</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735905608113/85539b9c-5776-4e1f-b863-ea92ea936eb1.png" alt class="image--center mx-auto" /></p>
<p>I haven’t added a gateway or any rules yet but it’s a very simple app and Istio is probably overkill, I just wanted to show how you can add Istio to other workloads in your cluster quite simply.</p>
<p>I’m still learning about Istio but really enjoying seeing some real-world context and use cases. I wanted to use this article to go through how to get started and what I found useful, and to write some things down in the hope that it helps someone, maybe demystifying and explaining things in words and a context that are more approachable.</p>
<p>As always, drop a comment if I got this wrong, missed something, or if there is something else I should check out. I’m planning on checking out Linkerd in more detail, but if there’s anything else, I’m all ears!</p>
<h2 id="heading-links">Links:</h2>
<p><a target="_blank" href="https://github.com/istio/istio/tree/master/samples/addons">Istio GitHub repo</a></p>
<p><a target="_blank" href="https://istio.io/latest/docs/examples/microservices-istio/">Learning Microservices with Kubernetes and Istio</a></p>
<p><a target="_blank" href="https://istio.io/latest/docs/setup/getting-started/">Istio Sidecar mode getting started</a></p>
]]></content:encoded></item><item><title><![CDATA[Automating GitHub releases with Release Please]]></title><description><![CDATA[I was recently introduced to a GitHub Action called release-please which I’ve enjoyed using and seeing in action, so naturally to get a better understanding of it I’m writing this short blog post! Hopefully, someone finds it useful.
So a quick overvi...]]></description><link>https://ferrishall.dev/automating-github-releases-with-release-please</link><guid isPermaLink="true">https://ferrishall.dev/automating-github-releases-with-release-please</guid><category><![CDATA[release-please]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[Git]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[release management]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[gitops]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Platform Engineering ]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Tue, 17 Dec 2024 15:36:52 GMT</pubDate><content:encoded><![CDATA[<p>I was recently introduced to a GitHub Action called <a target="_blank" href="https://github.com/googleapis/release-please">release-please</a> which I’ve enjoyed using and seeing in action, so naturally to get a better understanding of it I’m writing this short blog post! Hopefully, someone finds it useful.</p>
<p>So a quick overview of Release Please and why you might need it in your life.</p>
<p>This really does improve the workflow from my previous write-up on <a target="_blank" href="https://ferrishall.dev/gitops-with-github-actions-and-argo-cd">GitOps with GitHub Actions</a>, where, to trigger my GitHub Action to build and push my Docker container image, I created a release with a semver tag, and this worked fine.</p>
<p>But I was manually creating that release. Couldn’t there be a better way to release without me having to type up everything that went into that release, like an automated changelog? This is where automated releases with Release Please come in!</p>
<p>I’ve taken the same very simple Python application that lives in a Docker container, which gets built and pushed to my personal private Docker Hub repo.</p>
<p>I have also added another GitHub Action Workflow:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>

<span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">pull-requests:</span> <span class="hljs-string">write</span>

<span class="hljs-attr">name:</span> <span class="hljs-string">release-please</span>
<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">release-please:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">googleapis/release-please-action@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">token:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.MY_RELEASE_PLEASE_DEMO_TOKEN</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">release-type:</span> <span class="hljs-string">simple</span>
</code></pre>
<p><code>.github/workflows/release-please.yaml</code>, at its simplest, tells my repo that on a push to the main branch it should create a release PR; there are some caveats <a target="_blank" href="https://github.com/googleapis/release-please?tab=readme-ov-file#release-please-bot-does-not-create-a-release-pr-why">here</a>. I have also created a GitHub token and added it as a repo secret <code>MY_RELEASE_PLEASE_DEMO_TOKEN</code> so it can make changes to my repo.</p>
<p>So how does Release Please know what or when to create a release?! Essentially by using <a target="_blank" href="https://www.conventionalcommits.org/en/v1.0.0/"><strong>conventional commits</strong></a>: anything deemed a releasable unit is a commit to the branch with one of the following prefixes: "feat", "fix", or "deps".</p>
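<p>For example (these messages are illustrative, not from my actual repo): <code>feat: add customer search endpoint</code> gets picked up as a new feature and typically bumps the minor version, <code>fix: handle empty basket on checkout</code> counts as a bug fix and bumps the patch version, while something like <code>docs: tidy up the README</code> won’t, on its own, trigger a release.</p>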
<p>So my PR, which I reviewed and merged into the main branch, contained a “feat”-prefixed commit. Release Please sees that and creates a release PR based on what we’ve told it is a releasable unit, in this case a feature.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734440861923/e047bdde-5d35-453e-9555-efccb912a15d.png" alt class="image--center mx-auto" /></p>
<p>What’s really neat is it also keeps a running change log, giving you lots of handy info about what’s being brought into the release and leaves a nice audit trail of your releases….. Automatically!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734440932511/a08a35cd-0109-46bd-9445-3896e07adb4d.png" alt class="image--center mx-auto" /></p>
<p>Even if the Release Please PR is already open and you push more commits to the branch, it will edit the PR and pick them up.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734445827372/53c38fd2-4e24-4ea9-89a1-a19b8379319f.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734445925588/05e147fd-bf7b-4d88-8f35-f24da8c61bd6.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734446039092/2291a3a8-1e92-49bc-b2ef-4f0fc5389787.png" alt class="image--center mx-auto" /></p>
<p>The release-please Action has triggered and is now running to create the release. My Release Action then triggers, because it is configured to build the Docker image when a new release tag has been created, which has been taken care of by Release Please!</p>
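<p>For context, the trigger on that build workflow is along these lines (a simplified sketch of mine, the tag pattern is whatever your release tags look like):</p>
<pre><code class="lang-yaml">name: build-and-push
on:
  push:
    tags:
      - 'v*'   # fires when Release Please pushes a new version tag, e.g. v1.2.0
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ...docker build and push steps go here...
</code></pre>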
<p>I’ve got a little more tinkering to do to get the deployment part working (creating the PR to my environment-infrastructure repo that Argo CD deploys from…) but hopefully, you get the picture that this is a great tool to automate your releases!</p>
<h2 id="heading-conventional-commits">Conventional Commits</h2>
<p>Woah Woah Woah?! So you might be thinking “You’ve gone from this is just easy releases to now conventional commits?! What’s this all about?!”</p>
<p>I know, it feels like more things to learn for the thing you were trying to learn, but conventional commits is what makes this all work: it essentially ensures you follow a standard pattern for commit messages. We’ve all been there and wished our colleagues or past selves took more time over a commit message; “HerpDerp“ may have been applicable at the time, but future you will thank you for conventional commits. You can read up about them <a target="_blank" href="https://www.conventionalcommits.org/en/v1.0.0/">here</a>. It’s also how Release Please knows what you want to release and what doesn’t actually need a release (changes to README.md’s, am I right?!)</p>
<p>There is a VSCode IDE plugin for it so it just works, or even better, if like me you don’t like too many (or any) M$ products, it also works with VSCodium (a nice non-M$, non-steal-all-your-data alternative IDE).</p>
<p>And if, also like me, you don’t like using Git from the IDE and prefer the terminal CLI, there is <a target="_blank" href="https://commitizen-tools.github.io/commitizen/">Commitizen</a>, which helps with keeping your conventional commits….. well, conventional!</p>
<p>This was introduced to me by a colleague and I have just been having fun learning a bit more about it and finding a way it can work with my existing development.</p>
<h2 id="heading-whats-next-or-is-that-it">Whats next? Or is that it?</h2>
<p>This is a pretty simplified implementation of Release Please, which shows how easy it is to get up and running.</p>
<p>You can be more customised and granular with the configuration of Release Please which is something I’m planning on tinkering with and getting working with the deployment repo I have.</p>
<p>For the next part, I am looking to improve this by using a <a target="_blank" href="https://github.com/googleapis/release-please/blob/main/docs/manifest-releaser.md">release-please-config.json</a> to manage my own versioning and package contents.</p>
<p>Hopefully for anyone getting started, this will be helpful! If you have any thoughts or comments or something I have missed or should even try differently, please hit me up on the comments. I’m always happy to keep learning and hear different opinions and methods :-)</p>
]]></content:encoded></item><item><title><![CDATA[GitOps with GitHub Actions and Argo CD]]></title><description><![CDATA[Those of you who read my previous post where I tried out Flux CD and Argo CD will know I have been tinkering with automated deployments to my homelab Kubernetes cluster/s.
Recently I picked up a short quickstart e-book on GitOps which I thought looke...]]></description><link>https://ferrishall.dev/gitops-with-github-actions-and-argo-cd</link><guid isPermaLink="true">https://ferrishall.dev/gitops-with-github-actions-and-argo-cd</guid><category><![CDATA[GitHub]]></category><category><![CDATA[github-actions]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[gitops]]></category><category><![CDATA[ArgoCD]]></category><category><![CDATA[Devops]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Tue, 22 Oct 2024 09:19:13 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729588448497/6c0763f9-0cba-4e94-bf15-2337c4c6be50.webp" alt class="image--center mx-auto" /></p>
<p>Those of you who read my previous post where I tried out <a target="_blank" href="https://hashnode.com/post/clyxgps4a000409mi4m29fmoz">Flux CD and Argo CD</a> will know I have been tinkering with automated deployments to my homelab Kubernetes cluster/s.</p>
<p>Recently I picked up a short <a target="_blank" href="https://leanpub.com/gitops">quickstart e-book on GitOps</a> which I thought looked good for a refresh, reminder and hands-on practice about some opinionated methods to integrate and deploy code from laptop to environment, or in my case homelab. (I have no affiliation with the author or site, I just think it was a nice source of inspiration with some nice hands-on demos to try) And I think it’s always good to read about other opinions or working methods.</p>
<p>So this blog post is me putting into practice and using this book and its examples as inspiration for building an application and deploying it onto Kubernetes.</p>
<p>Now, I‘ve built my fair share of Docker images over the years and GitOps as a concept isn’t new to me, but sometimes it’s just nice to have that inspiration guided to you. It’s been a while, so it was good practice for me to get hands-on with GitHub Actions and deploying to my cluster, though I did use Argo CD as well as Flux CD while going through the demo code.</p>
<p>Personally, I have no real preference. The book takes you through using Flux CD as the deployment option, but I just wanted to try both.</p>
<p>I’ve used Cloud Build with Terraform quite a bit, so I wanted to get my hands on building container images with GitHub Actions, as it has been on my to-do list to get more acquainted with for a while.</p>
<p>This book gives a nice example of, and inspiration for, how to practise GitOps. Sometimes half the battle is finding the inspiration on what to code or deploy for practice; you can easily spend so much time getting set up and figuring out what to deploy and code that you forget why you were doing it in the first place.</p>
<p>Well, it happens to me sometimes anyway!</p>
<p><em>Spoiler!</em> What I really liked was the automation of updating the Kubernetes manifest file with the new Docker container image version tag that had just been created.</p>
<h2 id="heading-the-set-up">The set up</h2>
<p>I wanted to build a simple container that ran a hello world type thing that I could modify, develop and deploy to a cluster automatically.</p>
<p>I already had an application set up using Flux CD <a target="_blank" href="https://hashnode.com/post/clyxgps4a000409mi4m29fmoz">from before</a> so I thought I would change it up and use Argo CD (I’d just re-deployed it to a different cluster) instead of Flux CD.</p>
<p>I created a Docker Hub private repo for the container image to live which I creatively called…… example-application.</p>
<p>I created a GitHub repo for the application “example-application“ I also added some repo secrets so GitHub can access the Docker repo, username and token will be referenced in the GitHub Action as <code>secrets.REGISTRY_USER</code>.and <code>secrets.REGISTRY_TOKEN</code> you can then add the value with your own <a target="_blank" href="https://docs.docker.com/security/for-developers/access-tokens/">Docker Hub username and token</a> (I already have a secret in my Kubernetes cluster for <a target="_blank" href="https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/">pulling images from my private repos</a>). I added one last repo secret, a <a target="_blank" href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens">GitHub personal access token</a> for accessing the next GitHub repo.</p>
<p>The second GitHub repo, for the Kubernetes manifest, is named “example-environment“. It has a repo secret for the GitHub personal access token so it can create Pull Requests, and a <a target="_blank" href="https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables">GitHub repo variable</a> <code>DOCKER_HUB_IMAGE</code> whose value is the name of the image I want to use in the manifest, so I don’t have to hardcode anything in the GitHub Actions, making them more reusable.</p>
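<p>For reference, you can also set those secrets and the variable from the terminal with the GitHub CLI. A rough sketch, where the repo owner and all the values are placeholders for your own:</p>
<pre><code class="lang-plaintext"># on the example-application repo
gh secret set REGISTRY_USER --repo YOUR_GITHUB_USER/example-application --body "YOUR_DOCKERHUB_USERNAME"
gh secret set REGISTRY_TOKEN --repo YOUR_GITHUB_USER/example-application --body "YOUR_DOCKERHUB_ACCESS_TOKEN"
gh secret set PERSONAL_ACCESS_TOKEN --repo YOUR_GITHUB_USER/example-application --body "YOUR_GITHUB_PAT"

# on the example-environment repo
gh secret set PERSONAL_ACCESS_TOKEN --repo YOUR_GITHUB_USER/example-environment --body "YOUR_GITHUB_PAT"
gh variable set DOCKER_HUB_IMAGE --repo YOUR_GITHUB_USER/example-environment --body "YOUR_DOCKERHUB_USERNAME/example-application"
</code></pre>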
<h2 id="heading-the-application">The “Application”</h2>
<p>So my super noddy “Hello Argo CD!” app will be written in Python using the Flask web framework, as it’s the language I’m most familiar with (I am no software developer, I dabble at best!).</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask

app = Flask(__name__)

<span class="hljs-meta">@app.route('/')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">index</span>():</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">'Hello Argo CD v1.0!'</span>

app.run(host=<span class="hljs-string">'0.0.0.0'</span>, port=<span class="hljs-number">8080</span>)
</code></pre>
<p>Please don’t all point and laugh, I already added the disclaimer that I am no software developer….</p>
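<p>If you want to sanity check it locally before containerising it, something like this should work, assuming the file is saved as main.py (which is what the Dockerfile below runs):</p>
<pre><code class="lang-plaintext">pip3 install flask
python3 main.py
# in a second terminal
curl http://localhost:8080
# Hello Argo CD v1.0!
</code></pre>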
<p>I created a public Docker Hub repo called example-application, there’s nothing special or personal about it so a personal repo is OK for this demo.</p>
<p>I created a GitHub repo in which I store my application code for the application above.</p>
<p>Here’s the Dockerfile, which adds instructions on how to build and run the container:</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.8</span>-alpine
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /py-app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>
<span class="hljs-keyword">RUN</span><span class="bash"> pip3 install flask</span>
<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">8080</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"python3"</span>, <span class="hljs-string">"main.py"</span>]</span>
</code></pre>
<p>I’m keeping it simple: using the <code>python:3.8-alpine</code> base image, declaring a directory to work in (<code>/py-app</code>) and telling Docker to copy everything from the local directory into the container’s working directory. I could have been more selective here, and probably should have been, but for speed and simplicity I grab everything.</p>
<p><code>RUN</code> tells Docker to run a command in the build container, <code>pip3 install flask</code> in this case. <code>EXPOSE</code> is more of a label, a reminder that when this container is running it will need port 8080 exposed in order for you to reach the web server, which is started by the final instruction: the command the container will run on execution, or at runtime.</p>
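<p>Before wiring up any automation, it’s worth checking the image builds and runs locally. A rough sketch, where the image name and tag are just examples:</p>
<pre><code class="lang-plaintext">docker build -t example-application:v1.0 .
docker run --rm -p 8080:8080 example-application:v1.0
# in a second terminal
curl http://localhost:8080
</code></pre>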
<h2 id="heading-lights-camera-github-actions">Lights, Camera…… GitHub Actions!</h2>
<p>Now I need to create some GitHub actions to automate building the Docker image, tagging the image, then pushing to my Docker Hub repo.</p>
<p>The book demo code is a really good place to start:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">new</span> <span class="hljs-string">Release</span>
<span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">tags:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">v*</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">build:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span> <span class="hljs-string">code</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v2</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">and</span> <span class="hljs-string">Push</span> <span class="hljs-string">Container</span> <span class="hljs-string">Image</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">docker/build-push-action@v1</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">username:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.REGISTRY_USER</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">password:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.REGISTRY_TOKEN</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">dockerfile:</span> <span class="hljs-string">Dockerfile</span>
          <span class="hljs-attr">repository:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.REGISTRY_USER</span> <span class="hljs-string">}}/${{</span> <span class="hljs-string">github.event.repository.name</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">tag_with_ref:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">tag_with_sha:</span> <span class="hljs-literal">false</span>

  <span class="hljs-attr">release:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">needs:</span> <span class="hljs-string">build</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Add</span> <span class="hljs-string">TAG_NAME</span> <span class="hljs-string">env</span> <span class="hljs-string">property</span>
        <span class="hljs-attr">run:</span>  <span class="hljs-string">echo</span> <span class="hljs-string">"TAG_NAME=`echo ${GITHUB_REF#refs/tags/}`"</span> <span class="hljs-string">&gt;&gt;</span> <span class="hljs-string">$GITHUB_ENV</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Open</span> <span class="hljs-string">PR</span> <span class="hljs-string">in</span> <span class="hljs-string">Environment</span> <span class="hljs-string">Repository</span> <span class="hljs-string">for</span> <span class="hljs-string">new</span> <span class="hljs-string">App</span> <span class="hljs-string">Version</span>
      <span class="hljs-comment"># This is the GitHib Action that will be in the </span>
      <span class="hljs-comment"># example-environment GitHub repo which is mentioned ENV_REPO</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">ENV_REPO:</span> <span class="hljs-string">${{</span> <span class="hljs-string">github.event.repository.owner.name</span> <span class="hljs-string">}}/example-environment</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">benc-uk/workflow-dispatch@v1.2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">workflow:</span> <span class="hljs-string">ApplicationVersion.yaml</span> 
          <span class="hljs-attr">token:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.PERSONAL_ACCESS_TOKEN</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">inputs:</span> <span class="hljs-string">'{"tag_name": "$<span class="hljs-template-variable">{{ env.TAG_NAME }}</span>", "app_repo": "$<span class="hljs-template-variable">{{ github.event.repository.name }}</span>", "image": "$<span class="hljs-template-variable">{{ github.event.repository.full_name }}</span>:$<span class="hljs-template-variable">{{ env.TAG_NAME }}</span>"}'</span>
          <span class="hljs-attr">ref:</span> <span class="hljs-string">refs/heads/main</span>
          <span class="hljs-attr">repo:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.ENV_REPO</span> <span class="hljs-string">}}</span>
</code></pre>
<p>I had to update some bits and workflow versions, but other than that the code the book gives you works great, and with a bit of Google-Fu you will find your way around different actions in no time.</p>
<h3 id="heading-build-and-push-image">Build and Push Image</h3>
<p>In the “new Release” Action we have a build job. The first step, <code>name: Checkout code</code>, checks out the code from the GitHub repo, and the next step, <code>name: Build and Push Container Image</code>, does just that: it builds the image according to the Dockerfile I wrote earlier and pushes it once built, authenticating with the secrets I created on the repo earlier, as we absolutely do not want to be committing any sensitive or secret data into a Git repo! Private or not, make sure you don’t!</p>
<p>Then it’s on to the next job, “release“, where we create a release for the artifact, or rather deploy the application/container image, but to do that I am creating a Pull Request in another GitHub repo!</p>
<p>This is the really cool part. We could of course do this manually, but in a real-world use case we would want to deploy this to a dev or test environment automatically, because humans tend to forget things or skip steps when we’re busy. By automating where we can, we make sure this happens predictably, accurately and in a timely manner.</p>
<h3 id="heading-create-pull-request">Create Pull request</h3>
<p>Hopping over to the example-environment GitHub repo, let’s look at the code I’m using for the Kubernetes manifest yaml file, nothing too advanced here:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">example-application</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">example-application</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">example-application</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">image:</span> <span class="hljs-string">docker-repo-name/example-application:v1.16.5</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">example-application</span>
        <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">8080</span>
      <span class="hljs-attr">imagePullSecrets:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">dockerhub</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">example-application</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">example-application</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">LoadBalancer</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">example-application</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-number">8080</span>
</code></pre>
<p>But what we’ll also add here is the GitHub Action which gets called by the GitHub Action in the example-application repo, stay with me on this!</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">New</span> <span class="hljs-string">Application</span> <span class="hljs-string">Version</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">workflow_dispatch:</span>
    <span class="hljs-attr">inputs:</span>
      <span class="hljs-attr">tag_name:</span>
        <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">app_repo:</span>
        <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">image:</span>
        <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">update-image-tag:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Wrap</span> <span class="hljs-string">Input</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          echo "APP_REPO=${{ github.event.inputs.app_repo }}" &gt;&gt; $GITHUB_ENV
          echo "TAG_NAME=${{ github.event.inputs.tag_name }}" &gt;&gt; $GITHUB_ENV
          echo "IMAGE=${{ vars.DOCKER_HUB_IMAGE }}" &gt;&gt; $GITHUB_ENV
          echo "DEPLOY_FILE_PATH=applications/${{ github.event.inputs.app_repo }}/deployment.yaml" &gt;&gt; $GITHUB_ENV
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v2</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/setup-kubectl@v1</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">install</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">patch</span> <span class="hljs-string">deployment</span> <span class="hljs-string">manifest</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">kubectl</span> <span class="hljs-string">patch</span> <span class="hljs-string">--filename=${{</span> <span class="hljs-string">env.DEPLOY_FILE_PATH</span> <span class="hljs-string">}}</span> <span class="hljs-string">--patch='{"spec":{"template":{"spec":{"containers":[{"name":"${{</span> <span class="hljs-string">env.APP_REPO</span> <span class="hljs-string">}}","image":"${{</span> <span class="hljs-string">env.IMAGE</span> <span class="hljs-string">}}:${{</span> <span class="hljs-string">env.TAG_NAME</span> <span class="hljs-string">}}"}]}}}}'</span> <span class="hljs-string">--local=true</span> <span class="hljs-string">-o</span> <span class="hljs-string">yaml</span> <span class="hljs-string">&gt;</span> <span class="hljs-string">tmp.yaml</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">commit</span> <span class="hljs-string">change</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
            git config user.name ${{ github.actor }}
            git config user.email '${{ github.actor }}@users.noreply.github.com'
            rm -f ${{ env.DEPLOY_FILE_PATH }}
            mv tmp.yaml ${{ env.DEPLOY_FILE_PATH }}
            git add ${{ env.DEPLOY_FILE_PATH }}
            git diff-index --quiet HEAD || git commit -m "Set ${{ env.APP_REPO }} to version ${{ env.TAG_NAME }}"
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Create</span> <span class="hljs-string">Pull</span> <span class="hljs-string">Request</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">peter-evans/create-pull-request@v3</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">token:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.PERSONAL_ACCESS_TOKEN</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">commit-message:</span> <span class="hljs-string">Update</span> <span class="hljs-string">report</span>
          <span class="hljs-attr">committer:</span> <span class="hljs-string">GitHub</span> <span class="hljs-string">&lt;noreply@github.com&gt;</span>
          <span class="hljs-attr">author:</span> <span class="hljs-string">${{</span> <span class="hljs-string">github.actor</span> <span class="hljs-string">}}</span> <span class="hljs-string">&lt;${{</span> <span class="hljs-string">github.actor</span> <span class="hljs-string">}}@users.noreply.github.com&gt;</span>
          <span class="hljs-attr">signoff:</span> <span class="hljs-literal">false</span>
          <span class="hljs-attr">branch:</span> <span class="hljs-string">new_release_${{</span> <span class="hljs-string">env.APP_REPO</span> <span class="hljs-string">}}-${{</span> <span class="hljs-string">env.TAG_NAME</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">title:</span> <span class="hljs-string">'Set $<span class="hljs-template-variable">{{ env.APP_REPO }}</span> to version $<span class="hljs-template-variable">{{ env.TAG_NAME }}</span>'</span>
          <span class="hljs-attr">body:</span> <span class="hljs-string">|
            This PR was automatically created.
            Please review and merge to deploy.</span>
</code></pre>
<p>Once the Docker build and push steps have run in the example-application repo’s GitHub Action, the “Open PR in Environment Repository for new App Version“ step in the release job creates a PR in this example-environment repo, using the image and tag just built and pushed to the Docker Hub repo to update the Kubernetes manifest file with the new image tag.</p>
<p>The step <code>name: commit change</code> then has the commands to add and commit the <code>tmp.yaml</code> which contains the new image tag, essentially updating the example-environment GitHub repo on our behalf.</p>
<p>The process should go as follows:</p>
<p>Create/update the Python application and Dockerfile, push to the main branch and create a release with a new tag, v1.16.6 for example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729261251841/79fac3eb-2afa-4127-b7d3-1ad42b9e1db1.png" alt class="image--center mx-auto" /></p>
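<p>If you prefer the terminal over the GitHub UI, cutting the tag or release looks roughly like this (the tag value is just an example, and <code>gh</code> is the GitHub CLI):</p>
<pre><code class="lang-plaintext">git add .
git commit -m "Bump the example application"
git push origin main
# pushing a v* tag is what triggers the "new Release" workflow
git tag v1.16.6
git push origin v1.16.6
# or create a full GitHub release with the GitHub CLI
gh release create v1.16.6 --generate-notes
</code></pre>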
<p>The release job will have the image and tag to update the Kubernetes manifest file and use that to create the PR:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729261342251/28b9f38c-e416-4859-bfa9-a8c2e707369a.png" alt class="image--center mx-auto" /></p>
<p>Let’s see what this PR is updating:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729261526049/325de7af-8795-4ea2-a0f8-07911b78dd74.png" alt class="image--center mx-auto" /></p>
<p>Looks good to me, I can review and merge that to the main branch, updating my Kubernetes manifest file without having to use a terminal or edit the manifest myself!</p>
<h3 id="heading-show-me-the-pods">Show me the pods!</h3>
<p>So we went to all that trouble, where is my application? Where are the pods?! I didn’t go through the Argo CD set up, it’s pretty straightforward, I’ll add a link at the end, but that’s where the CD tools come into their own!</p>
<p>I added the example-environment repo as an application in Argo CD, as we can see on the right</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729264209531/ce9d52ae-3cfe-456b-851d-4f7489271388.png" alt class="image--center mx-auto" /></p>
<p>Clicking into the application</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729499443815/5bb16076-ded6-4b13-b5b0-b4b05af19be3.png" alt class="image--center mx-auto" /></p>
<p>I can also look at the pod events by clicking on the pod itself</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729499485434/96b569e9-4840-49e8-86bd-870214fbf2fb.png" alt class="image--center mx-auto" /></p>
<p>I can jump onto my terminal and run <code>kubectl get pods</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729264670800/37611616-83b1-4e5c-9b44-323c20cf3ec8.png" alt class="image--center mx-auto" /></p>
<p>and see the pod is running, and go to my internal IP address which is being exposed by my cluster</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729264732909/c5867aad-fe3f-43a2-9810-c7a1b8e87afb.png" alt class="image--center mx-auto" /></p>
<p>there’s my application! Built, pushed to a repo and deployed all using Git as a central source of truth without having to manually do anything!</p>
<h3 id="heading-summary-and-conclusion">Summary and conclusion</h3>
<p>Hopefully, you can see the benefits and reasons for going through all this trouble.</p>
<p>Modify your code and go through the release process a few times. You’ll start to see how your release cadence and velocity increase, as well as the build and release process itself becoming predictable and standardised.</p>
<p>This, to me, is the epitome of GitOps: having a single source of truth and a single, standard, effective method of building software and deploying an application.</p>
<p>With Argo CD in place, any changes you make to the example-environment GitHub repo are then reconciled by Argo CD and reflected in the Kubernetes Cluster, automating the deployments.</p>
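<p>For reference, registering the environment repo as an Argo CD application can also be done with the argocd CLI rather than the UI. A rough sketch, assuming the manifests live under applications/example-application as in the workflow above (the repo URL is a placeholder):</p>
<pre><code class="lang-plaintext">argocd app create example-application \
  --repo https://github.com/YOUR_GITHUB_USER/example-environment.git \
  --path applications/example-application \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default \
  --sync-policy automated
</code></pre>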
<p>Flux CD works great for the continuous deployment part as well, so it’s worth trying both out.</p>
<p>This has led me down a rabbit hole: to take this forward I’m planning on looking into the <a target="_blank" href="https://github.com/google-github-actions/release-please-action">Release-Please GitHub Action</a>, which automates CHANGELOG generation, the creation of GitHub releases, and version bumps for your projects.</p>
<p>I hope this has helped anyone getting started with GitOps, or even just serves as some inspiration on what to try or practice next, as much as the <a target="_blank" href="https://leanpub.com/gitops">quickstart free e-book on GitOps</a> did for me. Add some changes to the process or code as I have, or follow it closely; as long as you get some practice in, it will become permanent.</p>
<p>Thanks for reading and as always if you have any comments, questions, feedback or spotted any mistakes please get in touch!</p>
]]></content:encoded></item><item><title><![CDATA[Flux CD Vs Argo CD]]></title><description><![CDATA[I'm trying out both to find the winner of Continuous Delivery to a Kubernetes Cluster.
Disclaimer: There isn't a winner only you, you're the winner for choosing to automate your deployments (Terribly diplomatic and lame of me I know, but there you go...]]></description><link>https://ferrishall.dev/flux-cd-vs-argo-cd</link><guid isPermaLink="true">https://ferrishall.dev/flux-cd-vs-argo-cd</guid><category><![CDATA[ArgoCD]]></category><category><![CDATA[cicd]]></category><category><![CDATA[continuous deployment]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[containers]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Software Deployment]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[FluxcD]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Mon, 22 Jul 2024 20:51:45 GMT</pubDate><content:encoded><![CDATA[<p>I'm trying out both to find the winner of Continuous Delivery to a Kubernetes Cluster.</p>
<p><strong>Disclaimer: There isn't a winner only you, you're the winner for choosing to automate your deployments</strong> (Terribly diplomatic and lame of me I know, but there you go).</p>
<p>So you've got a cluster created the hard way or the GKE way, either way, you've got a cluster, great!</p>
<p>Now.... what do you do with it and how do you get applications in there?!</p>
<p><img src="https://lh6.googleusercontent.com/proxy/Ozr5hOaFPLCyCYmqewmnmF1jN4Zwahz4XGCMuIPPkf5LNmi6Pw9sLyz-1DdlGSER9XS663XZSQCOkf-fV0Q2KoUUNEaS2MG6x2tT6plam-Akmh_sKl4UED-u9PWA8Zvp1b2H90F_-PuLBcYN2E5ReIlwzn1L" alt="The files are in the computer... - Zoolander - quickmeme" class="image--center mx-auto" /></p>
<p>By now, you're probably familiar with some imperative commands, and you've probably even been typing up some Kubernetes manifest yaml files and applying them like crazy. All good stuff, but you might be hankering for more!</p>
<p>You might have automated your cluster with some VMs and Ansible (unlikely, fair play if you have) or spun up some managed cloud clusters with Terraform (more likely), but now you're thinking wouldn't it be nice to automate the <code>k apply -f manifest.yaml</code> commands you've been running.</p>
<h2 id="heading-gitops">GitOps</h2>
<p>The real reason we're here is GitOps: we want to push our application changes to a Git repo and have those changes realised in real life on our cluster. It's predictable and repeatable, we have an audit of the changes being made, and we can easily roll back in case of any issues. There's a whole heap of opinions and options on GitOps and it's not even just for application configuration, we're talking infra, Docker builds etc. But that's for another day.....</p>
<p>GitHub Actions works well, Google Cloud Build and Google Cloud Deploy all work great too, but I've been looking at Flux CD and <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/">Argo CD</a> more, as they are installed on a cluster themselves and are a self-hosted option if you will (No, not Jenkins. Not even if there's a fire).</p>
<p>I've installed Flux CD and Argo CD on my VM based cluster following my guide <a target="_blank" href="https://ferrishall.dev/how-to-set-up-a-kubernetes-cluster-for-studying-and-exam-preparation">here</a> and here's what I thought.</p>
<p>Installation wise, they can both be installed very quickly to get you up and running.</p>
<p>Argo CD can be installed via a <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/installation/">Helm chart or just a kubectl apply -f</a>, and with Flux CD, I installed the command line tool and <a target="_blank" href="https://fluxcd.io/flux/installation/bootstrap/github/">bootstrapped a GitHub repo</a> I used for testing.</p>
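<p>For reference, the quick-start installs look roughly like this; the repo owner and name are placeholders, and the linked docs above are the place to check for the current commands:</p>
<pre><code class="lang-plaintext"># Argo CD: create a namespace and apply the install manifest
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Flux CD: bootstrap a GitHub repo with the flux CLI
flux bootstrap github \
  --owner=YOUR_GITHUB_USER \
  --repository=YOUR_FLEET_REPO \
  --path=clusters/my-cluster \
  --personal
</code></pre>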
<h2 id="heading-argo-cd">Argo CD</h2>
<p>First impressions: with Argo CD, I really liked the console UI. Installation is pretty simple, you can install via the Helm chart or just apply the manifest file from the Argo CD repo/documentation.</p>
<p>And <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/architecture/">here's what you get</a>: essentially an API server, a repository server and the application controller, which is the watch loop that watches the applications deployed with Argo CD.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721680368577/56dc89d2-a6c6-46b4-b8b5-f00aeec1d39d.png" alt class="image--center mx-auto" /></p>
<p>This is a <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/getting_started/#creating-apps-via-cli">demo application</a> from the Argo CD examples repo, that can be added via the console or via the <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/cli_installation/">argocd CLI tool</a>.</p>
<p>The application dashboard console paints a really nice visual of whats actually happening in your cluster come deployment time.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721677513303/de323930-b8ca-47c4-9219-185d901153a7.png" alt class="image--center mx-auto" /></p>
<p>You can really easily just grab a Helm chart repo URL and add an application via a Helm chart using the dashboard, a nice and easy way to get going.</p>
<p>You can also connect your GitHub repos, make your changes, commit and push; Argo CD will then reconcile what it sees in the repo against what it knows it has deployed. There are also options where you can choose whether Argo "syncs" automatically or not, if you wanted some manual intervention, it's always a nice option.</p>
<p>The GitOps option, connecting to a repo and watching for changes, now this is what makes this worthwhile.</p>
<p>Argo updating my deployment by adding a 3rd pod.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721679168523/65e659b0-6aa0-4aa2-a3e5-df39e25c65af.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721679206913/b86546b2-1300-4c1e-905b-e6b010996e1b.png" alt class="image--center mx-auto" /></p>
<p>Yeah, yeah I pushed straight to master, so shoot me :shrug</p>
<p>I really like Argo CD, I've got it pointing at 2 clusters at the moment for deploying applications to. Some options, like adding another cluster to Argo CD, are CLI-command only, which works fine and has good documentation for finding your way around, and who doesn't love a dashboard with stuff spinning up and down?!</p>
<h2 id="heading-flux-cd">Flux CD</h2>
<p>Flux is a CLI only tool, I believe there are some <a target="_blank" href="https://fluxcd.io/blog/2024/02/introducing-capacitor/">UI dashboard tools</a> but I haven't tried any yet.</p>
<p>I found Flux CD easy enough to set up: <a target="_blank" href="https://fluxcd.io/flux/installation/#install-the-flux-cli">install the Flux CD CLI tool</a>, then bootstrap a Git repo. I connected <a target="_blank" href="https://fluxcd.io/flux/installation/bootstrap/github/#github-personal-account">GitHub using the documentation</a> and I was also able to use a private repo and a personal account, no need for any enterprise GitHub accounts etc.</p>
<p>Flux CD then installs the controllers on your cluster for use with Helm, Kustomize and Git repos.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721678223174/49b5c561-b33b-472d-a819-c773d7d50a6e.png" alt class="image--center mx-auto" /></p>
<p>I <a target="_blank" href="https://fluxcd.io/flux/installation/bootstrap/github/#github-personal-account">bootstrapped a repo</a> I had already configured, but it can also create the repo if it doesn't exist, and it then adds a flux-cluster directory to the repo. After the Flux repo bootstrapping had completed, I added my manifest files into the cluster directory, committed and pushed, and found my application on my cluster. I've been super inspirational and used an Nginx deployment and service in the nginx-test directory to demonstrate.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721678602837/b21515b8-8bf2-4bb0-a2f5-527d71c4ca2b.png" alt class="image--center mx-auto" /></p>
<p>I then update my application, to test just scaling down from 3 to 2 pods in my super exciting nginx application, add, commit and push.</p>
<p>The Git source controller in the Flux CD pod then reconciles what it sees in the Git repo to what is deployed in real life and updates the objects in the cluster</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721678509997/890aa428-6540-42e5-93f7-74e7f74c31d6.png" alt class="image--center mx-auto" /></p>
<p>(Ignore all the node exporter pods, I've been playing with Prometheus on this new cluster...)</p>
<p>I've gone from 3 to 2 by pushing some changes to the repo! It just feels so much better automating!</p>
<h2 id="heading-my-thoughts">My Thoughts</h2>
<p>I like how lightweight Flux feels and need to try out some of the Helm chart controllers but you know, time and whatnot.</p>
<p>Flux also feels like a tool I can use without having to look after it; Argo CD could almost need a cluster of its own if this was looking like moving to production grade.</p>
<p>Either way, I'm not sure I have a favourite just yet (I know, how very diplomatic of me), I don't think I've used them both long enough to compare. I've been using Argo for slightly longer and have found it nice and easy to update the values and upgrade chart versions for applications I've deployed using Helm.</p>
<p>I intend on testing and playing with Flux CD a bit more and I really like how hands off it feels.</p>
<p>Let me know what you think! Have I missed anything? Is there another tool I should check out? Feel free to comment! I'm more than happy to take on any constructive criticism and feedback!</p>
<p>Hopefully, this has at least been an interesting read, as I've really enjoyed tinkering and playing with these tools.</p>
]]></content:encoded></item><item><title><![CDATA[How to Set Up a Kubernetes Cluster for Studying and Exam Preparation]]></title><description><![CDATA[Why?!
I've personally used and tinkered with Kubernetes and GKE for the last 3-4 years and in the last couple of years it has been more in a professional context, building some GKE clusters and Cloud Composer clusters that use GKE, but it was in the ...]]></description><link>https://ferrishall.dev/how-to-set-up-a-kubernetes-cluster-for-studying-and-exam-preparation</link><guid isPermaLink="true">https://ferrishall.dev/how-to-set-up-a-kubernetes-cluster-for-studying-and-exam-preparation</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[kubernetes setup]]></category><category><![CDATA[Devops]]></category><category><![CDATA[learning]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Tue, 25 Jun 2024 10:02:29 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://media.makeameme.org/created/write-a-kubernetes.jpg" alt="aaaand its gone meme" class="image--center mx-auto" /></p>
<h2 id="heading-why">Why?!</h2>
<p>I've personally used and tinkered with Kubernetes and GKE for the last 3-4 years, and more recently it has been in a professional context, building some GKE clusters and Cloud Composer clusters that use GKE, but it was in the last couple of years that I really wanted to learn about Kubernetes, not just how to use it and get by.</p>
<p>Now, there are loads of ways to get a Kubernetes cluster up and running, these days it's super simple with MicroK8s, <a target="_blank" href="https://minikube.sigs.k8s.io/docs/">MiniKube</a>, and even spinning up a cluster using GKE (Google Kubernetes Engine), GCP's managed Kubernetes service.</p>
<p>I wanted to learn more about what Kubernetes is actually made of! What are the moving parts that make Kubernetes and how do they all work together?</p>
<p>There are some really good guides out in the wild, ranging from a complete DIY guide, "Kubernetes the hard way" (I tried this a couple of years or so ago, it's really interesting and good fun, you can find Kelsey Hightower's repo <a target="_blank" href="https://github.com/kelseyhightower/kubernetes-the-hard-way">here</a>), to running a <a target="_blank" href="https://microk8s.io/docs/getting-started">Raspberry Pi MicroK8s cluster</a>, a great option if you have the spare boards or even just one.</p>
<p>For me, time is a factor so I'll be using <a target="_blank" href="https://kubernetes.io/docs/reference/setup-tools/kubeadm/">Kubeadm</a> and some VMs to get a cluster up and running. You can think of Kubeadm as the packaged version of all the components that you need to create a cluster, it's also what the CKA &amp; CKAD exams use.</p>
<p>I'm using this page to keep track of what I found worked for me after using a couple of guides to get my test cluster up and running.</p>
<h3 id="heading-usual-this-isnt-production-ready-public-disclaimer">Usual <em>this isn't production</em>-<em>ready public disclaimer!</em></h3>
<p>This by no means is a production-ready tutorial or even an opinion on a production-ready Kubernetes cluster, just what I found worked for me in getting a development cluster working in a fairly quick time using various articles and tutorials.</p>
<p>I wanted a cluster where I could practice upgrading, breaking and fixing to help me prepare for the Certified Kubernetes Administrator exam.</p>
<h2 id="heading-pre-requisites">Pre-requisites</h2>
<p>First, I created the VMs for my cluster: 1 control plane or master node, and 3 nodes for workloads. I created the VMs on my home lab hypervisor server, using Proxmox (Maybe I'll do a quick article on my home lab setup?!).</p>
<p>My VMs are installed with the Ubuntu 22.04 OS, and I'll prep and configure them all with the following commands.</p>
<h2 id="heading-steps-for-all-vms">Steps for all VMs</h2>
<h3 id="heading-swap">Swap</h3>
<p>First, we disable swap.</p>
<pre><code class="lang-plaintext">sudo swapoff -a
</code></pre>
<p>You'll need to edit the <code>/etc/fstab</code> swap entry, or in my case the swap.img entry; just adding a "#" to comment out the line that mounts the swap should be fine. This will stop swap re-enabling at VM boot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719307339431/49dd4066-7e99-4148-9094-ea13915228f8.png" alt class="image--center mx-auto" /></p>
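<p>If you'd rather not edit the file by hand, a sed one-liner along these lines should comment out any swap entries; do check /etc/fstab afterwards to be sure it caught the right line:</p>
<pre><code class="lang-plaintext">sudo swapoff -a
# comment out any swap entries in /etc/fstab so swap stays off after a reboot
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
</code></pre>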
<p>But wait, why does swap need to be disabled?!</p>
<p>Swap needs to be disabled for performance reasons and because of the scheduling that the kube-scheduler performs to "score" and "pick" a node for a pod or workload, as workloads could potentially utilise the swap memory of a node..... or something like that.</p>
<p>Kubelet also failed to start and run correctly on my VMs until I disabled swap.</p>
<p>You'll need to do this for all VM control plane/s and worker nodes.</p>
<p>That being said, swap support is apparently being worked on; I found some interesting info <a target="_blank" href="https://github.com/kubernetes/kubernetes/issues/53533">here</a> about it, if you're interested.</p>
<h3 id="heading-container-runtime">Container runtime</h3>
<p>Now, we configure containerd, the container runtime the cluster will be using.</p>
<pre><code class="lang-plaintext">sudo tee /etc/modules-load.d/containerd.conf &lt;&lt;EOF
overlay
br_netfilter
EOF
</code></pre>
<p>Then run the <code>modprobe</code> commands to load some kernel modules</p>
<pre><code class="lang-plaintext">sudo modprobe overlay
sudo modprobe br_netfilter
</code></pre>
<p>The overlay module provides overlay filesystem support, which the container runtime uses for layering container images and filesystems.<br />The <code>br_netfilter</code> module enables bridge netfilter support in the Linux kernel, which Kubernetes needs for networking and policy.</p>
<p>We then need to add some configuration to the <code>sysctl.d/kubernetes.conf</code> file</p>
<pre><code class="lang-plaintext">sudo tee /etc/sysctl.d/kubernetes.conf &lt;&lt;EOT
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOT
sudo sysctl --system
</code></pre>
<h3 id="heading-install-packages">Install packages</h3>
<p>Now, to install some packages!</p>
<pre><code class="lang-plaintext">sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/docker.gpg
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
</code></pre>
<p>Now we can install containerd from the Docker repo we added above and configure it to use the systemd cgroup driver.</p>
<pre><code class="lang-plaintext">sudo apt update
sudo apt install -y containerd.io
containerd config default | sudo tee /etc/containerd/config.toml &gt;/dev/null 2&gt;&amp;1
sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
</code></pre>
<p>Now it's time to add the Kubernetes package repo, we're opting for version 1.28 here (which leaves me room to practice upgrading my cluster nodes! <strong>#PracticeMakesPermanent</strong>)</p>
<pre><code class="lang-plaintext">$ curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
</code></pre>
<p>So what's happening here? First, we download the public GPG key, then we add the Kubernetes apt repo, as it's not natively available on Ubuntu.</p>
<p>Make note of "<strong>https://pkgs.k8s.io</strong>" in the package repo URL; there was an update about a year ago to where the repos are stored and accessed from, replacing the previously Google-hosted repos, just so it doesn't trip you up if you're reading some older articles about setting up a cluster. More on the change <a target="_blank" href="https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/">here</a>.</p>
<p>Optional: You can also run the command <code>sudo apt-mark hold kubelet kubeadm kubectl</code> to pin the versions of kubelet, kubeadm and kubectl to 1.28 so we don't accidentally upgrade them. This is for dev purposes, so you don't have to worry about it for now.</p>
<h2 id="heading-just-the-control-plane-node-master-node-vm">Just the control plane node / master node VM</h2>
<h3 id="heading-kubeadm-init">Kubeadm init</h3>
<p>Now this next command is specific for your nominated control plane node or master node, whatever you want to call it.</p>
<pre><code class="lang-plaintext">sudo kubeadm init
</code></pre>
<p>You could also manually specify the DNS name or IP of your control plane if you wanted to refer to it with the kube config file using <code>sudo kubeadm init --control-plane-endpoint=YOUR_CONTROLPLANE_VM_NAME</code></p>
<p>This simple init command will start initialising your cluster. Once finished, it should give you some information about your cluster and about joining nodes to it, but for now we're interested in these commands (we'll get to joining nodes in a minute):</p>
<pre><code class="lang-plaintext">mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
</code></pre>
<p>First, we make sure the <code>.kube</code> directory exists in our home directory, Kubernetes will look for the cluster config file here.</p>
<p>We're then copying the cluster config file from the Kubernetes directory into the <code>.kube</code> directory; it has all the information <code>kubectl</code> needs to interact with our cluster's kube-apiserver component. We then change the file's owner to our user so we can use the config file.</p>
<h3 id="heading-testing-the-control-plane-master-node">Testing the control plane / master node</h3>
<p>Now, let's test it out!</p>
<pre><code class="lang-plaintext">kubectl cluster-info
kubectl get nodes
</code></pre>
<p>These commands will display our cluster information, the control plane endpoint information etc.</p>
<h2 id="heading-nearly-done-just-the-worker-nodes-now">Nearly done! Just the worker nodes now...</h2>
<h3 id="heading-kubeadm-join">Kubeadm join</h3>
<p>Now we need some nodes to complete our cluster, don't worry we're nearly done!</p>
<p>On each one of our VMs that will join as a node, we run the <code>kubeadm join</code> command that was displayed after the <code>kubeadm init</code> command on the control plane.</p>
<pre><code class="lang-plaintext">sudo kubeadm join kube-master-vm-name:6443 --token 123456.vcwibsv...
or
sudo kubeadm join 192.168.1.100:6443 --token 123456.vcwibsv...
</code></pre>
<p>If your token has expired or you have decided to add another node at a later date, that's not a problem, you can run <code>kubeadm token create --print-join-command</code> from the control plane VM and then use the kubeadm join command it displays.</p>
<p>You should see some pre-flight checks output and then you should see <code>This node has joined the cluster</code> displayed. Jump back onto your control plane/master VM and try <code>kubectl get nodes</code>, you should see the node as well as the control plane now displayed.</p>
<p>Continue the same on the other nodes you are adding to your cluster until you're done.</p>
<h2 id="heading-post-cluster-creation">Post cluster creation</h2>
<p>Last thing...... you might be seeing that your nodes are "Not Ready" What gives?!</p>
<p>We still need to add networking to the cluster so pods and services can find each other. I suggest the <a target="_blank" href="https://docs.tigera.io/calico/latest/about/">Calico CNI</a></p>
<pre><code class="lang-plaintext">kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
</code></pre>
<p>This is applying the manifest file from the Calico GitHub repo to our Kubernetes cluster; you can read more in the Calico repo.</p>
<h3 id="heading-testing-testing-1-2-3">Testing. Testing. 1, 2, 3...</h3>
<p>Now, let's check that worked...</p>
<pre><code class="lang-plaintext">kubectl get pods -n kube-system
</code></pre>
<p>Wait for the Calico pods, you should have one per node because it's a DaemonSet, plus a calico-kube-controller, which is a Deployment.</p>
<p>And then finally <code>kubectl get nodes</code> should display something like......</p>
<pre><code class="lang-plaintext">NAME                STATUS   ROLES           AGE   VERSION
kube-dev-master-1   Ready    control-plane   16d   v1.28.10
kube-dev-node-1     Ready    &lt;none&gt;          16d   v1.28.10
kube-dev-node-2     Ready    &lt;none&gt;          15d   v1.28.10
kube-dev-node-3     Ready    &lt;none&gt;          14d   v1.28.10
</code></pre>
<h2 id="heading-congrats">Congrats!</h2>
<p>You made it this far and if it looks like the above..... You've done it! Nice work!</p>
<p>Try creating a test pod/deployment workload using either of the following commands:</p>
<pre><code class="lang-plaintext">kubectl run web-test --image nginx
or
kubectl create deployment web-test --image nginx --replicas 3
</code></pre>
<p>This will create a single pod that is running nginx or a deployment that will ensure 3 pods are running nginx.</p>
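<p>And if you want to reach that nginx from outside the cluster, exposing the deployment with a NodePort service is a quick way to test it; a rough sketch:</p>
<pre><code class="lang-plaintext">kubectl expose deployment web-test --port 80 --type NodePort
kubectl get service web-test
# then curl http://NODE_IP:NODE_PORT from your machine
</code></pre>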
<p>I've tried to create this how-to guide as best as I can from my experience creating my own cluster for preparing for the CKA exam, this is by no means a production-ready cluster!</p>
<p>But just for running internally at home, this should give some good insight and some good practice for cluster administration and hopefully, someone might find this useful.</p>
<p>Feel free to leave any questions, comments, or feedback. I'm always happy to take constructive feedback or just hear how people might do this differently!</p>
]]></content:encoded></item><item><title><![CDATA[Certified Kubernetes Administrator preparation and certification]]></title><description><![CDATA[Well, that was a long hiatus..... typical tech person starts writing a blog then life, kids and whatever else little time for non-techy hobbies (Lego and games) gets in the way and as soon as you know it, it's been about 18 months since my last post....]]></description><link>https://ferrishall.dev/certified-kubernetes-administrator-preparation-and-certification</link><guid isPermaLink="true">https://ferrishall.dev/certified-kubernetes-administrator-preparation-and-certification</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[cka]]></category><category><![CDATA[CKA Exam]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Tue, 21 May 2024 10:40:33 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716282002313/d779c156-6a95-4dcd-88d5-f508385484cd.png" alt class="image--center mx-auto" /></p>
<p>Well, that was a long hiatus..... typical tech person starts writing a blog then life, kids and whatever else little time for non-techy hobbies (Lego and games) gets in the way and as soon as you know it, it's been about 18 months since my last post..... Sorry, my bad.</p>
<p>For the last year I've been delivering technical and Google Cloud training which has been very fun and very busy, but recently I took some time to get exam ready and finally take the CKA exam... And passed it!</p>
<p>So I wanted to write something up with my thoughts and how I approached it, as I read several blog posts which I found helpful.</p>
<p>Firstly, my background, if you haven't read the "<a target="_blank" href="https://ferrishall.dev/about-me">about me</a>" section yet: I'm a Linux sysadmin by trade and I have used containers a fair bit over the years, so Kubernetes was a simple concept for me to "get", as it were; knowing what problems it solves and what complexities it can introduce, along with the payoffs, came with the fundamentals I picked up. Actually using it was what I needed hands-on experience with!</p>
<p>So my background with Kubernetes: I've been using Docker and containers for around 4 years or so, maybe longer, I can't remember, and I've been using Kubernetes for probably around 3 years.</p>
<p>I would strongly recommend learning the fundamentals of Linux, networking and containers before starting with Kubernetes. It will help with understanding the "why" as well as the "how".</p>
<p>Anyway, that's my public safety disclaimer out of the way. How did I find it and how did I prepare for it? I'll get to the point.</p>
<h2 id="heading-the-point">The point</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716283002827/59f64d97-279b-4e12-a128-a1d763e5f978.jpeg" alt class="image--center mx-auto" /></p>
<p>Like I said, I had been using Kubernetes with more meaning and anger for the last 2 years. How? Well, there are a few different ways, and each will be suitable depending on what you want to get out of it: just passing the cert, or getting a good understanding of Kubernetes.</p>
<p>There are some fantastic paid courses as well as free resources available out in the world which will cover all the exam objectives. When I first started learning Kubernetes I wanted to learn it and understand it.</p>
<p>First, <a target="_blank" href="https://github.com/kelseyhightower/kubernetes-the-hard-way">Kubernetes the hard way by Kelsey Hightower</a>, "Mr. Kubernetes" has a GitHub repo which walks you through creating a cluster, no start-up scripts, no packaged configurations, all handcrafted! This is a great resource to get into the weeds of what a Kubernetes cluster is made of and how all those components communicate and why.</p>
<p>This helped me understand why bare-metal/on-prem Kubernetes cluster administration was considered such a task in the first place, and then really REALLY appreciate what managed Kubernetes platforms like Google Kubernetes Engine do and the benefits they add. Kubernetes the hard way is a fantastic way to spend time learning about Kubernetes, but it might be overkill for someone getting started. I already had a grasp of the basics before trying this and used it to understand the components of the cluster.</p>
<p>I also took to creating a cluster using <a target="_blank" href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/">Kubeadm</a> which is a packaged version of all the Kubernetes components, it's a nice in-between from doing it the hard way and having it all done for you with a managed service like GKE. This is slightly more manageable if your time is restricted and just want to get a cluster up and running to play around with.</p>
<p>Kubeadm is also what Kubernetes exam clusters are based on, so knowing how to manage and use it is also an important skill to practice and learn.</p>
<p>With a cluster at my disposal, I decided to try and deploy all the containers that I use at home, like Homeassistant, Pihole, Unbound etc. Having a cluster also let me have a tinker with ArgoCD, which is incredibly cool (another blog post maybe...), so I had some real-world context to practice with, which gave me everyday practice using different objects and scenarios in a Kubernetes cluster.</p>
<p>Finally, another resource I suggest if you don't have spare compute capacity like VMs or Raspberry Pis and don't want a GKE bill: <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/">MiniKube</a> is a great tool to get something running on your laptop that you can use to practice kubectl and applying manifest yaml files.</p>
<p>Now, I could have hammered in the course and passed within 6 months, maybe less; I've read enough posts where people have said that's achievable, which is fine if that's what you want to achieve. I wanted to learn Kubernetes to learn it, not to get a cert and forget about it, in fact, I still am learning as much as I can about it! It's one of my current favourite topics.</p>
<p>I took the certification to validate to myself that I have learnt the skills I set out to learn and wanted to test myself under pressure. (I probably just hate myself deep down :laughing_but_crying)</p>
<h2 id="heading-exam-details">Exam details</h2>
<p>The Kubernetes Administrator exam is performance-based, meaning you don't need to remember facts, numbers and maybe some obscure information that you'd only ever need in an exam; you're tested on how well you perform tasks on real clusters and under time pressure, 2 hours of time pressure to be exact!</p>
<p>You get 15-20 questions of varying difficulty, some will be easier one-liners, and others will be more difficult tasks like troubleshooting and redeploying workloads or other objects, right through to actual cluster administration.</p>
<p>This exam is also somewhat "open book" meaning that you can use the documentation at the <a target="_blank" href="https://kubernetes.io/docs/home/">kubernetes.io</a> site during the exam (certain domains are allowed), which is very handy for when you need to write the yaml manifest files.</p>
<p>You can buy an exam voucher from the <a target="_blank" href="https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/">Linux Foundation training portal</a> the good news is there are usually discounts and even bundles, I purchased mine with the <a target="_blank" href="https://training.linuxfoundation.org/certification/kubernetes-cloud-native-associate/">KCNA</a> certification for a discount.</p>
<p>The CKA voucher will also give you an additional exam attempt if you fail, which does help take the pressure off, though from purchase you have 1 year to attempt both tries. You also get 2 sessions using killer.sh which is great to practice different scenarios and it's a very close simulation to the exam so it's good to get used to before you head into the exam.</p>
<p>At the end of the day it's just a certification exam, the important part is learning and getting experience with something you enjoy using! Now I'm not someone who knocks certifications, I've got a few and failed a few, they're great for validating the skills that you've learnt. I really enjoyed the CKA exam as it was a real test of skills, just you and a bunch of clusters doing real-world tasks!</p>
<p>The main takeaway is practice makes permanent when using Kubernetes, practice the kubectl CLI, practice using different objects and practice breaking and fixing your cluster.</p>
<p>Practice makes permanent!</p>
]]></content:encoded></item><item><title><![CDATA[Google Cloud 101.1]]></title><description><![CDATA[So recently this week I have been getting back to basics in Google Cloud, various mentoring and preparation for delivering training classes on getting started with Google Cloud infrastructure.
So it got me thinking, I should probably put some best pr...]]></description><link>https://ferrishall.dev/google-cloud-1011</link><guid isPermaLink="true">https://ferrishall.dev/google-cloud-1011</guid><category><![CDATA[Google]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[google cloud]]></category><category><![CDATA[Security]]></category><category><![CDATA[getting started]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Fri, 11 Nov 2022 23:38:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1668208584529/kqR7Q5EIo.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So recently this week I have been getting back to basics in Google Cloud, various mentoring and preparation for delivering training classes on getting started with Google Cloud infrastructure.</p>
<p>So it got me thinking, I should probably put some best practice tips on here. I'm not going to be writing about what a VM is and what the cloud is, I'm assuming most reading this have been there already.</p>
<p>So this is more of a "I've spun up a VM using default config, now what?" next step post. Well, this is where I'll add some pointers.</p>
<h2 id="heading-iam-and-least-privilege">IAM and least privilege</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1668208372016/Y2_ZYcTuL.png" alt="image.png" /></p>
<p>So I'll start with Compute Engine, that's pretty much where everyone starts?!
You may have noticed that when you create a Compute Instance, it comes configured by default to use the project's default Compute Engine service account. Easy, right?! Well, yes, but it's not exactly best practice or secure.</p>
<p>Take a look at the IAM page and you'll notice that the default Compute Engine service account comes locked and loaded with the "Editor" role. Again, great, what's the problem?! I can use that to do what I need and no more config is needed, awesome! Well, that is the problem!</p>
<p>The editor role is way too overpowered and broad for a VM that might just need access to a Cloud Storage bucket or maybe Cloud SQL. </p>
<p>To get into the habit of best practice, we should use IAM securely and only grant Google Cloud identities, like users and service accounts, the IAM roles they need to access services or do their jobs, nothing more.</p>
<p>That way, if our Compute Engine VM or the service account attached to it were to become compromised, or a mistake happens during deployment, the blast radius is smaller and no unnecessary access can be leveraged.</p>
<p>Take our Compute Engine VM and whatever it's being used for, say, for this example, a web server that needs to work with Cloud Storage.
We would create a dedicated service account with the IAM role <code>roles/storage.objectAdmin</code>, which gives the Compute Engine VM permission to create, update, delete etc. objects within Google Cloud Storage buckets. </p>
<p>Great! This ensures that the VM has no unnecessary IAM roles to do anything or have access to anything that we don't intend for it to do.</p>
<p>The practice of least privilege should be used for all IAM roles for identities and service accounts when creating Google Cloud resources, there's more info on this on the IAM <a target="_blank" href="https://cloud.google.com/iam/docs/using-iam-securely#least_privilege">Google doc</a> which explains it nicely.</p>
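<p>If you're managing this with Terraform (more on that in my other posts), a rough sketch of the idea might look like the snippet below. The names and project ID are made up purely for illustration: a dedicated service account, a single scoped role binding, and that account attached to the VM in place of the default Compute Engine one.</p>
<pre><code># Hypothetical example: a dedicated, least-privilege service account for a web server VM.
resource "google_service_account" "web_sa" {
  account_id   = "web-server-sa"
  display_name = "Web server service account"
}

# Grant only the role the VM actually needs (object admin on Cloud Storage).
resource "google_project_iam_member" "web_sa_storage" {
  project = "my-project-id"
  role    = "roles/storage.objectAdmin"
  member  = "serviceAccount:${google_service_account.web_sa.email}"
}

# Attach the dedicated service account to the instance instead of the default one.
resource "google_compute_instance" "web" {
  name         = "web-server"
  machine_type = "e2-small"
  zone         = "europe-west2-b"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    subnetwork = "my-subnet" # hypothetical subnet name
  }

  service_account {
    email  = google_service_account.web_sa.email
    scopes = ["cloud-platform"]
  }
}
</code></pre>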
<h2 id="heading-vpc-network">VPC Network</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1668209054895/2n5AqKpWO.png" alt="image.png" /></p>
<p>While we're at it with our Compute Engine VM, when you first spin one up you probably attached it to a network, right? the default VPC? </p>
<p>That's fine but we can level this up a bit by getting rid of the default VPC network and creating a custom VPC network with subnets in the region we're working in.</p>
<p>This gives us control over our subnet IP ranges and which region our subnets are created in.
We also get more control over the firewall rules, without the default firewall rules hanging around available to use. </p>
<p>We really don't want firewall rules that we don't intend to use, and we certainly don't want a "default-allow-ssh" firewall rule applying to all instances in our VPC and open to 0.0.0.0/0.</p>
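<p>As a quick illustrative sketch (assuming a custom VPC already exists, and using made-up names and ranges), a deliberately scoped SSH rule in Terraform would look something like this, rather than a default-allow-ssh rule that's open to the world:</p>
<pre><code># Hypothetical example: only allow SSH from a known range, not 0.0.0.0/0.
resource "google_compute_firewall" "ssh_from_office" {
  name          = "allow-ssh-from-office"
  network       = "my-custom-vpc"     # hypothetical custom VPC name
  source_ranges = ["203.0.113.0/24"]  # example range, replace with your own
  target_tags   = ["ssh-allowed"]

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}
</code></pre>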
<p>Again, this all points to best practices and ensuring security by design, having those security considerations in mind when creating resources in Google Cloud.</p>
<p>I didn't include any step-by-step instructions on how to do these steps, it's too late in the evening, but it's always worth taking a look the next time you're in a Google Cloud project (just don't test in production!)</p>
<p>Please feel free to reach out if you have any thoughts or feedback, I'm always happy to get constructive feedback and to keep learning!</p>
]]></content:encoded></item><item><title><![CDATA[My misinterpretation of GCP IAM policy for projects]]></title><description><![CDATA[So It's not all about writing and shouting about all the cool stuff you've done or how smart you are. It's about writing about the times you've f*&%£d up and how you learnt from it.
So Friday, I completely nuked a GCP project IAM policy with Terrafor...]]></description><link>https://ferrishall.dev/my-misinterpretation-of-gcp-iam-policy-for-projects</link><guid isPermaLink="true">https://ferrishall.dev/my-misinterpretation-of-gcp-iam-policy-for-projects</guid><category><![CDATA[GCP]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[TIL]]></category><category><![CDATA[Cloud]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Wed, 20 Jul 2022 14:10:08 GMT</pubDate><content:encoded><![CDATA[<p>So It's not all about writing and shouting about all the cool stuff you've done or how smart you are. It's about writing about the times you've f*&amp;%£d up and how you learnt from it.</p>
<p>So Friday, I completely nuked a GCP project IAM policy with Terraform and locked everything and everyone out! </p>
<p>Quite spectacular for a Friday morning, oh yeah I didn't mention this was on a Friday! :FacePalm</p>
<p>So how did I manage this? </p>
<p>I was testing some changes related to de-privileging an App Engine default service account; it automatically gets the Editor role assigned, which isn't great to have hanging around.</p>
<p>I'm currently using Terraform as the tool of choice for deploying infrastructure and Cloud Build for actually running the deployment.</p>
<p>I had tested this in a fairly new sandpit project, using a block of code similar to this:</p>
<pre><code>resource <span class="hljs-string">"google_project_iam_policy"</span> <span class="hljs-string">"project"</span> {
  project     <span class="hljs-operator">=</span> <span class="hljs-string">"your-project-id"</span>
  policy_data <span class="hljs-operator">=</span> data.google_iam_policy.admin.policy_data
}

data <span class="hljs-string">"google_iam_policy"</span> <span class="hljs-string">"admin"</span> {
  binding {
    role <span class="hljs-operator">=</span> <span class="hljs-string">"roles/viewer"</span>

    members <span class="hljs-operator">=</span> [
      <span class="hljs-string">"serviceAccount:default_appengine@googleserviceaccount.com"</span>,
    ]
  }
}
</code></pre><p>Now, the mistake I spotted after applying this was that it set the IAM policy for the entire project, not just for the member referenced. Again, completely my fault for not correctly reading the docs and the very clearly stated warning:</p>
<p><em>
You can accidentally lock yourself out of your project using this resource. Deleting a google_project_iam_policy removes access from anyone without organization-level access to the project. Proceed with caution. It's not recommended to use google_project_iam_policy with your provider project to avoid locking yourself out, and it should generally only be used with projects fully managed by Terraform. If you do use this resource, it is recommended to import the policy before applying the change.</em></p>
<p>I mentioned I tested this, didn't I?! </p>
<p>I did, in a project which was pretty clean, and the test worked and I still had access, so what gives?!</p>
<p>Luckily, the org my sandpit project was in had some well-thought-out permissions set on the folder where my project lives, so inheritance preserved my IAM permissions on the project. But, as it was a clean project, I overlooked the Google-managed service accounts that the policy had removed.</p>
<p>I thought it looked good and proceeded to apply my changes for real!</p>
<p>The build started and then suddenly failed, and access was lost. I thought that was very coincidental, and then, to my horror, realised what I had done. </p>
<p>Oh crap!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1658325432297/lLUDEgtz_.png" alt="image.png" /></p>
<p>Essentially, I had removed all the IAM bindings from the project and replaced the whole project IAM policy with just a single Viewer role on a dedicated user-managed service account intended for App Engine.</p>
<p>Reading back through the documentation, it made perfect sense: in previous experience I had only ever used the <code>google_project_iam_member</code> Terraform resource, which is non-authoritative.</p>
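<p>For comparison, here's a minimal sketch of the non-authoritative approach (using the same example member and project ID as the snippet above): it adds just this one binding and leaves the rest of the project's IAM policy untouched.</p>
<pre><code># Non-authoritative: adds a single binding without replacing the project's IAM policy.
resource "google_project_iam_member" "appengine_viewer" {
  project = "your-project-id"
  role    = "roles/viewer"
  member  = "serviceAccount:default_appengine@googleserviceaccount.com"
}
</code></pre>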
<h2 id="heading-own-it-get-help">Own it. Get help</h2>
<p>Essentially, I'm writing this to highlight that bad stuff happens to most people and it all depends on how you deal with it.</p>
<p>I reached out for help once I realised I was locked out and had made a pretty big derp.
I got the help I needed and got access back to the project, which luckily wasn't actually in production; I was preparing it for test use.</p>
<p>Sitting on it and stewing, worrying about getting in trouble, will never help the situation. And remember, everyone has made mistakes before; as long as no one dies and it wasn't intentional, most people will be understanding.</p>
<p>The team I'm working with had a bit of a laugh, and it also made a good story, with my teammates telling some of their own war stories. </p>
<p>After I had access again, I then had to re-add the service accounts to the IAM service agent roles. Bit of a pain, as it was a lot of trial and error.</p>
<p>Some resources stopped working and took some troubleshooting to work out what was missing but I got there in the end.</p>
<p>But as that was the worst of it, and it took a couple of days to put (what I'm hoping is) most of it right again, I probably got away with it!</p>
]]></content:encoded></item><item><title><![CDATA[Deploying Terraform in a GCP Cloud Build Pipeline]]></title><description><![CDATA[If you followed my last post Getting started with Terraform on GCP, you've hopefully started deploying your GCP infrastructure using Terraform. Great start! If you haven't yet, I recommend checking out my post if you're new to IaC, DevOps, Terraform ...]]></description><link>https://ferrishall.dev/deploying-terraform-in-a-gcp-cloud-build-pipeline</link><guid isPermaLink="true">https://ferrishall.dev/deploying-terraform-in-a-gcp-cloud-build-pipeline</guid><category><![CDATA[GCP]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[#cloud-build]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Mon, 18 Jul 2022 21:35:16 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656803838908/dSO9J1w2f.jpeg" alt="0_tpKI_meIRGMbBllI.jpeg" /></p>
<p>If you followed my last post <a target="_blank" href="https://ferrishall.dev/getting-started-with-terraform-on-gcp">Getting started with Terraform on GCP</a>, you've hopefully started deploying your GCP infrastructure using Terraform. Great start! If you haven't yet, I recommend checking out my post if you're new to IaC, DevOps, Terraform etc.</p>
<p>In this tutorial, we'll look at running our Terraform in a pipeline, specifically in GCP Cloud Build. Apologies this has been so delayed after my first part, I've been swamped and enjoying some nice weather. This tutorial is a little rough and ready, time was the priority here.</p>
<p><strong>Disclaimer!</strong> This demo is intended to quickly get you using a pipeline to deploy Terraform and to demonstrate the benefits; this method is not fit for production and should not be used as such!</p>
<p>I will write up a quick summary of how to improve this and where you can go afterwards.</p>
<h2 id="heading-what-is-a-pipeline-and-more-so-why-bother">What is a pipeline? And more so, why bother?!</h2>
<p>You have probably heard the term pipeline and CI/CD.</p>
<p>A pipeline is usually part of the CI/CD process. CI/CD is the continuous integration of developer code; once that code has been developed and merged, it is deployed into the working environment. Continuous building, integration, testing and deployment.</p>
<p>Running our Terraform in a pipeline makes it more transparent and collaborative. No more wondering who is pushing to what and running Terraform from their laptop. It also ensures the tasks running are running in a consistent environment, the same version of providers, Terraform etc.</p>
<p>We'll look at the different parts of what makes our pipeline and how we get it to run successfully.</p>
<h2 id="heading-getting-started">Getting started</h2>
<p>First, create a new project which you don't mind deleting afterwards. If you're unable to, just make a note of everything you add and create so you can delete it when you're done to avoid any costs or security implications.</p>
<p>We'll need to make sure Cloud Build is enabled. Then we'll give the Cloud Build service account the Editor IAM role, purely for the purposes of this demo.</p>
<p>In real-world circumstances you'd need to be using least privilege IAM, using a dedicated service account to deploy Terraform and only the IAM roles that it needs (This is best practice, I repeat! In real-world projects use least privilege when assigning IAM roles!).</p>
<p>We'll also need to add the build YAML file that tells Cloud Build what tasks we want to run. You can find it <a target="_blank" href="https://github.com/Ferrish07/blog-tf-demo-01/blob/main/terraform-plan.yaml">here</a>.</p>
<h2 id="heading-gcp-build-trigger">GCP Build Trigger</h2>
<p>Create a GCP Cloud Build Trigger. We'll need to connect Cloud Build to our GitHub repo so it can map and allow builds.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656877658519/94o0JECvn.png" alt="image.png" /></p>
<p>Then we'll configure our trigger to run our Terraform plan when we create a pull request to the main branch; the build config will point to the <code>terraform-plan.yaml</code> file we created earlier.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656883105417/6ISl0Lb1s.png" alt="image.png" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656883240627/Lan3yaED3.png" alt="image.png" /></p>
<p>We'll also create one for running terraform apply. The difference in this trigger will be "Push to a branch", as we want this trigger to fire when the pull request has been approved and merged into the main branch.</p>
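<p>As an aside, if you'd rather not click through the console, the triggers themselves can also be declared in Terraform. This is just a rough sketch: it assumes the GitHub repo is already connected to Cloud Build, the owner/repo names are made up, and the apply config filename is whatever you've called yours.</p>
<pre><code># Hypothetical sketch: plan on pull requests into main, apply on pushes to main.
resource "google_cloudbuild_trigger" "terraform_plan" {
  name     = "terraform-plan"
  filename = "terraform-plan.yaml"

  github {
    owner = "your-github-user"
    name  = "your-repo"
    pull_request {
      branch = "^main$"
    }
  }
}

resource "google_cloudbuild_trigger" "terraform_apply" {
  name     = "terraform-apply"
  filename = "terraform-apply.yaml" # assumed filename for the apply build config

  github {
    owner = "your-github-user"
    name  = "your-repo"
    push {
      branch = "^main$"
    }
  }
}
</code></pre>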
<p>So, let's push to our feature branch. We can make an arbitrary change, like changing the name of the VM. Commit and push to our feature branch and create a Pull Request.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656884760922/NWwRdi5MP.png" alt="image.png" /></p>
<p>Cool! Our build has been triggered and the terraform-plan trigger is running. It will also appear as a check; if the check fails, i.e. the Terraform plan fails for whatever reason, we won't be able to merge our "broken" code to the main branch. So it adds a good safety net to our repo.</p>
<p>Our Terraform plan should pass with resources to add, we can now merge to main which will effectively push to main and trigger our terraform-apply Cloud Build trigger and apply our Terraform.</p>
<p>We can check the progress of the build from the Cloud Build log output from the Cloud Build dashboard, watch for any errors etc.</p>
<h2 id="heading-build-success">Build Success!</h2>
<p>So that's that! You've just automated your Infrastructure as code! Congrats! It seems like a lot to do to effectively just run terraform apply, but hopefully, you'll start to see the benefits of deploying your infrastructure in a more automated way. So hopefully this has helped even just a bit.</p>
<h2 id="heading-clean-up">Clean up</h2>
<p>Don't forget to delete any service accounts and service account keys, and remove any Editor IAM roles if you're using a project you intend on keeping. Or, to be safe, just delete your project.</p>
<h2 id="heading-ideas-for-improvement">Ideas for improvement</h2>
<p>So now you can see where a pipeline and automating your IaC deployments can be beneficial.</p>
<p>But there are a few security implications with this method and some security considerations you should make when taking this to a real-world and production-grade environment.</p>
<p>Use a dedicated service account for Terraform; adding the Cloud Build default service account to the Editor role is not a good idea, as it's too broad an IAM role and anyone with the right role can use it.</p>
<p>You could also use that service account to trigger your builds instead of the default Cloud Build service account.</p>
<p>There are some additional logging options that need to be added to the cloudbuild.yaml files for this. Or, you can use service account impersonation as part of your cloudbuild.yaml, with Cloud Build acting as your Terraform service account. More on account impersonation <a target="_blank" href="https://cloud.google.com/iam/docs/impersonating-service-accounts#iam-service-accounts-grant-role-sa-console">here</a>.</p>
<p>Remote state. In this demonstration our state was not persistent; we let Cloud Build run <code>terraform init</code> locally, meaning we lost the Terraform state file when the Cloud Build container environment ended.</p>
<p>We really should have created a GCS bucket and added a <code>backend.tf</code> file to tell Terraform to store the state file in our remote GCS bucket. As it stands, if we wanted to make any changes or destroy anything, we'd have no reference to state! When I get time I'll update this demo and the repo to add a <code>backend.tf</code> file.</p>
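<p>For reference, a <code>backend.tf</code> along these lines is all it takes (the bucket name here is made up, and the bucket needs to exist before you run <code>terraform init</code>):</p>
<pre><code># Store state in a GCS bucket instead of the local (ephemeral) filesystem.
terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # hypothetical bucket name, create it first
    prefix = "demo/state"
  }
}
</code></pre>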
<p>For you more eagle-eyed readers, yes I did accidentally commit the name of a service account and project ID. They are long gone now and absolutely not a good idea to commit to a public repo!</p>
<p>Some pre-commit hooks could probably have saved me from the embarrassment; there are some really good ones for Terraform projects, and Google search is your friend!</p>
<p>Any comments?! Please let me know! I'm always interested in hearing about different approaches to pipelines and deploying infrastructure. There really are so many different ways, approaches and opinions on this, all with differently weighted pros and cons.</p>
<p>Please ping me if you have any questions or feedback! Good or bad!</p>
<p>Thanks for reading and following along!</p>
]]></content:encoded></item><item><title><![CDATA[Getting started with Terraform on GCP]]></title><description><![CDATA[This is the first part of potentially a few tutorials on how to get started with deploying infrastructure on Google Cloud Platform.
If you are new to Terraform it might be worth checking out my quick introduction to what Terraform is and why to use i...]]></description><link>https://ferrishall.dev/getting-started-with-terraform-on-gcp</link><guid isPermaLink="true">https://ferrishall.dev/getting-started-with-terraform-on-gcp</guid><category><![CDATA[GCP]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[Devops]]></category><category><![CDATA[infrastructure]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Sun, 26 Jun 2022 22:38:06 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656282919130/baUggbutJ.png" alt="1_wVfnIRL2g8D39z7-KATBQw.png" /></p>
<p>This is the first part of potentially a few tutorials on how to get started with deploying infrastructure on Google Cloud Platform.</p>
<p>If you are new to Terraform it might be worth checking out my <a target="_blank" href="https://ferrishall.dev/infrastructure-as-code-terraform-101">quick introduction</a> to what Terraform is and why to use it.</p>
<h2 id="heading-getting-started">Getting started</h2>
<p>So to get started you'll need a GCP project, you can get started for free with (I still believe there is) a free tier and/or $300 credit to get going.</p>
<p>You'll also need to download and install the <a target="_blank" href="https://www.terraform.io/downloads">Terraform binary</a> for your OS of choice.
You'll need to be comfortable working in the terminal, and you'll need to create an empty working directory for your Terraform code.</p>
<p>In this example, we are going to be running as our own Google user identity but I'll explain in future tutorials the benefits of service accounts and impersonation.</p>
<p>Firstly, we're going to tell Terraform how to interact with our chosen platform, GCP.</p>
<p>In the root of our directory, we'll create <code>providers.tf</code>and add the following:</p>
<pre><code>provider <span class="hljs-string">"google"</span> {}
</code></pre><p>But we'll add some configuration for our project here too, just the project and region for now, it saves us typing it in on all our resources later.</p>
<pre><code>provider <span class="hljs-string">"google"</span> {
  project = <span class="hljs-string">"my-terraform-gcp-project"</span>
  region = <span class="hljs-string">"europe-west2"</span>
}
</code></pre><h2 id="heading-adding-resources">Adding resources</h2>
<p>Next, we need somewhere to declare what we want to create in GCP; these are referred to as resources. For example, a VPC network is a Google Cloud resource, a subnet in that VPC would be a separate resource, and the VM attached to that subnet would be another resource, and so on. You get the picture.
Here is an example of a VPC and a subnet in a <code>main.tf</code> file:</p>
<pre><code>resource <span class="hljs-string">"google_compute_network"</span> <span class="hljs-string">"custom-vpc"</span> {
  name                    = <span class="hljs-string">"test-tf-network"</span>
  auto_create_subnetworks = <span class="hljs-literal">false</span>
}

resource <span class="hljs-string">"google_compute_subnetwork"</span> <span class="hljs-string">"subnet"</span> {
  name          = <span class="hljs-string">"test-tf-subnetwork"</span>
  ip_cidr_range = <span class="hljs-string">"10.2.0.0/16"</span>
  region        = <span class="hljs-string">"europe-west2"</span>
  network       = google_compute_network.custom-vpc.id
}
</code></pre><p>That's it! </p>
<p>That's about 10 lines of code to create a VPC network and a subnet, it's pretty impressive and cool eh?!</p>
<p>Now, there are a lot more options and inputs we could add for an even more opinionated VPC and subnet, which is covered in Terraform's very handy and helpful provider documentation on the <a target="_blank" href="https://registry.terraform.io/providers/hashicorp/google/latest/docs">Terraform registry</a>; when you really get going you'll spend a lot of time here!</p>
<h2 id="heading-planning-and-applying">Planning and applying</h2>
<p>So we have our resources in our <code>main.tf</code> ready to go; we need to plan our additions and then apply when we're happy.</p>
<p>To run our plan, enter the command <code>terraform plan</code>. Terraform will then take a look at the resources in our Terraform files, and it will also take a look at the state. Now, we haven't really covered state in much detail yet. Later!</p>
<p>This is the really cool part, we are declaratively telling Terraform what we want our infrastructure in GCP to look like.</p>
<p>In our case, we don't have anything in the state, as this Terraform is all new, so we should see a plan of 2 to add: the VPC network and the subnet. Think of terraform plan as a dry run, or "what would this current code add, change or remove if I applied it?"</p>
<p>After running a plan we can now run <code>terraform apply</code>. It will run a plan once more but will ask us "Do you want to perform these actions?". Type yes and hit enter, and all going well, the resources in our Terraform will be deployed in GCP!</p>
<h2 id="heading-additions-and-changes">Additions and changes</h2>
<p>So that's our VPC and subnet running and configured in GCP, great! But what if we wanted to make changes or even add some more resources?
Well, we can add something to our existing code; let's add a Compute Engine instance to run on our network.
This is where the state comes in. You might have noticed a new file appear in your directory, <code>terraform.tfstate</code>; this is the state file, which represents our GCP infrastructure in JSON.</p>
<p>Let's add a web server VM and a firewall rule:</p>
<pre><code>resource <span class="hljs-string">"google_compute_firewall"</span> <span class="hljs-string">"web-fw"</span> {
  name        = <span class="hljs-string">"http-rule"</span>
  network     = google_compute_network.custom-vpc.id
  description = <span class="hljs-string">"Creates firewall rule targeting tagged instances"</span>

  allow {
    protocol = <span class="hljs-string">"tcp"</span>
    ports    = [<span class="hljs-string">"80"</span>]
  }

  target_tags   = [<span class="hljs-string">"web"</span>]
  source_ranges = [<span class="hljs-string">"0.0.0.0/0"</span>]
}

resource <span class="hljs-string">"google_compute_instance"</span> <span class="hljs-string">"default"</span> {
  name         = <span class="hljs-string">"tf-test-web-vm"</span>
  machine_type = <span class="hljs-string">"g1-small"</span>
  zone         = <span class="hljs-string">"europe-west2-b"</span>
  tags         = [<span class="hljs-string">"web"</span>]

  boot_disk {
    initialize_params {
      image = <span class="hljs-string">"debian-cloud/debian-11"</span>
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.subnet.self_link

    access_config {
      <span class="hljs-comment">// Ephemeral public IP</span>
    }
  }

  metadata_startup_script = file(<span class="hljs-string">"./startup.sh"</span>)

  service_account {
    scopes = [<span class="hljs-string">"cloud-platform"</span>]
  }
}
</code></pre><p>When we run <code>terraform plan</code> again it will check the <code>terraform.tfstate</code> file for what already exists and notice that the difference is the new compute instance resource and the firewall resource; it will then proceed to add the 2 new resources when we run a <code>terraform apply</code>.</p>
<p>Making our code declarative! We are declaring to Terraform what we want our infrastructure to look like and how it's configured.</p>
<p>If we didn't add any new resources to our code and ran a plan or apply, then Terraform would inform us that everything looks as it should according to the <code>terraform.tfstate</code> file.</p>
<p>Now, if someone went into the console and decided to add another port to our firewall, when we run apply again Terraform would notice that it hasn't been declared in the Terraform files (our <code>main.tf</code> in this example) and would remove it, leaving just the configured port 80. So it works by removing configuration and resources too.</p>
<h2 id="heading-destroying">Destroying</h2>
<p>Lastly, to wrap up, let's get rid of any resources that you've created so you don't get charged for them.
Running the command <code>terraform destroy</code> will ask if you are sure you want to destroy the resources, as this cannot be undone.</p>
<p>That's it for this tutorial, I hope this has helped you get started with Terraform and explained its uses and demonstrated why it's pretty much the standard for deploying infrastructure at the moment and such a valuable skill to have experience with.</p>
<p>I have a repo with the code that I used for this tutorial, which you can find <a target="_blank" href="https://github.com/Ferrish07/blog-tf-demo-01">here</a>.</p>
<p>Please let me know if you spot any inconsistencies in my code, wording etc. I'm open to all feedback!</p>
<p>Next up I'm aiming to get a tutorial on running Terraform in a pipeline, this is where the really cool magic happens!</p>
]]></content:encoded></item><item><title><![CDATA[Infrastructure as Code Terraform 101]]></title><description><![CDATA[You may have deployed some cloud resources, a couple of VMs, a custom VPC, or maybe even a GKE cluster.
Now, you may have wondered "Why would should I bother with IaC, clicking through the console is easy and quick!" That may be true but try doing it...]]></description><link>https://ferrishall.dev/infrastructure-as-code-terraform-101</link><guid isPermaLink="true">https://ferrishall.dev/infrastructure-as-code-terraform-101</guid><category><![CDATA[Terraform]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Sun, 26 Jun 2022 22:31:59 GMT</pubDate><content:encoded><![CDATA[<p>You may have deployed some cloud resources, a couple of VMs, a custom VPC, or maybe even a GKE cluster.</p>
<p>Now, you may have wondered "Why should I bother with IaC? Clicking through the console is easy and quick!" That may be true, but try doing it for 10 VMs or a bunch of firewall rules for your VPC. </p>
<p>You'll soon figure out that it can be painfully slow and can introduce inconsistencies and errors. We're all human and/or have fat fingers like me.</p>
<p>My personal IaC tool of choice is Terraform. It uses a declarative configuration language (HCL, or optionally JSON) in which you describe your infrastructure: the state it should be in, what it should look like and how it should be configured. </p>
<p>So Why? </p>
<p>Deploying your infrastructure using IaC is repeatable, dependable and auditable. Repeatable: you can create modules so all your resources are deployed using the same code.
Dependable: you can test and change your variables.
Auditable: you can track changes using Git, and even track deployments using CI/CD pipelines.</p>
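<p>As a tiny, purely hypothetical sketch of the module idea (the module path and its inputs are made up), the same code can be reused across environments like this:</p>
<pre><code># Hypothetical reusable module; "./modules/network" and its inputs are illustrative only.
module "dev_network" {
  source     = "./modules/network"
  name       = "dev-vpc"
  cidr_range = "10.10.0.0/24"
  region     = "europe-west2"
}

module "prod_network" {
  source     = "./modules/network"
  name       = "prod-vpc"
  cidr_range = "10.20.0.0/24"
  region     = "europe-west2"
}
</code></pre>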
<p>Next up I'll be dropping some examples and tutorials to help anyone get started!</p>
<p>Getting started is part 1 of my tutorials. Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[GCP Cloud Composer issue]]></title><description><![CDATA[I came across a very odd and aggravating issue when developing and testing a Google Cloud Composer Terraform module today.
It's definitely a Google Composer issue, not a Terraform issue. 
When updating a Cloud Composer environment, which causes a GKE...]]></description><link>https://ferrishall.dev/gcp-cloud-composer-issue</link><guid isPermaLink="true">https://ferrishall.dev/gcp-cloud-composer-issue</guid><category><![CDATA[google cloud]]></category><category><![CDATA[Google cloud composer]]></category><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Fri, 24 Jun 2022 22:16:40 GMT</pubDate><content:encoded><![CDATA[<p>I came across a very odd and aggravating issue when developing and testing a Google Cloud Composer Terraform module today.</p>
<p>It's definitely a Google Composer issue, not a Terraform issue. 
When updating a Cloud Composer environment, which causes a GKE cluster to be recreated, it fails.</p>
<blockquote>
<p>Resource name
projects/$PROJECT_ID/locations/europe-west2/environments/test-composer-dev</p>
<p>Error message
Failed precondition (HTTP 400): Multiple errors occurred. Google Compute Engine: The subnetwork resource 'projects/$PROJECT_ID/regions/europe-west2/subnetworks/test' is already being used by 'projects/$PROJECT_ID/regions/europe-west2/nats/nat-rtr-Nat'. Could not configure workload identity because of another error Could not delete inverting proxy assignment because of another error</p>
</blockquote>
<p>This is a private Composer environment, so I'm using Cloud NAT to allow egress to the internet.
It seems that Cloud NAT is using the subnet primary and secondary ranges that Cloud Composer creates for the GKE cluster, which then stops Composer from being able to update or destroy the environment; a race condition, I guess.</p>
<p>To get around this I had to delete the Cloud NAT resource and then proceed with the change and/or deleting of the environment. Essentially freeing up cluster resources from the Cloud NAT resource that was attached to the subnet and IP ranges. Frustrating to say the least.</p>
<p>I don't have any previous experience with using or spinning up Cloud Composer; from what I have read there are quite a few layers and resources which can cause clashes or issues, and there seem to be some "known issues" with Composer.</p>
<p>Thought I would note this down; it would be interesting to hear if anyone else has had this issue or anything similar.</p>
]]></content:encoded></item><item><title><![CDATA[Another cloud infrastructure blog!]]></title><description><![CDATA[Good evening everyone!
I've decided to start blogging again (I blogged a good few years ago).
I'll be posting, most likely infrequently so apologies in advance!
I intend to post articles on new tech and tools I'm tinkering with, any cool or interesti...]]></description><link>https://ferrishall.dev/another-cloud-infrastructure-blog</link><guid isPermaLink="true">https://ferrishall.dev/another-cloud-infrastructure-blog</guid><dc:creator><![CDATA[Ferris Hall]]></dc:creator><pubDate>Thu, 23 Jun 2022 19:43:37 GMT</pubDate><content:encoded><![CDATA[<p>Good evening everyone!</p>
<p>I've decided to start blogging again (I blogged a good few years ago).
I'll be posting, most likely infrequently so apologies in advance!</p>
<p>I intend to post articles on new tech and tools I'm tinkering with, any cool or interesting work I have done and some how-to articles.</p>
<p>Firstly, a bit about me.</p>
<p>I’m a Google Cloud certified Platform Engineer and a Google authorized trainer at Appsbroker, Google’s largest Premier Partner in Europe. </p>
<p>Providing infrastructure expertise via design and deployment on very interesting projects. Designing and building automated processes to ensure consistent, repeatable deployments of cloud infrastructure. </p>
<p>Prior to my current role, I was a site reliability engineer at another cloud consultancy and before that, I was a Linux infrastructure sysadmin at a startup.
This is where I first got into using DevOps tools and cloud technology, and where I really started getting into and enjoying automation.</p>
<p>I'm genuinely passionate about all things IT enterprise infrastructure, DevOps, automation, learning, developing and most recently delivering training.</p>
<p>My newfound status as a GCP trainer has sparked this blog and my wanting to write about tech, DevOps, GCP etc.</p>
]]></content:encoded></item></channel></rss>