ferrishall.dev

Google Cloud 101.1

Ferris Hall — Fri, 11 Nov 2022 23:38:40 GMT

So recently this week I have been getting back to basics in Google Cloud, various mentoring and preparation for delivering training classes on getting started with Google Cloud infrastructure.

So it got me thinking, I should probably put some best practice tips on here. I'm not going to be writing about what a VM is and what the cloud is, I'm assuming most reading this have been there already.

So this is more of a "I've spun up a VM using default config, now what?" next step post. Well, this is where I'll add some pointers.

IAM and least privilege

So I'll start with Compute Engine, that's pretty much where everyone starts?!You may have noticed that when you create a Compute Instance, it'll come configured as a default to use the project's default Compute Engine service account, easy right?! Well yes but not overly best practice or secure.

Take a look at the IAM page and you'll notice that the default Compute Engine service account comes locked and loaded with the "Editor" role, again great what's the problem?! I can use that to do what I need and no more config is needed, awesome! Well, that is the problem!

The editor role is way too overpowered and broad for a VM that might just need access to a Cloud Storage bucket or maybe Cloud SQL.

To get into the habit of best practice we should be using IAM securely and only giving Google Cloud Identities like users and service accounts IAM roles to access services or do the jobs they need to, nothing more.

That way if our Compute Engine VM or the service account attached to it were to become comprised or mistakes during deployment happens, the blast radius is smaller and no unnecessary access can be leveraged.

Our Compute Engine VM, depending on what it is being used for, say for this example a web server. We would create a dedicated service account that has the IAM role roles/storage.objectAdmin this then allows the Compute Engine VM the permission to create, update, delete etc objects within Google Cloud Storage buckets.

Great! This ensures that the VM has no unnecessary IAM roles to do anything or have access to anything that we don't intend for it to do.

The practice of least privilege should be used for all IAM roles for identities and service accounts when creating Google Cloud resources, there's more info on this on the IAM Google doc which explains it nicely.

VPC Network

While we're at it with our Compute Engine VM, when you first spin one up you probably attached it to a network, right? the default VPC?

That's fine but we can level this up a bit by getting rid of the default VPC network and creating a custom VPC network with subnets in the region we're working in.

This controls what our subnet IP ranges are and what region we have our subnets created in. We can also get more control over the firewall rules and not have the default firewall rules available to use.

We really don't want to have firewalls that we don't intend on using and we don't want a "default-allow-ssh" firewall rule to all instances in our VPC that is available to 0.0.0.0.

Again, this all points to best practices and ensuring security by design, having those security considerations in mind when creating resources in Google Cloud.

I didn't include any step-by-step instructions on how to do these steps, it's too late in the evening, but it's always worth taking a look the next time you're in a Google Cloud project (just don't test in production!)

Please feel free to reach out if you have any thoughts or feedback, I'm always happy to get constructive feedback and to keep learning!

My misinterpretation of GCP IAM policy for projects

Ferris Hall — Wed, 20 Jul 2022 14:10:08 GMT

So It's not all about writing and shouting about all the cool stuff you've done or how smart you are. It's about writing about the times you've f*&%d up and how you learnt from it.

So Friday, I completely nuked a GCP project IAM policy with Terraform and locked everything and everyone out!

Quite spectacular for a Friday morning, oh yeah I didn't mention this was on a Friday! :FacePalm

So how did I manage this?

I was testing some changes related to de-privileging an App Engine default service account, it automatically gets an editor role assigned which isn't great to have hanging around.

I'm currently using Terraform as the tool of choice for deploying infrastructure and Cloud Build for actually running the deployment.

I had tested this in a sandpit project, fairly new and just used a block of code similar to this:

resource "google_project_iam_policy" "project" {  project     = "your-project-id"  policy_data = data.google_iam_policy.admin.policy_data}data "google_iam_policy" "admin" {  binding {    role = "roles/viewer"    members = [      "serviceAccount:default_appengine@googleserviceaccount.com",    ]  }}

Now, the mistake I spotted after applying this was this set the IAM policy for the entire project, not just the member referenced. Again completely my fault for not correctly reading the docs and the very clearly stated warning :

You can accidentally lock yourself out of your project using this resource. Deleting a google_project_iam_policy removes access from anyone without organization-level access to the project. Proceed with caution. It's not recommended to use google_project_iam_policy with your provider project to avoid locking yourself out, and it should generally only be used with projects fully managed by Terraform. If you do use this resource, it is recommended to import the policy before applying the change.

I mentioned I tested this, didn't I ?!

I did, In a project which was pretty clean and the test worked and I still had access, so what gives?!

Luckily the org my sandpit project was in had some well-thought-out permissions set on the folder where my project lives, so inheritance preserved my IAM permissions on the project, but as it was a clean project I overlooked the missing Google services service accounts that had been removed.

I thought it looked good and proceeded to apply my changes for real!

The build started and then suddenly failed and access was lost, I thought that was very coincidental and then to my horror realised what I had done.

Oh crap!

Essentially I had removed all IAMs from the project and replaced the whole project IAM policy with just a single viewer role on a dedicated user-managed service account intended for App engine.

Reading back through the documentation, it made perfect sense, in previous experience I had only ever used google_project_iam_member Terraform resource. which is non-authoritative.

Own it. Get help

Essentially, I'm writing this to highlight that bad stuff happens to most people and it all depends on how you deal with it.

I reached out for help once I realised I was locked out and had made a pretty big derp.I got the help I needed and luckily got access back to the project that also luckily wasn't actually in production, I was preparing it for test use.

Sitting on it and stewing on it worrying about getting in trouble will never help the situation and remember, everyone has made mistakes before as long as no one dies and it wasn't intentional, most people will be understanding.

The team I'm working with had a bit of a laugh and it also made a good story where my other teammates told some of their war stories.

After I had access again, I then had to re-add the service accounts to the IAM service agent roles. Bit of a pain as was a lot of trial and error.

Some resources stopped working and took some troubleshooting to work out what was missing but I got there in the end.

But as that was the worst of it and took a couple of days to put what I'm hoping, most of it right again, I probably got away with it!

Deploying Terraform in a GCP Cloud Build Pipeline

Ferris Hall — Mon, 18 Jul 2022 21:35:16 GMT

If you followed my last post Getting started with Terraform on GCP, you've hopefully started deploying your GCP infrastructure using Terraform. Great start! If you haven't yet, I recommend checking out my post if you're new to IaC, DevOps, Terraform etc.

In this tutorial, we'll look at running our Terraform in a pipeline, specifically in GCP Cloud Build. Apologies this has been so delayed after my first part, I've been swamped and enjoying some nice weather. This tutorial is a little rough and ready, time was the priority here.

Disclaimer! This demo is intended to quickly get you using a pipeline to deploy Terraform and demonstrate the benefits, this method is not fit for production and should not be used for so!

I will write up a quick summary of how to improve this and where you can go afterwards.

What is a pipeline? And more so, why bother?!

You have probably heard the term pipeline and CI/CD.

A pipeline is usually part of the CI/CD process. CI/CD is the continuous integration of developer code and when that code has been developed and merged it is then deployed into the working environment. continuous building, integration, testing and deployment.

Running our Terraform in a pipeline makes it more transparent and collaborative. No more wondering who is pushing to what and running Terraform from their laptop. It also ensures the tasks running are running in a consistent environment, the same version of providers, Terraform etc.

We'll look at the different parts of what makes our pipeline and how we get it to run successfully.

Getting started

First, create a new project which you don't mind deleting afterwards. If you're unable to, just make a note of everything you add ad create so you can delete it when you're done to avoid any costs or security implications.

We'll need to make sure Cloud Build is enabled. Then we'll give the cloud build service account the editor IAM role for this demo purpose.

In real-world circumstances you'd need to be using least privilege IAM, using a dedicated service account to deploy Terraform and only the IAM roles that it needs (This is best practice, I repeat! In real-world projects use least privilege when assigning IAM roles!).

We'll need also to add the build yaml files that will tell Cloud Build what tasks we want to run. You can find it here.

GCP Build Trigger

Create a GCP Cloud Build Trigger. We'll need to connect Cloud Build to our GitHub repo so it can map and allow builds.

Then we'll configure our trigger to run our Terraform plan when we create a pull request to the main branch, the build config will point to the terrform-plan.yaml file we created earlier.

We'll also create one for running terraform apply, the difference in this trigger will be "Push to a branch" As we want this trigger to invoke when the pull request has been approved and merged into the main branch.

So, let's push to our feature branch. We can make an arbitrary change, like changing the name of the VM. Commit and push to our feature branch and create a Pull Request.

Cool! Our build has been triggered and the terraform-plan trigger is running, it will also appear as a check, if the check fails, if the Terraform plan fails for whatever reason, we won't be able to merge our "broken" code to the main branch. So adds a good safety net to our repo.

Our Terraform plan should pass with resources to add, we can now merge to main which will effectively push to main and trigger our terraform-apply Cloud Build trigger and apply our Terraform.

We can check the progress of the build from the Cloud Build log output from the Cloud Build dashboard, watch for any errors etc.

Build Success!

So that's that! You've just automated your Infrastructure as code! Congrats! It seems like a lot to do to effectively just run terraform apply, but hopefully, you'll start to see the benefits of deploying your infrastructure in a more automated way. So hopefully this has helped even just a bit.

Clean up

Don't forget to delete any service accounts, and service account keys, and remove any editor IAM roles if you're using a project you intend on keeping. Or to be safe just delete your project.

Ideas for improvement

So now you can see where a pipeline and automating your IaC deployments can be beneficial.

But there are a few security implications with this method and some security considerations you should make when taking this to a real-world and production-grade environment.

Use a dedicated service account for terraform, adding the Cloud Build default service account to the editor role is not a good idea, it's too broad an IAM role and everyone with the right role can use it.

You could also use that service account to trigger your builds instead of the default cloud Build service account.

There are some additional logging options that need to be added to the cloudbuild.yaml files. Or, you can use account impersonation as part of your cloudbuild.yaml. could build acting as your Terraform service account. More on account impersonation here.

Remote state. In this demonstration our state was not persistent, we let Cloud Build run terraform init locally, meaning we would have lost the terraform state file when the Cloud Build container environment ended.

We really should have created a GCS bucket and added a backend.tf file to tell terraform to store the state file in our remote GCS bucket. if we wanted to make any changes or destroy, we had no reference to state! When I get time I'll update this demo and the repo to add a backend.tf file.

For you more eagle-eyed readers, yes I did accidentally commit the name of a service account and project ID. They are long gone now and absolutely not a good idea to commit to a public repo!

Some pre-commit hooks could have probably saved me from the embarrassment, there are some really good ones for Terraform projects, Google search is your friend!

Any comments?! Please let me know! I'm always interested in hearing about different approaches to pipelines and deploying infrastructure. There really are so many different ways, approaches and opinions to this, all with different weighted pros and cons.

Please ping me if you have any questions or feedback! Good or bad!

Thanks for reading and following along!

Getting started with Terraform on GCP

Ferris Hall — Sun, 26 Jun 2022 22:38:06 GMT

This is the first part of potentially a few tutorials on how to get started with deploying infrastructure on Google Cloud Platform.

If you are new to Terraform it might be worth checking out my quick introduction to what Terraform is and why to use it.

Getting started

So to get started you'll need a GCP project, you can get started for free with (I still believe there is) a free tier and/or $300 credit to get going.

You'll also need to download and install the Terraform binary for your OS of choice.You'll also need to be used to working in the terminal and you'll need to create an empty working directory for your Terraform code.

In this example, we are going to be running as our own Google user identity but I'll explain in future tutorials the benefits of service accounts and impersonation.

Firstly we going to tell Terraform how to interact with our chosen platform, GCP.

In the root of our directory, we'll create providers.tfand add the following:

provider "google" {}

But we'll add some configuration for our project here too, just the project and region for now, it saves us typing it in on all our resources later.

provider "google" {  project = "my-terraform-gcp-project"  region = "europe-west2"}

Adding resources

Next, we need somewhere to declare what we want to create in GCP, these are referred to as resources. eg. A VPC network is a google cloud resource, a subnet in that VPC would be a separate resource and the VM that is attached to that subnet would be another resource etc. You get the picture.Here is an example of a VPC and a subnet in a main.tf file:

resource "google_compute_network" "custom-vpc" {  name                    = "test-tf-network"  auto_create_subnetworks = false}resource "google_compute_subnetwork" "subnet" {  name              = "test-tf-subnetwork"  ip_cidr_range = "10.2.0.0/16"  region             = "europe-west2"  network          = google_compute_network.custom-vpc.id}

That's it!

That's about 10 lines of code to create a VPC network and a subnet, it's pretty impressive and cool eh?!

Now, there are a lot more options and inputs we could add for an even more opinionated VPC and Subnet which is covered in Terraform's very handy and helpful available provider registry documentation via the Terraform registry when you really get going you'll spend a lot of time here!

Planning and applying

So we have our resources in our main.tf ready to go, we need to plan our additions and then apply when we're happy

To run our plan you need to enter the command terraform plan Terraform will then take a look at the resources in our Terraform files and it will also take a look at the state, now we haven't really covered state in much detail yet. Later!

This is the really cool part, we are declaratively telling Terraform what we want our infrastructure in GCP to look like.

In our case, we don't have anything in the state as this Terraform is all new so we should see a plan of 2 to add. The VPC network and the subnet. Think of terraform plan as a dry run or what would this current code add, change or remove if I applied.

After running a plan we can now run terraform apply it will run a plan once more but will ask us "do you want to perform these actions" type yes and enter and all going well, the resources in our terraform will be deployed in GCP!

Additions and changes

So that's our VPC and subnet running and configured in GCP, great! but what if we wanted to make changes or even add some more resources?Well, we can add something to our existing code, let's add a Compute instance to run on our network. This is where the state comes in. you might have noticed a new file appear in your directory terraform.tfstate this is the state file which represents our GCP infrastructure in JSON.

Let's add a web server VM and a firewall rule:

resource "google_compute_firewall" "web-fw" {  name          = "http-rule"  network      = google_compute_network.custom-vpc.id  description = "Creates firewall rule targeting tagged instances"  allow {    protocol = "tcp"    ports      = ["80"]  }  target_tags       = ["web"]  source_ranges = ["0.0.0.0/0"]}resource "google_compute_instance" "default" {  name               = "tf-test-web-vm"  machine_type = "g1-small"  zone                = "europe-west2-b"  tags                 = ["web"]  boot_disk {    initialize_params {      image = "debian-cloud/debian-11"    }  }  network_interface {    subnetwork = google_compute_subnetwork.subnet.self_link    access_config {      // Ephemeral public IP    }  }  metadata_startup_script = file("./startup.sh")  service_account {    scopes = ["cloud-platform"]  }}

When we run teraform plan again it will check the terraformstate.tf file for what already exists and notice the difference is the new compute instance resource and the firewall resource, it will then proceed to add the 2 new resources when we run a terraform apply

Making our code declarative! we are declaring to Terraform what we want our infrastructure to look like and how it's configured.

If we didn't add any new resources to our code and ran a plan or apply then Terraform would inform us that everything looks as it should according to the terraformstate.tf file.

Now if someone went into the console and decided to add another port to our firewall when we run apply again it would notice that it hasn't been declared in the terraform files, our main.tf in this example and would remove it leaving just the configured port 80. So it works by removing configuration and resources too.

Destroying

Lastly, to wrap up, let's get rid of any resources that you've created so you don't get charged for them.Running the command terraform destroy will offer if you are sure you want to destroy the resources as this cannot be undone.

That's it for this tutorial, I hope this has helped you get started with Terraform and explained its uses and demonstrated why it's pretty much the standard for deploying infrastructure at the moment and such a valuable skill to have experience with.

I have a repo which the code that I used for this tutorial which you can find here.

Please let me know if you spot any inconsistencies in my code, wording etc. I'm open to all feedback!

Next up I'm aiming to get a tutorial on running Terraform in a pipeline, this is where the really cool magic happens!

Infrastructure as Code Terraform 101

Ferris Hall — Sun, 26 Jun 2022 22:31:59 GMT

You may have deployed some cloud resources, a couple of VMs, a custom VPC, or maybe even a GKE cluster.

Now, you may have wondered "Why would should I bother with IaC, clicking through the console is easy and quick!" That may be true but try doing it for 10 VMs or a bunch of firewall rules for your VPC.

You'll soon figure out that it can be painfully slow and can introduce inconsistencies and errors. we're all human and/or have fat fingers like me.

My personal IaC of choice is Terraform, it's a declarative language expressed in HCL or JSON in which you configure your infrastructure to declare the state and what it should look like or how it should be configured.

So Why?

Deploying your infrastructure using IaC is repeatable, dependable and auditable. You can create modules so all your resources are deployed using the same code.Dependable, you can test and change your variables.Auditable, you can track changes using Git, even deployments using CICD pipelines.

Next up I'll be dropping some examples and tutorials to help anyone get started!

Getting started is part 1 of my tutorials. Enjoy!

GCP Cloud Composer issue

Ferris Hall — Fri, 24 Jun 2022 22:16:40 GMT

I came across a very odd and aggravating issue when developing and testing a Google Cloud Composer Terraform module today.

It's definitely a Google Composer issue, not a Terraform issue. When updating a Cloud Composer environment, which causes a GKE cluster to be recreated, it fails.

Resource nameprojects/$PROJECT_ID/locations/europe-west2/environments/test-composer-dev
Error messageFailed precondition (HTTP 400): Multiple errors occurred. Google Compute Engine: The subnetwork resource 'projects/$PROJECT_ID/regions/europe-west2/subnetworks/test' is already being used by 'projects/$PROJECT_ID/regions/europe-west2/nats/nat-rtr-Nat'. Could not configure workload identity because of another error Could not delete inverting proxy assignment because of another error

This is a private composer environment so I'm using Cloud NAT to allow egrees to the internet.It seems that Cloud NAT is using the subnet primary and secondary ranges that Cloud Composer creates for the GKE cluster, which then stops it from being able to update or destroy the environment, a race condition I guess.

To get around this I had to delete the Cloud NAT resource and then proceed with the change and/or deleting of the environment. Essentially freeing up cluster resources from the Cloud NAT resource that was attached to the subnet and IP ranges. Frustrating to say the least.

I don't have any experience with using or spinning up Cloud Composer before, from what I have read there are quite a few layers and resources which can cause clashes or issues I guess, there seem to be some "known issues" with composer.

Thought I would note this down, would be interesting to see or hear if anyone else has this issue or similar.

Another cloud infrastructure blog!

Ferris Hall — Thu, 23 Jun 2022 19:43:37 GMT

Good evening everyone!

I've decided to start blogging again (I blogged a good few years ago).I'll be posting, most likely infrequently so apologies in advance!

I intend to post articles on new tech and tools I'm tinkering with, any cool or interesting work I have done and some how-to articles.

Firstly, a bit about me.

Im a Google Cloud certified Platform Engineer and a Google authorized trainer at Appsbroker, Googles largest Premier Partner in Europe.

Providing infrastructure expertise via design and deployment on very interesting projects. Designing and building automated processes to ensure consistent, repeatable deployments of cloud infrastructure.

Prior to my current role, I was a site reliability engineer at another cloud consultancy and before that, I was a Linux infrastructure sysadmin at a startup.This is where I first got into using DevOps tools, and cloud technology and generally really started getting into and enjoying automation.

I'm genuinely passionate about all things IT enterprise infrastructure, DevOps, automation, learning, developing and most recently delivering training.

My recent found status as a GCP trainer has sparked this blog and my wanting to write about tech, DevOps, GCP etc.