GCP Folders, Projects and Networks
To host our GKE clusters, we’ll use several GCP projects. Projects on GCP act as a collection of resources and provide boundaries around access privileges (Identity and Access Management), network constructs such as subnets, and more. Our multi-cluster environment is going to need a few projects and some organisation of the base resources in them.
This is part of my Multi-Cluster GKE series. Check the other posts for more information.
Design
Goals
- At least one project per tier
- Suitable for private clusters
Project Layout
It’s entirely possible to just create projects in the root of your organisation. You can survive for a long time with that pattern, though once you get past a few projects it’s useful to have some organisation. Also, if you’re like me, projects and folders let you easily clean up after you POC (proof of concept) something. Delete a project and all the resources in it go away; no more forgotten S3 bucket somewhere billing you 18 cents a month.
The projects and folders we’re creating in this series are more complex than required so we can demonstrate features and capabilities.
.
├── Core-Infrastructure
│   └── Core-project
└── Worker-Infrastructure
    ├── Staging
    │   └── Staging-project
    └── Production
        └── Prod-project
Network Design
When GKE was first released, it required a public IP on every node. That is certainly still an option, but there are valid concerns about leaving your nodes exposed on the internet. Additionally, Google will soon be charging for those IPs.
To support private clusters, we have several network design options.
Islands of Isolation:
All projects have a network, and traffic is routed externally to exposed load-balancers and ingresses. This pattern could work very well for organisations that focus on strict authentication, authorisation and encryption.
Peers:
The 2 workload projects could peer their network with the core project allowing internal access while keeping management of each tier separate. This pattern allows significant control over how projects communicate with each other while still leaving local control of internal network configuration.
Single Shared Network:
One project owns the network and shares it with the workload projects. This limits the workload projects’ control over the network while still allowing access. It also allows a smaller operational footprint as you can share various network services like Cloud NAT.
These 3 options are all valid in different situations. In this series, I’ve chosen the single shared network option. Your constraints may lead you to pick any of these network options, or even a combination of them. You may have workloads that need to run in a completely isolated island while other teams operate with peered connections. Regardless, most of the Kubernetes configuration supports any of these options.
To configure this network, we’re going to use an additional project as our host VPC project. This network project owns the network, subnetwork[^3] configurations and NAT services that allow outbound access to the internet. Through IAM (Identity and Access Management), we allow container service workloads access to the networks as required.
Core Module and our First Variables
Before we can build our projects, we need to create several GCP folders, and to do that we need to create some files. These files live in our core module, which encapsulates the setup of all of our infrastructure. If you want to know more about modules, I’d recommend heading off to the Terraform module documentation.
cd modules/core
touch main.tf
touch variables.tf
Let’s start with our input variables. They define the values that our core module uses to create our resources. Terraform requires any variable that doesn’t have a default to be set when the module is declared, and keeping them all in one place makes life easier for your consumers. As we work through this series, we’ll add more.
modules/core/variables.tf
variable "folder" {
description = "The base folder to create this infrastructure in"
type = string
}
variable "name" {
description = "The name of our collection of resources and prefix for some resources"
type = string
}
variable "billing-account" {
description = "Which account will pay for these services"
type = string
}
These variables feed into the creation of the resources in our main.tf.
Core Module
I won’t be covering everything in every file so please check the repository if you want additional context.
The first of the google_folder resources creates a folder for everything else to live in. Then we create folders for our core, staging and production workloads.
The final 2 blocks (reminder: this is an excerpt, check the companion repository for the full file) are calls to a module that lets us configure all the projects in a single, consistent way. Modules are a useful pattern in Terraform with some limitations: they currently don’t support the count or for_each iterators, which is mildly annoying.
The module "core-project"
and module "prod-project"
blocks are 2 different calls to the module found in the ../common/projects
folder. Terraform requires them to be given different names to be known by so we can build them. Each module has the various input variables set in them. These look the same as the above but aren’t they are defined inside the module source. We’ll go into the creation of that module now.
modules/core/main.tf
resource "google_folder" "base_folder" {
display_name = var.name
parent = var.base-folder
}
resource "google_folder" "core_folder" {
display_name = "control-plane"
parent = google_folder.base_folder.name
}
module "core-project" {
source = "../common/projects"
name = "${var.name}-core"
folder = google_folder.core_folder.name
billing-account = var.billing-account
}
module "prod-project" {
source = "../common/projects"
name = "${var.name}-prod"
folder = google_folder.worker_tiers["production"].name
billing-account = var.billing-account
}
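The prod-project block above references google_folder.worker_tiers, which comes from a part of the file not shown in this excerpt. As a rough sketch, assuming a for_each over the two worker tiers and a hypothetical worker_base folder (check the companion repository for the real definition):

resource "google_folder" "worker_base" {
  display_name = "worker-infrastructure"
  parent       = google_folder.base_folder.name
}

resource "google_folder" "worker_tiers" {
  # One folder per worker tier, keyed so it can be referenced as
  # google_folder.worker_tiers["production"]
  for_each     = toset(["staging", "production"])
  display_name = each.value
  parent       = google_folder.worker_base.name
}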
Now our new projects module is going to live in modules/common/projects, so create that folder. We’re going to create 2 files with names that you can probably guess by now: main.tf and variables.tf.
The variables file is nearly the same as above (it takes folder rather than base-folder), so we’re going to skip over it. Our main.tf contains more interesting[^1] resources.
In our project module, we have 2 resources. The first resource is the project itself. We pass in the variables from the declaration and set 2 other useful fields. skip_delete is set to prevent the project from being deleted when we run a destroy. If you’re working with GCP, I highly recommend this, as project deletions take a long, long[^2] time.
modules/common/projects/main.tf
resource "google_project" "cluster" {
name = var.name
project_id = var.name
folder_id = var.folder
billing_account = var.billing-account
skip_delete = true
auto_create_network = false
}
We also set auto_create_network to false to disable the automatic creation of the default project network, as we manage networking separately and most projects won’t have a network of their own.
Google projects start with almost all services disabled, which helps prevent accidents, so we need to enable the services we want to use. Here we create a locals block with a list of all the services we want enabled. This list is used in our enabled-apis resource with a for_each to loop through them. As new services are required, they can be added to the list and enabled globally.
locals {
  services = [
    "compute.googleapis.com",
    "container.googleapis.com",
  ]
}

resource "google_project_service" "enabled-apis" {
  for_each = toset(local.services)

  service                    = each.value
  project                    = google_project.cluster.project_id
  disable_dependent_services = true
}
If required, you could use Terraform functions such as concat or merge to combine a default list with custom services as required.
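As a rough sketch of that idea, assuming a hypothetical extra-services input (concat is used here because the services are a list rather than a map):

variable "extra-services" {
  description = "Additional services to enable on this project"
  type        = list(string)
  default     = []
}

locals {
  default_services = [
    "compute.googleapis.com",
    "container.googleapis.com",
  ]

  # Combine the baseline list with anything the caller adds
  services = concat(local.default_services, var.extra-services)
}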
Outputs allow us to pass information up through our declarations. Here we pass each project back out so our GKE module can use it.
output "project" {
  value = google_project.cluster
}
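Because we output the whole google_project resource, the calling module can reach into it for whatever attribute it needs, for example module.core-project.project.project_id, which we use shortly when wiring up the network.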
Now that we have projects, we still need a network for everything to run in.
Network Creation
Our network is going to run in a special-purpose host project so we can restrict and manage it separately.
modules/core/main.tf
Back in our core module, we need to create the host project. Using our project module, we create our host-project and define a network in it.
module "host-project" {
source = "../common/projects"
name = "${var.name}-host-net"
folder = google_folder.core_folder.name
billing-account = var.billing-account
}
resource "google_compute_network" "host-vpc" {
name = var.name
project = module.host-project.project.project_id
routing_mode = "GLOBAL"
auto_create_subnetworks = false
}
With the output from our shared project module, we create a host network. This network gets shared out to our other projects and is where the subnets get created. To allow it to be shared, we use the google_compute_shared_vpc_host_project resource to enable Shared VPC on the project.
resource "google_compute_shared_vpc_host_project" "host-vpc" {
project = module.host-project.project.project_id
}
You now have a project that can share out the networks configured in it, and a network. But a GCP network is only a container, not somewhere we can put VMs or clusters. So we need to create some subnetworks[^3] with secondary ranges[^4], configure permissions and more. We need an implementation of these resources for each cluster we define, so we’re going to define our clusters as objects stored in a map to collect the values together.
Create the modules/common/gke folder and the traditional main.tf and variables.tf files in there. Our GKE module needs the name and the project for the resources. It also uses the host-project and host-network-name to create the subnetworks.
modules/common/gke/variables.tf
variable "name" {
type = string
}
variable "project" {
type = string
}
variable "host-project" {
type = string
}
variable "host-network-name" {
type = string
}
We then have the cluster map. Maps are collections of values with a key for each. This map of objects allows us to define several clusters and configure them all in one pass.
variable "clusters" {
type = map(object({
region = string
ip_range = string
}))
}
The objects in our map include the ip_range and region for a cluster, allowing us to create all the required resources in our main.tf file.
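For illustration, a caller passes something shaped like this (the real values are set in the root module later in this post):

clusters = {
  sydney = {
    region   = "australia-southeast1"
    ip_range = "172.16.0.0/16"
  }
}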
modules/common/gke/main.tf
In this file, we start with a data resource doing a lookup on our host-vpc network. We need this so we can get its self_link, a unique path for each network that is used to configure the subnetwork. For our subnetworks, we first set the network using that self_link. We then set up our boilerplate: the for_each loop, a name which uses string interpolation to build the subnet name, and the project and region this subnet runs in.
data "google_compute_network" "host-vpc" {
name = var.host-network-name
project = var.host-project
}
resource "google_compute_subnetwork" "subnets" {
network = data.google_compute_network.host-vpc.self_link
for_each = var.clusters
name = "${var.name}-${each.key}"
project = var.host-project
region = each.value.region
The region comes from our cluster map, accessed through the each object.
Now to configure our IP addresses. As we want to run our clusters in vpc-native mode, we need to ensure they have at least 3 IP ranges available, but our clusters object only has a single ip_range field. We could expand our cluster object to include separate pod and service ranges, with the additional configuration load required to manage that. Or we can use the cidrsubnet function to split the incoming range: cidrsubnet carves a smaller subnet out of a larger one and returns the range.
In this case we effectively split the ip_range into 4 quarters. Quarters 1 and 2 are combined and reserved for the pod range. The third quarter of the ip_range is the main network range, used by the nodes. The final quarter is reserved for Kubernetes services.
The ip_cidr_range attribute configures the node range. We then have the secondary_ip_range list containing both the pods and services ranges.
  ip_cidr_range = cidrsubnet(cidrsubnet(each.value.ip_range, 1, 1), 1, 0)

  secondary_ip_range = [
    {
      range_name    = "${each.key}-pods"
      ip_cidr_range = cidrsubnet(each.value.ip_range, 1, 0)
    },
    {
      range_name    = "${each.key}-services"
      ip_cidr_range = cidrsubnet(cidrsubnet(each.value.ip_range, 1, 1), 1, 1)
    }
  ]
}
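To make the split concrete, here is how those expressions carve up the 172.16.0.0/16 range we use later in this series; you can verify these yourself in terraform console:

cidrsubnet("172.16.0.0/16", 1, 0)                   # 172.16.0.0/17   -> pods (quarters 1 and 2)
cidrsubnet(cidrsubnet("172.16.0.0/16", 1, 1), 1, 0) # 172.16.128.0/18 -> nodes (quarter 3)
cidrsubnet(cidrsubnet("172.16.0.0/16", 1, 1), 1, 1) # 172.16.192.0/18 -> services (quarter 4)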
We now link the service project with our host project and then create the subnetwork bindings.
resource "google_compute_shared_vpc_service_project" "clusters" {
host_project = var.host-project
service_project = var.project
}
data "google_project" "project" {
project_id = var.project
}
We need to retrieve the numeric ID of our project with the data object above. Using this, we can loop over our subnets and assign the correct permissions to each one.
resource "google_compute_subnetwork_iam_member" "compute" {
for_each = google_compute_subnetwork.subnets
project = each.value.project
region = each.value.region
subnetwork = each.value.name
role = "roles/compute.networkUser"
member = "serviceAccount:${data.google_project.project.number}@cloudservices.gserviceaccount.com"
}
resource "google_compute_subnetwork_iam_member" "container" {
for_each = google_compute_subnetwork.subnets
project = each.value.project
region = each.value.region
subnetwork = each.value.name
role = "roles/compute.networkUser"
member = "serviceAccount:service-${data.google_project.project.number}@container-engine-robot.iam.gserviceaccount.com"
}
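These are the two Google-managed service accounts that act on the service project’s behalf in a Shared VPC setup: the Google APIs service account (PROJECT_NUMBER@cloudservices.gserviceaccount.com) and the GKE service agent (service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com). Both need roles/compute.networkUser on the shared subnets before clusters can be created in them.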
Making Something Happen
It’s time to start using all of this. First, we need to configure our core module to start using our GKE module.
modules/core/main.tf
The excerpt below shows us consuming our gke module to produce the core and prod tiers of clusters. These module declarations consume our core module’s variables along with the project and network resources created above.
module "core-clusters" {
source = "../common/gke"
name = "${var.name}-core"
clusters = var.core-clusters
project = module.core-project.project.project_id
host-project = module.host-project.project.project_id
host-network-name = google_compute_network.host-vpc.name
}
module "prod-clusters" {
source = "../common/gke"
name = "${var.name}-prod"
clusters = var.prod-clusters
project = module.prod-project.project.project_id
host-project = module.host-project.project.project_id
host-network-name = google_compute_network.host-vpc.name
}
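These calls reference var.core-clusters and var.prod-clusters (there is a matching staging tier in the repository), part of the additional variables promised earlier. Assuming they mirror the clusters variable in the GKE module, the declarations in modules/core/variables.tf would look something like this; check the repository for the real definitions:

variable "core-clusters" {
  type = map(object({
    region   = string
    ip_range = string
  }))
}

variable "prod-clusters" {
  type = map(object({
    region   = string
    ip_range = string
  }))
}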
We now need to declare this core module in our repository root main.tf and then kick everything off.
main.tf
module "compute" {
  source          = "./modules/core"
  name            = "kca-spin"
  billing-account = data.google_billing_account.bills.id
  base-folder     = var.base_folder

  core-clusters = {
    sydney = {
      region   = "australia-southeast1"
      ip_range = "172.16.0.0/16"
    }
  }

  staging-clusters = {
    sydney = {
      region   = "australia-southeast1"
      ip_range = "172.17.0.0/16"
    }
  }

  prod-clusters = {
    central = {
      region   = "us-central1"
      ip_range = "172.18.0.0/16"
    }
    sydney = {
      region   = "australia-southeast1"
      ip_range = "172.19.0.0/16"
    }
  }
}
We give it the wonderfully imaginative name of compute, tell Terraform where to find the code in question and then get onto the parts we have created.
Some boilerplate, something to pay the bill, and our clusters: core, staging and prod, each declared using our new map-of-objects pattern. In this case, we create one core cluster in Sydney, one staging cluster also in Sydney, and 2 prod clusters: one in Sydney and one in the us-central region. One important note is that the IP ranges we specify here need to be globally unique so that we can route between networks as required.
Now, with a terraform init and a terraform apply, you should have a whole bunch of resources created and ready for us to put Kubernetes clusters in soon.
Check the repo for where we’re up to on this post’s branch.
This series is not yet complete and updates are coming. Subscribe to RSS or follow @kcollasarundell on twitter.
References:

[^1]: OK, my idea of what is interesting is more than a bit odd.

[^2]: looooooooooooooooooooooooooooooooooooooooooong. It’s longer than you think.

[^3]: GCP documentation appears to use both subnet and subnetwork interchangeably in parts.

[^4]: GKE has two network modes: routed networks, which create static routes for each node, and VPC-native. VPC-native provides performance improvements for clusters, with service IPs and pod IPs defined in secondary ranges in the subnetwork. As these ranges must be unique in the network, we need to make sure we define them manually. The size of these ranges determines the max size of your clusters, so consideration must be taken when choosing an ip_range to run in.