

Kubernetes Cluster API Provider GCP

Kubernetes-native declarative infrastructure for GCP.

What is the Cluster API Provider GCP?

The Cluster API brings declarative Kubernetes-style APIs to cluster creation, configuration and management. The API itself is shared across multiple cloud providers allowing for true Google Cloud hybrid deployments of Kubernetes.

Documentation

Please see our book for in-depth documentation.

Quick Start

Check out our Cluster API Quick Start to create your first Kubernetes cluster on Google Cloud Platform using Cluster API.


Support Policy

This provider’s versions are compatible with the following versions of Cluster API:

  • Google Cloud Provider v0.3.x: Cluster API v1alpha3 (v0.3.x)
  • Google Cloud Provider v0.4.x: Cluster API v1alpha4 (v0.4.x)
  • Google Cloud Provider v1.0.x: Cluster API v1beta1 (v1.0.x)

This provider’s versions are able to install and manage the following versions of Kubernetes:

Google Cloud Provider v0.3.x, v0.4.x and v1.0.x (each supporting a subset of the range below):

  • Kubernetes 1.15
  • Kubernetes 1.16
  • Kubernetes 1.17
  • Kubernetes 1.18
  • Kubernetes 1.19
  • Kubernetes 1.20
  • Kubernetes 1.21
  • Kubernetes 1.22

Each version of Cluster API for Google Cloud will attempt to support at least two versions of Kubernetes, e.g., Cluster API for GCP v0.1 may support Kubernetes 1.13 and Kubernetes 1.14.

NOTE: As the versioning for this project is tied to the versioning of Cluster API, future modifications to this policy may be made to more closely align with other providers in the Cluster API ecosystem.


Getting Involved and Contributing

Are you interested in contributing to cluster-api-provider-gcp? We, the maintainers and the community, would love your suggestions, support and contributions! The maintainers of the project can be contacted at any time to learn more about how to get involved.

Before starting with the contribution, please go through the prerequisites of the project.

To set up the development environment, check out the development guide.

In the interest of getting new people involved, we have issues marked as good first issue. Although these issues have a smaller scope, they are very helpful in getting acquainted with the codebase. For more, see the issue tracker. If you’re unsure where to start, feel free to reach out to discuss.

See also: Our own contributor guide and the Kubernetes community page.

We also encourage ALL active community participants to act as if they are maintainers, even if you don’t have ‘official’ written permissions. This is a community effort and we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power!

Office hours

  • Join the SIG Cluster Lifecycle Google Group for access to documents and calendars.
  • Participate in the conversations on Kubernetes Discuss
  • Provider implementers office hours (CAPI)
    • Weekly on Wednesdays @ 10:00 am PT (Pacific Time) on Zoom
    • Previous meetings: [ notes | recordings ]
  • Cluster API Provider GCP office hours (CAPG)
    • Monthly on first Thursday @ 09:00 am PT (Pacific Time) on Zoom
    • Previous meetings: [ notes | recordings ]

Other ways to communicate with the contributors

Please check in with us in the #cluster-api-gcp channel on Slack.

GitHub Issues

Bugs

If you think you have found a bug, please follow the instructions below.

  • Please spend a small amount of time doing due diligence on the issue tracker; your issue might be a duplicate.
  • Get the logs from the custom controllers and paste them in the issue.
  • Open a bug report.
  • Remember users might be searching for the issue in the future, so please make sure to give it a meaningful title to help others.
  • Feel free to reach out to the community on slack.

Tracking new features

We also have an issue tracker to track features. If you have a feature idea that could make Cluster API Provider GCP even more awesome, follow these steps.

  • Open a feature request.
  • Remember users might be searching for the issue in the future, so please make sure to give it a meaningful title to help others.
  • Clearly define the use case with concrete examples. Example: type this and cluster-api-provider-gcp does that.
  • Some of our larger features will require some design. If you would like to include a technical design in your feature, please go ahead.
  • After the new feature is well understood and the design is agreed upon, we can start coding the feature. We would love for you to code it. So please open up a WIP (work in progress) PR and happy coding!

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

Getting started with CAPG

In this section we’ll cover the basics of how to prepare your environment to use Cluster API Provider for GCP.

Before installing CAPG, your Kubernetes cluster has to be transformed into a CAPI management cluster. If you have already done this, you can jump directly to the next section: Installing CAPG. If, on the other hand, you have an existing Kubernetes cluster that is not yet configured as a CAPI management cluster, you can follow the guide from the CAPI book.
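
If you just need a disposable management cluster for experimentation, a common approach (and the one used in the CAPI quickstart) is a local kind cluster. A minimal sketch, assuming kind is installed and using an example cluster name:

kind create cluster --name capi-mgmt
kubectl cluster-info --context kind-capi-mgmt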

Requirements

  • Linux or macOS (Windows isn’t supported at the moment).
  • A Google Cloud account.
  • Packer and Ansible to build images
  • make to use Makefile targets
  • Install coreutils (for timeout) on macOS

Create a Service Account

To create and manage clusters, this infrastructure provider uses a service account to authenticate with GCP’s APIs.

From your cloud console, follow these instructions to create a new service account with Editor permissions.

If you plan to use GKE, the service account will also need the iam.serviceAccountTokenCreator role.

Afterwards, generate a JSON Key and store it somewhere safe.
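
If you prefer the CLI to the cloud console, the same result can be achieved with gcloud. A minimal sketch, assuming GCP_PROJECT holds your project ID and using capg-sa as an example service account name:

# Create the service account (the name is an example)
gcloud iam service-accounts create capg-sa --project "${GCP_PROJECT}"

# Grant it the Editor role on the project
gcloud projects add-iam-policy-binding "${GCP_PROJECT}" \
  --member "serviceAccount:capg-sa@${GCP_PROJECT}.iam.gserviceaccount.com" \
  --role roles/editor

# Generate a JSON key and store it somewhere safe
gcloud iam service-accounts keys create /path/to/gcp-credentials.json \
  --iam-account "capg-sa@${GCP_PROJECT}.iam.gserviceaccount.com"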

Installing CAPG

There are two major provider installation paths: using clusterctl or the Cluster API Operator.

clusterctl is a command line tool that provides a simple way of interacting with CAPI and is usually the preferred alternative for those who are getting started. It automates fetching the YAML files defining provider components and installing them.

The Cluster API Operator is a Kubernetes Operator built on top of clusterctl and designed to empower cluster administrators to handle the lifecycle of Cluster API providers within a management cluster using a declarative approach. It aims to improve user experience in deploying and managing Cluster API, making it easier to handle day-to-day tasks and automate workflows with GitOps. Visit the CAPI Operator quickstart if you want to experiment with this tool.

You can opt for the tool that works best for you or explore both and decide which is best suited for your use case.

clusterctl

The Service Account you created will be used to interact with GCP. It must be base64 encoded and stored in an environment variable before installing the provider via clusterctl.

export GCP_B64ENCODED_CREDENTIALS=$( cat /path/to/gcp-credentials.json | base64 | tr -d '\n' )

Finally, let’s initialize the provider.

clusterctl init --infrastructure gcp

This process may take some time and, once the provider is running, you’ll be able to see the capg-controller-manager pod in your CAPI management cluster.
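
For example, you can verify that the controller is running with a command like:

kubectl get pods -n capg-system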

Cluster API Operator

You can refer to the Cluster API Operator book here to learn about the basics of the project and how to install the operator.

When using Cluster API Operator, cloud provider credentials are stored in secrets rather than environment variables. This means you’ll have to create a new secret containing the base64-encoded version of your GCP credentials, which will then be referenced in the YAML file used to initialize the provider. As you can see, Cluster API Operator lets us manage provider installation declaratively.

Create GCP credentials secret.

export CREDENTIALS_SECRET_NAME="gcp-credentials"
export CREDENTIALS_SECRET_NAMESPACE="default"
export GCP_B64ENCODED_CREDENTIALS=$( cat /path/to/gcp-credentials.json | base64 | tr -d '\n' )

kubectl create secret generic "${CREDENTIALS_SECRET_NAME}" --from-literal=GCP_B64ENCODED_CREDENTIALS="${GCP_B64ENCODED_CREDENTIALS}" --namespace "${CREDENTIALS_SECRET_NAMESPACE}"

Define the CAPG provider declaratively in a file named capg.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: capg-system
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: gcp
  namespace: capg-system
spec:
  version: v1.8.0
  configSecret:
    name: gcp-credentials

After applying this file, Cluster API Operator will take care of installing CAPG using the set of credentials stored in the specified secret.

kubectl apply -f capg.yaml

Prerequisites

Before provisioning clusters via CAPG, there are a few extra tasks you need to take care of, including configuring the GCP network and building images for GCP virtual machines.

Set environment variables

export GCP_REGION="<GCP_REGION>"
export GCP_PROJECT="<GCP_PROJECT>"
# Make sure to use the same Kubernetes version here as was used to build the GCE image
export KUBERNETES_VERSION=1.22.3
export GCP_CONTROL_PLANE_MACHINE_TYPE=n1-standard-2
export GCP_NODE_MACHINE_TYPE=n1-standard-2
export GCP_NETWORK_NAME=<GCP_NETWORK_NAME or default>
export CLUSTER_NAME="<CLUSTER_NAME>"

Configure Network and Cloud NAT

Google Cloud accounts come with a default network which can be found under VPC Networks. If you prefer to create a new Network, follow these instructions.

Cloud NAT

This infrastructure provider sets up Kubernetes clusters using a Global Load Balancer with a public IP address.

To communicate with the control plane and pull container images from registries (e.g. gcr.io or Docker Hub), Kubernetes nodes need either NAT access or a public IP. By default, the provider creates Machines without a public IP.

To make sure your cluster can communicate with the outside world and the load balancer, you can create a Cloud NAT in the region you’d like your Kubernetes cluster to live in by following these instructions.

NB: The following commands need to be run if ${GCP_NETWORK_NAME} is set to default.

# Ensure the network list contains the default network
gcloud compute networks list --project="${GCP_PROJECT}"

gcloud compute networks describe "${GCP_NETWORK_NAME}" --project="${GCP_PROJECT}"

# Ensure the firewall rules are enabled
gcloud compute firewall-rules list --project "${GCP_PROJECT}"

# Create a router
gcloud compute routers create "${CLUSTER_NAME}-myrouter" --project="${GCP_PROJECT}" --region="${GCP_REGION}" --network="default"

# Create the NAT
gcloud compute routers nats create "${CLUSTER_NAME}-mynat" --project="${GCP_PROJECT}" --router-region="${GCP_REGION}" --router="${CLUSTER_NAME}-myrouter" \
  --nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips

Building images

NB: The following commands should not be run as root user.

# Export the GCP project id you want to build images in.
export GCP_PROJECT_ID=<project-id>

# Export the path to the service account credentials created in the step above.
export GOOGLE_APPLICATION_CREDENTIALS=</path/to/serviceaccount-key.json>

# Clone the image builder repository if you haven't already.
git clone https://github.com/kubernetes-sigs/image-builder.git image-builder

# Change directory to images/capi within the image builder repository
cd image-builder/images/capi

# Run the Make target to generate GCE images.
make build-gce-ubuntu-2004

# Check that you can see the published images.
gcloud compute images list --project ${GCP_PROJECT_ID} --no-standard-images --filter="family:capi-ubuntu-2004-k8s"

# Export the IMAGE_ID from the above
export IMAGE_ID="projects/${GCP_PROJECT_ID}/global/images/<image-name>"

Clean-up

Delete the NAT gateway

gcloud compute routers nats delete "${CLUSTER_NAME}-mynat" --project="${GCP_PROJECT}" \
--router-region="${GCP_REGION}" --router="${CLUSTER_NAME}-myrouter" --quiet || true

Delete the router

gcloud compute routers delete "${CLUSTER_NAME}-myrouter" --project="${GCP_PROJECT}" \
--region="${GCP_REGION}" --quiet || true

Self-managed clusters

This section contains information about how you can provision self-managed Kubernetes clusters hosted in GCP’s Compute Engine.

Provisioning a self-managed Cluster

This guide uses an example from the ./templates folder of the CAPG repository. You can inspect the yaml file here.

Configure cluster parameters

While inspecting the cluster definition in ./templates/cluster-template.yaml you probably noticed that it contains a number of parameterized values that must be substituted with the specifics of your use case. This can be done via environment variables and clusterctl and effectively makes the template more flexible to adapt to different provisioning scenarios. These are the environment variables that you’ll be required to set before deploying a workload cluster:

export GCP_REGION=us-east4
export GCP_PROJECT=cluster-api-gcp-project
export CONTROL_PLANE_MACHINE_COUNT=1
export WORKER_MACHINE_COUNT=1
export KUBERNETES_VERSION=1.29.3
export GCP_CONTROL_PLANE_MACHINE_TYPE=n1-standard-2
export GCP_NODE_MACHINE_TYPE=n1-standard-2
export GCP_NETWORK_NAME=default
export IMAGE_ID=projects/cluster-api-gcp-project/global/images/your-image

Generate cluster definition

The sample cluster templates are already prepared so that you can use them with clusterctl to create a self-managed Kubernetes cluster with CAPG.

clusterctl generate cluster capi-gcp-quickstart -i gcp > capi-gcp-quickstart.yaml

In this example, capi-gcp-quickstart will be used as the cluster name.

Create cluster

The resulting file represents the workload cluster definition and you simply need to apply it to your cluster to trigger cluster creation:

kubectl apply -f capi-gcp-quickstart.yaml
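
You can then watch the cluster come up from the management cluster, for example:

kubectl get clusters
clusterctl describe cluster capi-gcp-quickstart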

Kubeconfig

When creating a GCP cluster, two kubeconfigs are generated and stored as secrets in the management cluster.

User kubeconfig

This should be used by users that want to connect to the newly created GCP cluster. The name of the secret that contains the kubeconfig will be [cluster-name]-user-kubeconfig, where you need to replace [cluster-name] with the name of your cluster. The -user-kubeconfig in the name indicates that the kubeconfig is for user use.

To get the user kubeconfig for a cluster named managed-test you can run a command similar to:

kubectl --namespace=default get secret managed-test-user-kubeconfig \
   -o jsonpath={.data.value} | base64 --decode \
   > managed-test.kubeconfig
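
Once extracted, the kubeconfig can be used to access the workload cluster directly, for example:

kubectl --kubeconfig managed-test.kubeconfig get nodes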

Cluster API (CAPI) kubeconfig

This kubeconfig is used internally by CAPI and shouldn’t be used outside of the management server. It is used by CAPI to perform operations, such as draining a node. The name of the secret that contains the kubeconfig will be [cluster-name]-kubeconfig where you need to replace [cluster-name] with the name of your cluster. Note that there is NO -user in the name.

The kubeconfig is regenerated every sync-period as the token that is embedded in the kubeconfig is only valid for a short period of time.

CNI

By default, no CNI plugin is installed when a self-managed cluster is provisioned. As a user, you need to install your own CNI (e.g. Calico with VXLAN) for the control plane of the cluster to become ready.

This document describes how to use Flannel as your CNI solution.

Modify the Cluster resources

Before deploying the cluster, change the KubeadmControlPlane value at spec.kubeadmConfigSpec.clusterConfiguration.controllerManager.extraArgs.allocate-node-cidrs to "true"

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "true"

Modify Flannel Config

NOTE: This is based on the instructions at deploying-flannel-manually.

You need to make an adjustment to the default flannel configuration so that the CIDR inside your CAPG cluster matches the Flannel Network CIDR.

View your capi-cluster.yaml and make note of the Cluster Network CIDR Block. For example:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16

Download the file at https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml and modify the kube-flannel-cfg ConfigMap: set the Network value inside data.net-conf.json to match your Cluster Network CIDR Block.

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Edit kube-flannel.yml and change this section so that the Network section matches your Cluster CIDR:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
data:
  net-conf.json: |
    {
      "Network": "192.168.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

Apply kube-flannel.yml

kubectl apply -f kube-flannel.yml
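
You can then check that the Flannel pods start and that the nodes eventually report Ready, for example:

kubectl get pods -A | grep flannel
kubectl get nodes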

GKE Support in the GCP Provider

  • Feature status: Experimental
  • Feature gate (required): GKE=true

Overview

The GCP provider supports creating GKE-based clusters. Currently, the following features are supported:

  • Provisioning/managing a GCP GKE Cluster
  • Upgrading the Kubernetes version of the GKE Cluster
  • Creating a managed node pool and attaching it to the GKE cluster

The implementation introduces the following CRD kinds:

  • GCPManagedCluster - presents the properties needed to provision and manage the general GCP operating infrastructure for the cluster (i.e. project, networking, IAM)
  • GCPManagedControlPlane - specifies the GKE cluster in GCP and is used by the Cluster API GCP Managed Control Plane
  • GCPManagedMachinePool - defines the managed node pool for the cluster

And a new template is available in the templates folder for creating a managed workload cluster.


Provisioning a GKE cluster

This guide uses an example from the ./templates folder of the CAPG repository. You can inspect the yaml file here.

Configure cluster parameters

While inspecting the cluster definition in ./templates/cluster-template-gke.yaml you probably noticed that it contains a number of parameterized values that must be substituted with the specifics of your use case. This can be done via environment variables and clusterctl and effectively makes the template more flexible to adapt to different provisioning scenarios. These are the environment variables that you’ll be required to set before deploying a workload cluster:

export GCP_PROJECT=cluster-api-gcp-project
export GCP_REGION=us-east4
export GCP_NETWORK_NAME=default
export WORKER_MACHINE_COUNT=1

Generate cluster definition

The sample cluster templates are already prepared so that you can use them with clusterctl to create a GKE cluster with CAPG.

To create a GKE cluster with a managed node group (a.k.a. managed machine pool):

clusterctl generate cluster capi-gke-quickstart --flavor gke -i gcp > capi-gke-quickstart.yaml

In this example, capi-gke-quickstart will be used as the cluster name.

Create cluster

The resulting file represents the workload cluster definition and you simply need to apply it to your cluster to trigger cluster creation:

kubectl apply -f capi-gke-quickstart.yaml

Kubeconfig

When creating a GKE cluster, two kubeconfigs are generated and stored as secrets in the management cluster.

User kubeconfig

This should be used by users that want to connect to the newly created GKE cluster. The name of the secret that contains the kubeconfig will be [cluster-name]-user-kubeconfig, where you need to replace [cluster-name] with the name of your cluster. The -user-kubeconfig in the name indicates that the kubeconfig is for user use.

To get the user kubeconfig for a cluster named managed-test you can run a command similar to:

kubectl --namespace=default get secret managed-test-user-kubeconfig \
   -o jsonpath={.data.value} | base64 --decode \
   > managed-test.kubeconfig

Cluster API (CAPI) kubeconfig

This kubeconfig is used internally by CAPI and shouldn’t be used outside of the management server. It is used by CAPI to perform operations, such as draining a node. The name of the secret that contains the kubeconfig will be [cluster-name]-kubeconfig where you need to replace [cluster-name] with the name of your cluster. Note that there is NO -user in the name.

The kubeconfig is regenerated every sync-period as the token that is embedded in the kubeconfig is only valid for a short period of time.

GKE Cluster Upgrades

Control Plane Upgrade

Upgrading the Kubernetes version of the control plane is supported by the provider. To perform an upgrade, you need to update the controlPlaneVersion in the spec of the GCPManagedControlPlane. Once the version has changed, the provider will handle the upgrade for you.
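
As a minimal sketch, assuming a GCPManagedControlPlane named capi-gke-quickstart-control-plane in the default namespace (the names and target version below are illustrative), the field could be updated with a patch like:

kubectl patch gcpmanagedcontrolplane capi-gke-quickstart-control-plane \
  --type merge -p '{"spec":{"controlPlaneVersion":"v1.27.3"}}'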

Enabling GKE Support

Enabling GKE support is done via the GKE feature flag by setting it to true. This can be done before running clusterctl init by using the EXP_CAPG_GKE environment variable:

export EXP_CAPG_GKE=true
clusterctl init --infrastructure gcp

IMPORTANT: To use GKE the service account used for CAPG will need the iam.serviceAccountTokenCreator role assigned.

Disabling GKE Support

Support for GKE is disabled by default when you use the GCP infrastructure provider.

ClusterClass

  • Feature status: Experimental
  • Feature gate: ClusterTopology=true

ClusterClass is a collection of templates that define a topology (control plane and machine deployments) to be used to continuously reconcile one or more Clusters. It is built on top of the existing Cluster API resources and provides a set of tools and operations to streamline cluster lifecycle management while maintaining the same underlying API.

CAPG supports the creation of clusters via Cluster Topology for self-managed clusters only.

Provisioning a Cluster via ClusterClass

This guide uses an example from the ./templates folder of the CAPG repository. You can inspect the yaml file for the ClusterClass here and the cluster definition here.

Templates and clusters

ClusterClass makes cluster templates more flexible and versatile as it allows users to create cluster flavors that can be reused for cluster provisioning.

In this case, while inspecting the sample files, you probably noticed that there are references to two different yaml files:

  • ./templates/cluster-template-clusterclass.yaml is the class definition. It represents the templates that define a topology (control plane and machine deployments), but it won’t provision the cluster by itself.
  • ./templates/cluster-template-topology.yaml is the cluster definition that references the class. This workload cluster definition is considerably simpler than a regular CAPI cluster template that does not use ClusterClass, as most of the complexity of defining the control plane and machine deployment has been removed by the class.

Configure ClusterClass

While inspecting the templates you probably noticed that they contain a number of parameterized values that must be substituted with the specifics of your use case. This can be done via environment variables and clusterctl and effectively makes the templates more flexible to adapt to different provisioning scenarios. These are the environment variables that you’ll be required to set before deploying a class and a workload cluster from it:

export CLUSTER_CLASS_NAME=sample-cc
export GCP_PROJECT=cluster-api-gcp-project
export GCP_REGION=us-east4
export GCP_NETWORK_NAME=default
export IMAGE_ID=projects/cluster-api-gcp-project/global/images/your-image

Generate ClusterClass definition

The sample ClusterClass template is already prepared so that you can use it with clusterctl to create a CAPI ClusterClass with CAPG.

clusterctl generate cluster capi-gcp-quickstart-clusterclass --flavor clusterclass -i gcp > capi-gcp-quickstart-clusterclass.yaml

In this example, capi-gcp-quickstart-clusterclass will be used as the class name.

Create ClusterClass

The resulting file represents the class template definition and you simply need to apply it to your cluster to make it available in the API:

kubectl apply -f capi-gcp-quickstart-clusterclass.yaml

Create a cluster from a class

ClusterClass is a powerful feature of CAPI because we can now create one or multiple clusters that are based on the same class that is available in the CAPI Management Cluster. This base template can be parameterized so clusters created from it can make slight changes to the original configuration and adapt to the specifics of the use case, e.g. provisioning clusters for different development, staging and production environments.

Now that the class is available to be referenced by cluster objects, let’s configure the workload cluster and provision it.

export CLUSTER_NAME=sample-cluster
export CLUSTER_CLASS_NAME=sample-cc
export KUBERNETES_VERSION=1.29.3
export CONTROL_PLANE_MACHINE_COUNT=1
export WORKER_MACHINE_COUNT=1
export GCP_REGION=us-east4
export GCP_CONTROL_PLANE_MACHINE_TYPE=n1-standard-2
export GCP_NODE_MACHINE_TYPE=n1-standard-2
export CNI_RESOURCES=./cni-resource

You can take a look at CAPG’s CNI requirements here.

You can use clusterctl to create a cluster definition.

clusterctl generate cluster capi-gcp-quickstart-topology --flavor topology -i gcp > capi-gcp-quickstart-topology.yaml

And by simply applying the resulting template, the cluster will be provisioned based on the existing ClusterClass.

kubectl apply -f capi-gcp-quickstart-topology.yaml

You can now experiment with creating more clusters based on this class while applying different configurations to each workload cluster.
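
For instance, a second cluster can be generated from the same class simply by changing the cluster name (and any other variables you want to override) before re-running clusterctl:

export CLUSTER_NAME=sample-cluster-2
clusterctl generate cluster sample-cluster-2 --flavor topology -i gcp > sample-cluster-2.yaml
kubectl apply -f sample-cluster-2.yaml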

Enabling ClusterClass Support

Enabling ClusterClass support is done via the ClusterTopology feature flag by setting it to true. This can be done before running clusterctl init by using the CLUSTER_TOPOLOGY environment variable:

export CLUSTER_TOPOLOGY=true
clusterctl init --infrastructure gcp

Disabling ClusterClass Support

Support for ClusterClass is disabled by default when you use the GCP infrastructure provider.

This section contains information about relevant CAPG features and how to use them.

Running Conformance tests

Required environment variables

  • Set the GCP region
export GCP_REGION=us-east4
  • Set the GCP project to use
export GCP_PROJECT=your-project-id
  • Set the path to the service account
export GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account.json

Optional environment variables

  • Set a specific name for your cluster
export CLUSTER_NAME=test1
  • Set a specific name for your network
export NETWORK_NAME=test1-mynetwork
  • Skip cleaning up the project resources
export SKIP_CLEANUP=1

Running the conformance tests

scripts/ci-conformance.sh

Machine Locations

This document describes how to configure the location of a CAPG cluster’s compute resources. By default, CAPG requires the user to specify a GCP region for the cluster’s machines by setting the GCP_REGION environment variable, as outlined in the CAPI quickstart guide. The provider then picks a zone to deploy the control plane and worker nodes in and generates the corresponding portions of the cluster’s YAML manifests.

It is possible to override this default behaviour and exercise more fine-grained control over machine locations as outlined in the rest of this document.

Control Plane Machine Location

Before deploying the cluster, add a failureDomains field to the spec of your GCPCluster definition, containing a list of allowed zones:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: GCPCluster
metadata:
  name: capi-quickstart
spec:
  network:
    name: default
  project: cyberscan2
  region: europe-west3
+  failureDomains:
+    - europe-west3-b

In this example configuration, only a single zone has been added, ensuring the control plane is provisioned in europe-west3-b.
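
To see which zones are available in a region (and could therefore be listed under failureDomains), you can ask gcloud, e.g.:

gcloud compute zones list --filter="region:europe-west3"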

Node Pool Location

Similar to the above, you can override the auto-generated GCP zone for your MachineDeployment, by changing the value of the failureDomain field at spec.template.spec.failureDomain:

apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  name: capi-quickstart-md-0
spec:
  clusterName: capi-quickstart
  # [...]
  template:
    spec:
      # [...]
      clusterName: capi-quickstart
-      failureDomain: europe-west3-a
+      failureDomain: europe-west3-b

When combined like this, the above configuration effectively instructs CAPG to deploy the CAPI equivalent of a zonal GKE cluster.

Preemptible Virtual Machines

GCP Preemptible Virtual Machines allow users to run a VM instance at a much lower price compared to normal VM instances.

Compute Engine might stop (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances will always stop after 24 hours.

When do I use Preemptible Virtual Machines?

A Preemptible VM works best for applications or systems that distribute processes across multiple instances in a cluster. While a shutdown would be disruptive for common enterprise applications, such as databases, it’s hardly noticeable in distributed systems that run across clusters of machines and are designed to tolerate failures.

How do I use Preemptible Virtual Machines?

To enable a machine to be backed by a Preemptible VM, add the preemptible option to the GCPMachineTemplate and set it to true.

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: GCPMachineTemplate
metadata:
  name: capg-md-0
spec:
  template:
    spec:
      instanceType: n1-standard-2
      rootDeviceSize: 30
      preemptible: true

Spot VMs

Spot VMs are the latest version of preemptible VMs.

To use a Spot VM instead of a Preemptible VM, add the provisioningModel option to the GCPMachineTemplate and set it to Spot.

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: GCPMachineTemplate
metadata:
  name: capg-md-0
spec:
  template:
    spec:
      instanceType: n1-standard-2
      rootDeviceSize: 30
      provisioningModel: Spot

NOTE: specifying preemptible: true and provisioningModel: Spot is equivalent to only provisioningModel: Spot. Spot takes priority.

Everything you need to know about contributing to CAPG.

If you are new to the project and want to help but don’t know where to start, you can refer to the Cluster API contributing guide.

Developing Cluster API Provider GCP

Setting up

Base requirements

  1. Install go
    • Get the latest patch version for go v1.18.
  2. Install jq
    • brew install jq on macOS.
    • sudo apt install jq on Windows + WSL2.
    • sudo apt install jq on Ubuntu Linux.
  3. Install gettext package
    • brew install gettext && brew link --force gettext on macOS.
    • sudo apt install gettext on Windows + WSL2.
    • sudo apt install gettext on Ubuntu Linux.
  4. Install KIND
    • GO111MODULE="on" go get sigs.k8s.io/kind@v0.14.0.
  5. Install Kustomize
  6. Install Python 3.x, if it is not already installed.
  7. Install make.
    • brew install make on macOS.
    • sudo apt install make on Windows + WSL2.
    • sudo apt install make on Linux.
  8. Install timeout
    • brew install coreutils on macOS.

When developing on Windows, it is suggested to set up the project on Windows + WSL2, and the repository should be checked out on the WSL file system for better results.

Get the source

git clone https://github.com/kubernetes-sigs/cluster-api-provider-gcp
cd cluster-api-provider-gcp

Get familiar with basic concepts

This provider is modeled after the upstream Cluster API project. To get familiar with Cluster API resources, concepts and conventions (such as CAPI and CAPG), refer to the Cluster API Book.

Dev manifest files

Part of running cluster-api-provider-gcp is generating manifests to run. Generating dev manifests allows you to test dev images instead of the default releases.

Dev images

Container registry

Any public container registry can be leveraged for storing cluster-api-provider-gcp container images.

CAPG Node images

In order to deploy a workload cluster you will need to build the node images to use. For that, you can refer to the image-builder project; you can also read the image-builder book.

Please refer to the image-builder documentation in order to get the latest requirements to build the node images.

To build the node images for GCP: https://image-builder.sigs.k8s.io/capi/providers/gcp.html

Developing

Change some code!

Modules and Dependencies

This repository uses Go Modules to track vendor dependencies.

To pin a new dependency:

  • Run go get <repository>@<version>
  • (Optional) Add a replace statement in go.mod

Makefile targets and scripts are offered to work with go modules:

  • make verify-modules checks whether go modules are out of date.
  • make modules runs go mod tidy to ensure proper vendoring.
  • hack/ensure-go.sh checks that the Go version and environment variables are properly set.
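
For example, a typical flow for pinning a new dependency might look like this (the module path and version are placeholders):

go get example.com/some/module@v1.2.3
make modules          # runs go mod tidy
make verify-modules   # confirms go.mod and go.sum are up to date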

Setting up the environment

Your environment must have GCP credentials; check Authentication Getting Started.

Tilt Requirements

Install Tilt:

  • brew install tilt-dev/tap/tilt on macOS or Linux
  • scoop bucket add tilt-dev https://github.com/tilt-dev/scoop-bucket & scoop install tilt on Windows

After the installation is done, verify that you have installed it correctly with: tilt version

Install Helm:

  • brew install helm on macOS
  • choco install kubernetes-helm on Windows
  • Install instructions for Linux

As the project lacks many features for Windows, it is suggested to follow the above steps on Windows + WSL2 rather than native Windows.

Using Tilt

Both of the Tilt setups below will get you started developing CAPG in a local kind cluster. The main difference is the number of components you will build from source and the scope of the changes you’d like to make. If you only want to make changes in CAPG, then follow CAPG instructions. This will save you from having to build all of the images for CAPI, which can take a while. If the scope of your development will span both CAPG and CAPI, then follow the CAPI and CAPG instructions.

Tilt for dev in CAPG

If you want to develop in CAPG and get a local development cluster working quickly, this is the path for you.

From the root of the CAPG repository, run the following to generate a tilt-settings.json file with your GCP service account credentials:

$ cat <<EOF > tilt-settings.json
{
  "kustomize_substitutions": {
      "GCP_B64ENCODED_CREDENTIALS": "$(cat PATH_FOR_GCP_CREDENTIALS_JSON | base64 -w0)"
  }
}
EOF

Set the following environment variables with the appropriate values for your environment:

export GCP_REGION="<GCP_REGION>"
export GCP_PROJECT="<GCP_PROJECT>"
export CONTROL_PLANE_MACHINE_COUNT=1
export WORKER_MACHINE_COUNT=1
# Make sure to use the same Kubernetes version here as was used to build the GCE image
export KUBERNETES_VERSION=1.23.3
export GCP_CONTROL_PLANE_MACHINE_TYPE=n1-standard-2
export GCP_NODE_MACHINE_TYPE=n1-standard-2
export GCP_NETWORK_NAME=<GCP_NETWORK_NAME or default>
export CLUSTER_NAME="<CLUSTER_NAME>"

To build a kind cluster and start Tilt, just run:

make tilt-up

Alternatively, you can also run:

./scripts/setup-dev-enviroment.sh

It will also set up the network. If you have already set up the network, you can skip that step by running:

./scripts/setup-dev-enviroment.sh --skip-init-network

By default, the Cluster API components deployed by Tilt have experimental features turned off. If you would like to enable these features, add extra_args as specified in The Cluster API Book.

Once your kind management cluster is up and running, you can deploy a workload cluster.

To tear down the kind cluster built by the command above, just run:

make kind-reset

And if you need to clean up the network setup, you can run:

./scripts/setup-dev-enviroment.sh --clean-network

Tilt for dev in both CAPG and CAPI

If you want to develop in both CAPI and CAPG at the same time, then this is the path for you.

To use Tilt for a simplified development workflow, follow the instructions in the cluster-api repo. The instructions will walk you through cloning the Cluster API (CAPI) repository and configuring Tilt to use kind to deploy the cluster api management components.

You may wish to check out the correct version of CAPI to match the version used in CAPG.

Note that tilt up will be run from the cluster-api repository directory and the tilt-settings.json file will point back to the cluster-api-provider-gcp repository directory. Any changes you make to the source code in the cluster-api or cluster-api-provider-gcp repositories will automatically be redeployed to the kind cluster.

After you have cloned both repositories, your folder structure should look like:

|-- src/cluster-api-provider-gcp
|-- src/cluster-api (run `tilt up` here)

After configuring the environment variables, run the following to generate your tilt-settings.json file:

cat <<EOF > tilt-settings.json
{
  "default_registry": "${REGISTRY}",
  "provider_repos": ["../cluster-api-provider-gcp"],
  "enable_providers": ["gcp", "docker", "kubeadm-bootstrap", "kubeadm-control-plane"],
  "kustomize_substitutions": {
      "GCP_B64ENCODED_CREDENTIALS": "$(cat PATH_FOR_GCP_CREDENTIALS_JSON | base64 -w0)"
  }
}
EOF

$REGISTRY should be in the format docker.io/<dockerhub-username>

The cluster-api management components that are deployed are configured at the /config folder of each repository respectively. Making changes to those files will trigger a redeploy of the management cluster components.

Debugging

If you would like to debug CAPG you can run the provider with delve, a Go debugger tool. This will then allow you to attach to delve and troubleshoot the processes.

To do this you need to use the debug configuration in tilt-settings.json. Full details of the options can be seen here.

An example tilt-settings.json:

{
  "default_registry": "gcr.io/your-project-name-her",
  "provider_repos": ["../cluster-api-provider-gcp"],
  "enable_providers": ["gcp", "kubeadm-bootstrap", "kubeadm-control-plane"],
  "debug": {
    "gcp": {
      "continue": true,
      "port": 30000,
      "profiler_port": 40000,
      "metrics_port": 40001
    }
  },
  "kustomize_substitutions": {
      "GCP_B64ENCODED_CREDENTIALS": "$(cat PATH_FOR_GCP_CREDENTIALS_JSON | base64 -w0)"
  }
}

Once you have run tilt (see section below) you will be able to connect to the running instance of delve.

For vscode, you can use a launch configuration like this:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Core CAPI Controller GCP",
      "type": "go",
      "request": "attach",
      "mode": "remote",
      "remotePath": "",
      "port": 30000,
      "host": "127.0.0.1",
      "showLog": true,
      "trace": "log",
      "logOutput": "rpc"
    }
  ]
}

Create a new configuration and add it to the “Debug” menu to configure debugging in GoLand/IntelliJ following these instructions.

Alternatively, you may use delve straight from the CLI by executing a command like this:

dlv connect localhost:30000

Deploying a workload cluster

After your kind management cluster is up and running with Tilt, ensure you have all the environment variables set as described in Tilt for dev in CAPG, and deploy a workload cluster with the following:

make create-workload-cluster

To delete the cluster:

make delete-workload-cluster

Submitting PRs and testing

Pull requests and issues are highly encouraged! If you’re interested in submitting PRs to the project, please be sure to run some initial checks prior to submission:

Do make sure to set the GOOGLE_APPLICATION_CREDENTIALS environment variable with the path to your JSON key file. Check out this doc to generate the credentials.

make lint # Runs a suite of quick scripts to check code structure
make test # Runs tests on the Go code

Executing unit tests

make test executes the project’s unit tests. These tests do not stand up a Kubernetes cluster, nor do they have external dependencies.
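
If you want a tighter feedback loop while working on a single package, you can also invoke go test directly; the package path and test name below are placeholders:

go test -v -run TestMyFeature ./path/to/package/...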

Nightly Builds

Nightly builds are regular automated builds of the CAPG source code that occur every night.

These builds are generated directly from the latest commit of source code on the main branch.

Nightly builds serve several purposes:

  • Early Testing: They provide an opportunity for developers and testers to access the most recent changes in the codebase and identify any issues or bugs that may have been introduced.
  • Feedback Loop: They facilitate a rapid feedback loop, enabling developers to receive feedback on their changes quickly, allowing them to iterate and improve the code more efficiently.
  • Preview of New Features: Users can get a preview of upcoming features or changes by testing nightly builds, although these builds may not always be stable enough for production use.

Overall, nightly builds play a crucial role in software development by promoting user testing, early bug detection, and rapid iteration.

CAPG Nightly build jobs run in Prow.

Usage

To try a nightly build, you can download the latest nightly CAPG manifests. You can find the available ones by executing the following command:

curl -sL -H 'Accept: application/json' "https://storage.googleapis.com/storage/v1/b/k8s-staging-cluster-api-gcp/o" | jq -r '.items | map(select(.name | startswith("components/nightly_main"))) | .[] | [.timeCreated,.mediaLink] | @tsv'

The output should look something like this:

2024-05-03T08:03:09.087Z        https://storage.googleapis.com/download/storage/v1/b/k8s-staging-cluster-api-gcp/o/components%2Fnightly_main_2024050x?generation=1714723389033961&alt=media
2024-05-04T08:02:52.517Z        https://storage.googleapis.com/download/storage/v1/b/k8s-staging-cluster-api-gcp/o/components%2Fnightly_main_2024050y?generation=1714809772486582&alt=media
2024-05-05T08:02:45.840Z        https://storage.googleapis.com/download/storage/v1/b/k8s-staging-cluster-api-gcp/o/components%2Fnightly_main_2024050z?generation=1714896165803510&alt=media

Now visit the link for the manifest you want to download. This will automatically download the manifest for you.

Once downloaded you can apply the manifest directly to your testing CAPI management cluster/namespace (e.g. with kubectl), as the downloaded CAPG manifest will already contain the correct, corresponding CAPG nightly image reference.
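
Putting it together, downloading and applying a nightly manifest might look like this (the URL is one of the media links returned by the command above):

curl -sLo capg-nightly.yaml "<mediaLink-from-the-list-above>"
kubectl apply -f capg-nightly.yaml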

Creating a cluster without clusterctl

This document describes how to create a management cluster and workload cluster without using clusterctl. For creating a cluster with clusterctl, check out our Cluster API Quick Start.

For creating a Management cluster

  1. Build required images by using the following commands:

    • docker build --tag=gcr.io/k8s-staging-cluster-api-gcp/cluster-api-gcp-controller:e2e .
    • make docker-build-all
  2. Set the required environment variables. For example:

    export GCP_REGION=us-east4
    export GCP_PROJECT=k8s-staging-cluster-api-gcp
    export CONTROL_PLANE_MACHINE_COUNT=1
    export WORKER_MACHINE_COUNT=1
    export KUBERNETES_VERSION=1.21.6
    export GCP_CONTROL_PLANE_MACHINE_TYPE=n1-standard-2
    export GCP_NODE_MACHINE_TYPE=n1-standard-2
    export GCP_NETWORK_NAME=default
    export GCP_B64ENCODED_CREDENTIALS=$( cat /path/to/gcp_credentials.json | base64 | tr -d '\n' )
    export CLUSTER_NAME="capg-test"
    export IMAGE_ID=projects/k8s-staging-cluster-api-gcp/global/images/cluster-api-ubuntu-2204-v1-27-3-nightly
    

You can check for other images to set the IMAGE_ID of your choice.

  3. Run make create-management-cluster from the root directory.

Jobs

This document provides an overview of our jobs running via Prow and GitHub Actions.

Builds and tests running on the default branch

Legend

🟢 REQUIRED - Jobs that have to run successfully to get the PR merged.

Presubmits

Prow Presubmits:

Postsubmits

GitHub Postsubmit Workflows:

  • Code-coverage-check make test-cover

Periodics

Prow Periodics:

Adding new E2E test

E2E tests verify a complete, real-world workflow ensuring that all parts of the system work together as expected. If you are introducing a new feature that interconnects with other parts of the software, you will likely be required to add a verification step for this functionality with a new E2E scenario (unless it is already covered by existing test suites).

Create a cluster template

The test suite will provision a cluster based on a pre-defined yaml template (stored in ./test/e2e/data) which is then sourced in ./test/e2e/config/gcp-ci.yaml. New cluster definitions for E2E tests have to be added and sourced before being available to use in the E2E workflow.

Add test case

When the template is available, you can reference it as a flavor in Go. For example, adding a new test for self-managed cluster provisioning would look like the following:

Context("Creating a control-plane cluster with an internal load balancer", func() {
    It("Should create a cluster with 1 control-plane and 1 worker node with an internal load balancer", func() {
        By("Creating a cluster with internal load balancer")
        clusterctl.ApplyClusterTemplateAndWait(ctx, clusterctl.ApplyClusterTemplateAndWaitInput{
            ClusterProxy: bootstrapClusterProxy,
            ConfigCluster: clusterctl.ConfigClusterInput{
                LogFolder:                clusterctlLogFolder,
                ClusterctlConfigPath:     clusterctlConfigPath,
                KubeconfigPath:           bootstrapClusterProxy.GetKubeconfigPath(),
                InfrastructureProvider:   clusterctl.DefaultInfrastructureProvider,
                Flavor:                   "ci-with-internal-lb",
                Namespace:                namespace.Name,
                ClusterName:              clusterName,
                KubernetesVersion:        e2eConfig.GetVariable(KubernetesVersion),
                ControlPlaneMachineCount: ptr.To[int64](1),
                WorkerMachineCount:       ptr.To[int64](1),
            },
            WaitForClusterIntervals:      e2eConfig.GetIntervals(specName, "wait-cluster"),
            WaitForControlPlaneIntervals: e2eConfig.GetIntervals(specName, "wait-control-plane"),
            WaitForMachineDeployments:    e2eConfig.GetIntervals(specName, "wait-worker-nodes"),
        }, result)
    })
})

In this case, the flavor ci-with-internal-lb is a reference to the template cluster-template-ci-with-internal-lb.yaml which is available in ./test/e2e/data/infrastructure-gcp/cluster-template-ci-with-internal-lb.yaml.

Release Process

Change milestone

  • Create a new GitHub milestone for the next release
  • Change milestone applier so new changes can be applied to the appropriate release
    • Open a PR in https://github.com/kubernetes/test-infra to change this line
      • Example PR: https://github.com/kubernetes/test-infra/pull/16827

Prepare branch, tag and release notes

  • Update the file metadata.yaml if it is a major or minor release
  • Submit a PR for the metadata.yaml update if needed, wait for it to be merged before continuing, and pull any changes prior to continuing.
  • Create tag with git
    • export RELEASE_TAG=v0.4.6 (the tag of the release to be cut)
    • git tag -s ${RELEASE_TAG} -m "${RELEASE_TAG}"
    • git push upstream ${RELEASE_TAG}
  • make release from repo, this will create the release artifacts in the out/ folder
  • Install the release-notes tool according to instructions
  • Export GITHUB_TOKEN
  • Run the release-notes tool with the appropriate commits. Commits range from the first commit after the previous release to the new release commit.
release-notes --org kubernetes-sigs --repo cluster-api-provider-gcp \
--start-sha 1cf1ec4a1effd9340fe7370ab45b173a4979dc8f \
--end-sha e843409f896981185ca31d6b4a4c939f27d975de \
--branch <RELEASE_BRANCH_OR_MAIN_BRANCH>
  • Manually format and categorize the release notes

Promote image to prod repo

Promote image

  • Images are built by the post push images job
  • Create a PR in https://github.com/kubernetes/k8s.io to add the image and tag
    • Example PR: https://github.com/kubernetes/k8s.io/pull/1462
  • Location of image: https://console.cloud.google.com/gcr/images/k8s-staging-cluster-api-gcp/GLOBAL/cluster-api-gcp-controller?rImageListsize=30

To promote the image you should use a tool called cip-mm; please refer to https://github.com/kubernetes-sigs/promo-tools/tree/main/cmd/cip-mm

For example, to promote the v0.3.1 release, we can run the following command:

$ cip-mm --base_dir=$GOPATH/src/k8s.io/k8s.io/k8s.gcr.io --staging_repo=gcr.io/k8s-staging-cluster-api-gcp  --filter_tag=v0.3.1

Release in GitHub

Create the GitHub release in the UI

  • Create a draft release in GitHub and associate it with the tag that was created
  • Copy paste the release notes
  • Upload artifacts from the out/ folder
  • Publish release
  • Announce the release

Versioning

cluster-api-provider-gcp follows the semantic versioning specification.

Example versions:

  • Pre-release: v0.1.1-alpha.1
  • Minor release: v0.1.0
  • Patch release: v0.1.1
  • Major release: v1.0.0

Expected artifacts

  1. A release yaml file infrastructure-components.yaml containing the resources needed to deploy to Kubernetes
  2. A cluster-templates.yaml for each supported flavor
  3. A metadata.yaml which maps release series to cluster-api contract version
  4. Release notes

Communication

Patch Releases

  1. Announce the release in Kubernetes Slack on the #cluster-api-gcp channel.

Minor/Major Releases

  1. Follow the communications process for pre-releases
  2. An announcement email is sent to kubernetes-sig-cluster-lifecycle@googlegroups.com with the subject [ANNOUNCE] cluster-api-provider-gcp <version> has been released

Cluster API GCP roadmap

This roadmap is a constant work in progress, subject to frequent revision. Dates are approximations. Features are listed in no particular order.

v0.4 (v1alpha4)

Description | Issue/Proposal/PR

v1beta1/v1

Proposal awaits.

Lifecycle frozen

Items within this category have been identified as potential candidates for the project and can be moved up into a milestone if there is enough interest.

Description | Issue/Proposal/PR
Enabling GPU enabled clusters | #289
Publish images in GCP | #152
Proper bootstrap of manually deleted worker VMs | #173
Correct URI for subnetwork setup | #278
Workload identity support | #311
Implement GCPMachinePool using MIGs | #297