
Using GKE with Terraform

-> Visit the Provision a GKE Cluster (Google Cloud) tutorial to learn how to provision and interact with a GKE cluster.

This page is a brief overview of GKE usage with Terraform, based on the content available in the How-to guides for GKE. It's intended as a supplement for intermediate users, covering cases that are unintuitive or confusing when using Terraform instead of gcloud/the Cloud Console.

Additionally, you may consider using Google's kubernetes-engine module, which implements many of these practices for you.

If the information on this page conflicts with recommendations available on cloud.google.com, cloud.google.com should be considered the correct source.

Interacting with Kubernetes

After creating a google_container_cluster with Terraform, you can use gcloud to configure cluster access, generating a kubeconfig entry:

gcloud container clusters get-credentials cluster-name

Using this command, gcloud generates a kubeconfig entry that uses gcloud as the authentication mechanism. However, it is sometimes more desirable to perform authentication inline with Terraform, or with a static config that does not depend on gcloud.

Using the Kubernetes and Helm Providers

When using the kubernetes and helm providers, statically defined credentials let you connect to clusters defined in the same config or in a remote state. You can configure either provider using configuration such as the following:

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as kubernetes from "./.gen/providers/kubernetes";
import * as google from "./.gen/providers/google";
/*The following providers are missing schema information and might need manual adjustments to synthesize correctly: kubernetes, google.
For a more precise conversion please use the --provider flag in convert.*/
const dataGoogleClientConfigProvider =
  new google.dataGoogleClientConfig.DataGoogleClientConfig(
    this,
    "provider",
    {}
  );
const dataGoogleContainerClusterMyCluster =
  new google.dataGoogleContainerCluster.DataGoogleContainerCluster(
    this,
    "my_cluster",
    {
      location: "us-central1",
      name: "my-cluster",
    }
  );
new kubernetes.provider.KubernetesProvider(this, "kubernetes", {
  cluster_ca_certificate: `\${base64decode(
    ${dataGoogleContainerClusterMyCluster.masterAuth.fqn}[0].cluster_ca_certificate,
  )}`,
  host: `https://\${${dataGoogleContainerClusterMyCluster.endpoint}}`,
  token: dataGoogleClientConfigProvider.accessToken,
});
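
The helm provider accepts the same cluster credentials through its kubernetes block. The following is a minimal sketch in the same style, assuming helm provider bindings have been generated under ./.gen/providers/helm and reusing the data sources defined above; as with the converted snippets on this page, attribute names and block shapes may need manual adjustment to synthesize correctly.

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as helm from "./.gen/providers/helm";
// Reuses dataGoogleContainerClusterMyCluster and dataGoogleClientConfigProvider
// from the example above.
new helm.provider.HelmProvider(this, "helm", {
  kubernetes: [
    {
      cluster_ca_certificate: `\${base64decode(
        ${dataGoogleContainerClusterMyCluster.masterAuth.fqn}[0].cluster_ca_certificate,
      )}`,
      host: `https://\${${dataGoogleContainerClusterMyCluster.endpoint}}`,
      token: dataGoogleClientConfigProvider.accessToken,
    },
  ],
});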

Alternatively, you can authenticate as another service account on which your Terraform user has been granted the roles/iam.serviceAccountTokenCreator role:

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as kubernetes from "./.gen/providers/kubernetes";
import * as google from "./.gen/providers/google";
/*The following providers are missing schema information and might need manual adjustments to synthesize correctly: kubernetes, google.
For a more precise conversion please use the --provider flag in convert.*/
const dataGoogleContainerClusterMyCluster =
  new google.dataGoogleContainerCluster.DataGoogleContainerCluster(
    this,
    "my_cluster",
    {
      location: "us-central1",
      name: "my-cluster",
    }
  );
const dataGoogleServiceAccountAccessTokenMyKubernetesSa =
  new google.dataGoogleServiceAccountAccessToken.DataGoogleServiceAccountAccessToken(
    this,
    "my_kubernetes_sa",
    {
      lifetime: "3600s",
      scopes: ["userinfo-email", "cloud-platform"],
      target_service_account: "{{service_account}}",
    }
  );
new kubernetes.provider.KubernetesProvider(this, "kubernetes", {
  cluster_ca_certificate: `\${base64decode(
    ${dataGoogleContainerClusterMyCluster.masterAuth.fqn}[0].cluster_ca_certificate,
  )}`,
  host: `https://\${${dataGoogleContainerClusterMyCluster.endpoint}}`,
  token: dataGoogleServiceAccountAccessTokenMyKubernetesSa.accessToken,
});

Using kubectl / kubeconfig

It's possible to interface with kubectl or other kubeconfig-based tools by providing them a kubeconfig directly. For situations where gcloud can't be used as an authentication mechanism, you can generate a static kubeconfig file instead.

An authentication submodule, auth, is provided as part of Google's kubernetes-engine module. You can use it through the module registry, or in the module source.

Authenticating using this method will use a Terraform-generated access token which persists for 1 hour. For longer-lasting sessions, or cases where a single persistent config is required, using gcloud is advised.
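
If you manage the kubeconfig from the same CDK for Terraform stack, one option is to call the auth submodule through TerraformHclModule and expose its rendered kubeconfig as an output. The following is a minimal sketch, assuming the module accepts project_id, cluster_name, and location inputs and provides a kubeconfig_raw output; check the module's documentation for the exact variable and output names.

import { TerraformHclModule, TerraformOutput } from "cdktf";

// Call the auth submodule of Google's kubernetes-engine module.
const gkeAuth = new TerraformHclModule(this, "gke_auth", {
  source: "terraform-google-modules/kubernetes-engine/google//modules/auth",
  variables: {
    project_id: "{{project_id}}",
    cluster_name: "my-cluster",
    location: "us-central1",
  },
});

// Expose the rendered kubeconfig as a sensitive output so it can be
// written to a file and passed to kubectl or other kubeconfig-based tools.
new TerraformOutput(this, "kubeconfig_raw", {
  sensitive: true,
  value: gkeAuth.get("kubeconfig_raw"),
});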

VPC-native Clusters

VPC-native clusters are GKE clusters that use alias IP ranges. VPC-native clusters route traffic between pods using a VPC network, and are able to route to other VPCs across network peerings along with several other benefits.

In both gcloud and the Cloud Console, VPC-native is the default for new clusters, and many managed products such as Cloud SQL and Memorystore require VPC-native clusters to work properly. In Terraform, however, the default behavior is to create a routes-based cluster for backwards compatibility.

It's recommended that you create a VPC-native cluster by specifying the ip_allocation_policy block or by using secondary ranges on an existing subnet. The configuration will look like the following:

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as google from "./.gen/providers/google";
/*The following providers are missing schema information and might need manual adjustments to synthesize correctly: google.
For a more precise conversion please use the --provider flag in convert.*/
const googleComputeNetworkCustom = new google.computeNetwork.ComputeNetwork(
  this,
  "custom",
  {
    auto_create_subnetworks: false,
    name: "test-network",
  }
);
const googleComputeSubnetworkCustom =
  new google.computeSubnetwork.ComputeSubnetwork(this, "custom_1", {
    ip_cidr_range: "10.2.0.0/16",
    name: "test-subnetwork",
    network: googleComputeNetworkCustom.id,
    region: "us-central1",
    secondary_ip_range: [
      {
        ip_cidr_range: "192.168.1.0/24",
        range_name: "services-range",
      },
      {
        ip_cidr_range: "192.168.64.0/22",
        range_name: "pod-ranges",
      },
    ],
  });
/*This allows the Terraform resource name to match the original name. You can remove the call if you don't need them to match.*/
googleComputeSubnetworkCustom.overrideLogicalId("custom");
new google.containerCluster.ContainerCluster(this, "my_vpc_native_cluster", {
  initial_node_count: 1,
  ip_allocation_policy: [
    {
      cluster_secondary_range_name: "pod-ranges",
      services_secondary_range_name: `\${${googleComputeSubnetworkCustom.secondaryIpRange}.0.range_name}`,
    },
  ],
  location: "us-central1",
  name: "my-vpc-native-cluster",
  network: googleComputeNetworkCustom.id,
  subnetwork: googleComputeSubnetworkCustom.id,
});

Node Pool Management

In Terraform, we recommend managing your node pools using the google_container_node_pool resource, separate from the google_container_cluster resource. This separates cluster-level configuration like networking and Kubernetes features from the configuration of your nodes. Additionally, it helps ensure your cluster isn't inadvertently deleted. Terraform struggles to handle complex changes to subresources, and may attempt to delete a cluster based on changes to inline node pools.

However, the GKE API doesn't allow creating a cluster without nodes. It's common for Terraform users to define a block such as the following:

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as google from "./.gen/providers/google";
/*The following providers are missing schema information and might need manual adjustments to synthesize correctly: google.
For a more precise conversion please use the --provider flag in convert.*/
new google.containerCluster.ContainerCluster(this, "my-gke-cluster", {
  initial_node_count: 1,
  location: "us-central1",
  name: "my-gke-cluster",
  remove_default_node_pool: true,
});

This creates initial_node_count nodes per zone the cluster has nodes in, typically 1 zone if the cluster's location is a zone and 3 if it's a region. Your cluster's initial GKE masters will be sized based on the initial_node_count provided. If subsequent node pools add a large number of nodes to your cluster, GKE may cause a resizing event immediately after adding a node pool.

The initial node pool will be created using the Compute Engine default service account as the service_account. If you've disabled that service account, or want to use a least-privilege Google service account for the temporary node pool, you can add the following configuration to your google_container_cluster block:

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as google from "./.gen/providers/google";
/*The following providers are missing schema information and might need manual adjustments to synthesize correctly: google.
For a more precise conversion please use the --provider flag in convert.*/
const googleContainerClusterMyGkeCluster =
  new google.containerCluster.ContainerCluster(this, "my-gke-cluster", {
    node_config: [
      {
        service_account: "{{service_account}}",
      },
    ],
  });
googleContainerClusterMyGkeCluster.addOverride("lifecycle", [
  {
    ignore_changes: ["node_config"],
  },
]);
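
With the default node pool removed, the node pool itself can then be managed as its own google_container_node_pool resource that references the cluster. The following is a minimal sketch, assuming the googleContainerClusterMyGkeCluster resource defined above; the two-node size, e2-medium machine type, and {{service_account}} placeholder are illustrative choices only.

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as google from "./.gen/providers/google";
// References the googleContainerClusterMyGkeCluster resource defined above.
new google.containerNodePool.ContainerNodePool(this, "my_node_pool", {
  cluster: googleContainerClusterMyGkeCluster.name,
  location: googleContainerClusterMyGkeCluster.location,
  name: "my-node-pool",
  node_config: [
    {
      machine_type: "e2-medium",
      service_account: "{{service_account}}",
    },
  ],
  node_count: 2,
});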

Windows Node Pools

You can add Windows Server node pools to your GKE cluster by adding a google_container_node_pool to your Terraform configuration with image_type set to WINDOWS_LTSC or WINDOWS_SAC.

/*Provider bindings are generated by running cdktf get.
See https://cdk.tf/provider-generation for more details.*/
import * as google from "./.gen/providers/google";
/*The following providers are missing schema information and might need manual adjustments to synthesize correctly: google.
For a more precise conversion please use the --provider flag in convert.*/
const googleContainerClusterDemoCluster =
  new google.containerCluster.ContainerCluster(this, "demo_cluster", {
    initial_node_count: 1,
    ip_allocation_policy: [
      {
        cluster_ipv4_cidr_block: "/14",
        services_ipv4_cidr_block: "/20",
      },
    ],
    location: "us-west1-a",
    min_master_version: "1.16",
    name: "demo-cluster",
    project: "",
    remove_default_node_pool: true,
  });
const googleContainerNodePoolLinuxPool =
  new google.containerNodePool.ContainerNodePool(this, "linux_pool", {
    cluster: googleContainerClusterDemoCluster.name,
    location: googleContainerClusterDemoCluster.location,
    name: "linux-pool",
    node_config: [
      {
        image_type: "COS_CONTAINERD",
      },
    ],
    project: googleContainerClusterDemoCluster.project,
  });
new google.containerNodePool.ContainerNodePool(this, "windows_pool", {
  cluster: googleContainerClusterDemoCluster.name,
  depends_on: [`\${${googleContainerNodePoolLinuxPool.fqn}}`],
  location: googleContainerClusterDemoCluster.location,
  name: "windows-pool",
  node_config: [
    {
      image_type: "WINDOWS_LTSC",
      machine_type: "e2-standard-4",
    },
  ],
  project: googleContainerClusterDemoCluster.project,
});

The example above creates a cluster with a small Linux node pool and a Windows Server node pool. The Linux node pool is necessary since some critical pods are not yet supported on Windows. Please see Limitations for details on features that are not supported by Windows Server node pools.