I'm trying to create 5 GKE clusters in 5 different zones to simulate a fog-edge environment, and I keep running into the same issue: Terraform fails to create one or more of the clusters. I've run it multiple times and it is always the same cluster(s) that fail. When I change the zone of one of the clusters, a different one starts failing consistently. At one point only one cluster was failing, so I removed it and tried to create just the other four, but then another one failed. When I create only one cluster, it works fine.

It looks like some incompatibility between the zones, or a quota I might be hitting. On previous occasions when a quota was the problem, though, the error said so explicitly and I was able to fix it. This time the error gives no detail at all. Does anyone have a clue why this is happening?
Below is the relevant code:
resource "google_project_service" "compute" {
  service                    = "compute.googleapis.com"
  disable_dependent_services = true
}

resource "google_project_service" "container" {
  service                    = "container.googleapis.com"
  disable_dependent_services = true
}

# K8s Clusters
resource "google_container_cluster" "primary" {
  count    = length(var.cluster-alias)
  name     = var.cluster-alias[count.index] # var.name
  location = var.cluster-zone[count.index]  # var.zone

  remove_default_node_pool = true
  initial_node_count       = 1
  # logging_service    = "none"
  # monitoring_service = "none"
  networking_mode = "VPC_NATIVE"

  addons_config {
    http_load_balancing {
      disabled = false # has to be enabled for multi cluster ingress
    }
    horizontal_pod_autoscaling {
      disabled = true
    }
  }

  vertical_pod_autoscaling {
    enabled = false
  }

  release_channel {
    channel = "REGULAR"
  }

  workload_identity_config {
    workload_pool = "${var.gcp_project_id}.svc.id.goog"
  }

  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "" # var.pod-ips[count.index]
    services_ipv4_cidr_block = "" # var.service-ips[count.index]
  }

  depends_on = [
    google_project_service.compute,
    google_project_service.container
  ]
}

The error message:
│ Error: Error waiting for creating GKE cluster: Failed to create cluster
│
│ with google_container_cluster.primary[1],
│ on 3-main.tf line 17, in resource "google_container_cluster" "primary":
│ 17: resource "google_container_cluster" "primary" {
And the relevant variables:
variable "cluster-alias" {
  type        = list(string)
  description = "Aliases to loop and create multiple gke clusters for the multi cluster"
  default     = ["edge-cluster-1", "edge-cluster-2", "fog-cluster-1", "fog-cluster-2", "cloud-cluster"]
}

variable "cluster-zone" {
  type        = list(string)
  description = "Zones for each cluster"
  default     = ["europe-west1-b", "europe-southwest1-a", "us-central1-a", "us-east1-b", "southamerica-east1-a"]
}

variable "pod-ips" {
  type        = list(string)
  description = "The ip ranges for the pods in the clusters"
  default     = ["10.8.0.0/14", "10.16.0.0/14", "10.24.0.0/14", "10.32.0.0/14", "10.48.0.0/14"]
}

variable "service-ips" {
  type        = list(string)
  description = "The ip ranges for the services in the clusters"
  default     = ["10.12.0.0/20", "10.20.0.0/20", "10.28.0.0/20", "10.36.0.0/20", "10.52.0.0/20"]
}

I have also played around with the IP ranges, because they were causing some issues, but I saw that I could let the resource choose the ranges itself, so that is what it does now. Could they be relevant to this error?
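
For reference when reading the error: `cluster-alias` and `cluster-zone` are matched purely by list index, so with the defaults above `primary[1]` is `edge-cluster-2` in `europe-southwest1-a`. Removing an entry from the lists shifts every later index, which makes it harder to tell whether "the same" cluster keeps failing between runs. The following is only a sketch of one way to make the failing instance easier to identify, not a fix for the failure itself: it keys the clusters by alias using `for_each` and `zipmap`. The local name `clusters` is made up for illustration, and the sketch assumes the aliases are unique and both lists have the same length.

```hcl
# Sketch only: pair each alias with its zone so instances are addressed by
# name (e.g. primary["edge-cluster-2"]) instead of by list position.
# Assumes unique aliases and lists of equal length.
locals {
  clusters = zipmap(var.cluster-alias, var.cluster-zone)
}

resource "google_container_cluster" "primary" {
  for_each = local.clusters

  name     = each.key   # alias, e.g. "edge-cluster-2"
  location = each.value # zone,  e.g. "europe-southwest1-a"

  remove_default_node_pool = true
  initial_node_count       = 1
  networking_mode          = "VPC_NATIVE"

  # Same as in the question: empty strings let GKE choose the ranges.
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = ""
    services_ipv4_cidr_block = ""
  }

  # addons_config, vertical_pod_autoscaling, release_channel,
  # workload_identity_config and depends_on stay as in the question.
}
```

With this layout an error would name the alias directly, and dropping one cluster from the lists would leave the addresses of the others unchanged. Note that switching an existing configuration from `count` to `for_each` changes the resource addresses, so clusters already in state would need to be moved or recreated.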