
I was running some tests in my GCP project to check whether I can migrate to Dataproc on GKE and keep it up and running while leveraging autoscaling for workloads. However, I've been blocked from the very beginning.

I took the example from the docs, put it together, and got this error message:

╷
│ Error: Unsupported block type
│
│   on ../../modules/poc/main.tf line 46, in resource "google_dataproc_cluster" "dataproc_gke_cluster":
│   46:   virtual_cluster_config {
│
│ Blocks of type "virtual_cluster_config" are not expected here.
╵
ERRO[0003] Terraform invocation failed in 
ERRO[0003] 1 error occurred:
        * exit status 1

According to the docs, virtual_cluster_config is a valid block inside the google_dataproc_cluster resource:

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_cluster#nested_virtual_cluster_config

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster

Here is my full code:


resource "google_container_cluster" "poc_primary_gke" {
  name     = "poc-gke-cluster"
  location = "europe-west1"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = "my-node-pool"
  location   = "europe-west1"
  cluster    = google_container_cluster.poc_primary_gke.name
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = var.service_account_email
    oauth_scopes    = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

resource "google_storage_bucket" "staging_bucket" {
  name          = "staging-bucket-poc"
  location      = "EU"
  force_destroy = true

  uniform_bucket_level_access = true
}

resource "google_dataproc_cluster" "dataproc_gke_cluster" {
  name     = "gke-dataproc-poc"
  region   = "europe-west1"
  graceful_decommission_timeout = "120s"

  labels = var.labels

  virtual_cluster_config {
    staging_bucket = google_storage_bucket.staging_bucket.name

    kubernetes_cluster_config {
      kubernetes_namespace = "foobar"

      kubernetes_software_config {
        component_version = {
          "SPARK" : "3.1-dataproc-7"
        }

        properties = {
          "spark:spark.eventLog.enabled" : "true"
        }
      }

      gke_cluster_config {
        # Full resource ID of the GKE cluster declared above (poc_primary_gke)
        gke_cluster_target = google_container_cluster.poc_primary_gke.id

        node_pool_target {
          node_pool = "dpgke"
          roles     = ["DEFAULT"]

          node_pool_config {
            autoscaling {
              min_node_count = 1
              max_node_count = 6
            }

            config {
              machine_type     = "n1-standard-4"
              preemptible      = true
              local_ssd_count  = 1
              min_cpu_platform = "Intel Sandy Bridge"
            }
          }
        }
      }
    }
  }
}

Has anyone successfully created a Dataproc cluster on GKE via Terraform?

Marco Massetti
  • Does this [link](https://github.com/hashicorp/terraform-provider-google/blob/main/google/resource_dataproc_cluster.go) help you? Let me know whether it's helpful. – Prajna Rai T Nov 28 '22 at 10:06
  • No, that only confirms that virtual_cluster_config is available for the Dataproc resource, yet it is still not accepted in my case: https://github.com/hashicorp/terraform-provider-google/blob/main/google/resource_dataproc_cluster.go#L205. I have checked for typos but haven't been able to fix it yet. – Marco Massetti Nov 28 '22 at 17:43

1 Answer


I finally solved the issue.

The problem was the Terraform Google provider version. I thought I was on the latest release, but I wasn't.

virtual_cluster_config does not seem to be available in provider versions earlier than 4.39, so I upgraded to 4.44, and everything works now.
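A minimal sketch of how the provider version could be pinned so this cannot silently regress, assuming the answer's finding that `virtual_cluster_config` needs provider 4.39 or later (4.44 is the version confirmed working here). The block can live in the root module or in the `poc` module.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Per this answer: virtual_cluster_config is missing before 4.39;
      # 4.44 is the version reported to work.
      version = ">= 4.44"
    }
  }
}
```

After changing the constraint, run `terraform init -upgrade`. Otherwise Terraform may keep using the older provider version already recorded in `.terraform.lock.hcl`. `terraform version` lists the provider versions actually in use, which is a quick way to confirm the upgrade took effect.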

Marco Massetti
  • Are you using the node pool you just created, `my-node-pool`? Where is `dpgke` defined? I assume it means Dataproc GKE. – it243 Jun 25 '23 at 15:59
  • Not necessarily. It was a trick explained in the Terraform docs or in another Stack Overflow answer (I don't recall which). The idea was to create an empty GKE cluster (hence the small preemptible machines) and then use the mandatory node pool from Dataproc, so the service manages itself and I don't have to maintain other pools. At least, that was my plan back then: use an empty GKE cluster and let Dataproc do the work. Take the idea, but don't trust me 100%, since I don't remember the details precisely. – Marco Massetti Sep 01 '23 at 19:42
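A minimal sketch of the "empty GKE, let Dataproc do the work" setup described in this comment. It assumes my reading of the provider docs for `node_pool_config`: when that block is set, Dataproc tries to create the target node pool itself. Under that assumption, `dpgke` is not declared anywhere in Terraform, and Dataproc creates and autoscales it. The GKE cluster keeps only a small preemptible default pool, and the separate `google_container_node_pool` from the question is dropped. `google_storage_bucket.staging_bucket` is reused from the question. Names and sizes are illustrative, and the provider must be new enough to support `virtual_cluster_config`.

```hcl
# Small GKE cluster: only the default pool, no separately managed pools.
resource "google_container_cluster" "poc_primary_gke" {
  name               = "poc-gke-cluster"
  location           = "europe-west1"
  initial_node_count = 1

  node_config {
    machine_type = "e2-medium"
    preemptible  = true
  }
}

resource "google_dataproc_cluster" "dataproc_gke_cluster" {
  name   = "gke-dataproc-poc"
  region = "europe-west1"

  virtual_cluster_config {
    # Bucket resource reused from the question
    staging_bucket = google_storage_bucket.staging_bucket.name

    kubernetes_cluster_config {
      kubernetes_namespace = "foobar"

      kubernetes_software_config {
        component_version = {
          "SPARK" : "3.1-dataproc-7"
        }
      }

      gke_cluster_config {
        gke_cluster_target = google_container_cluster.poc_primary_gke.id

        node_pool_target {
          # Assumption: not declared in Terraform; Dataproc creates this pool
          # from node_pool_config and manages its autoscaling.
          node_pool = "dpgke"
          roles     = ["DEFAULT"]

          node_pool_config {
            autoscaling {
              min_node_count = 1
              max_node_count = 6
            }

            config {
              machine_type = "n1-standard-4"
              preemptible  = true
            }
          }
        }
      }
    }
  }
}
```

The trade-off is that node pool lifecycle moves from Terraform to Dataproc. This is convenient for autoscaling, but it also means `terraform plan` will not show that pool.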