Is it possible to change the disk size of a GKE node pool with Terraform without recreating the cluster (that is, deleting it and then creating it again with the new settings)?

I want to automate the migration of the node pool with the workloads without recreating the cluster and without any downtime.

This is the output I got when I upscaled the cluster: it is killing and recreating the whole node pool, and I don't want that.

3 Answers

1

You can't decrease the disk size, but you can create a new disk with a smaller size and attach it to new nodes.

Reference:

You can only increase, and not decrease, the size of a disk. To decrease the disk size, you must create a new disk with a smaller size. Until you delete the original, larger disk, you are charged for both disks.

https://cloud.google.com/compute/docs/disks/resize-persistent-disk

Edit: If you want to increase the disk size, you can create a new node pool with the new config, then cordon and drain (kubectl cordon and kubectl drain) all the nodes of the old node pool. After that you can delete the old node pool without downtime.
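A minimal Terraform sketch of that approach, assuming the node pools are managed as separate google_container_node_pool resources and are not defined inline in the cluster resource. The cluster name, pool names, zone, node counts, machine types and disk sizes are all hypothetical placeholders:

```hcl
# Existing pool with the old settings. Keep it in the config until the
# workloads have been drained off its nodes.
resource "google_container_node_pool" "old_pool" {
  name       = "old-pool"      # hypothetical name
  cluster    = "my-cluster"    # hypothetical cluster name
  location   = "us-central1-a" # hypothetical zone
  node_count = 3

  node_config {
    machine_type = "e2-medium"
    disk_size_gb = 50
  }
}

# New pool with the larger disk and machine type, added next to the old one.
resource "google_container_node_pool" "new_pool" {
  name       = "new-pool"      # hypothetical name
  cluster    = "my-cluster"
  location   = "us-central1-a"
  node_count = 3

  node_config {
    machine_type = "e2-standard-4"
    disk_size_gb = 100
  }
}
```

Once the new pool is up and the old nodes are cordoned and drained, removing the old_pool block and applying again deletes only that pool, not the cluster.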

Enrique Tejeda
  • This is the output I got when I upscaled the cluster: it is killing and recreating the whole node pool, and I don't want that. https://postimg.cc/1g52LsD1 – clustermania Jan 10 '23 at 13:55
  • Edit: If you want to increase the disk size, you can create a new node pool with the new config, then cordon and drain (kubectl cordon and kubectl drain) all the nodes of the old node pool. After that you can delete the old node pool without downtime. – Enrique Tejeda Jan 10 '23 at 14:11
0

As Enrique said, you can increase the disk size but you can't decrease it.

resource "google_compute_disk" "test-np5-data1" {
  project = "<project_id>" # replace with your project ID
  name    = "disk"
  type    = "pd-standard"
  zone    = "us-central1-a"
  size    = 30 # size in GB; it can be raised later, but not lowered
}

So if you are looking to increase the disk size, you can definitely do that with Terraform as well.

After increasing the size of the disk, you might need to grow the filesystem on the nodes.

Ref : https://cloud.google.com/compute/docs/disks/resize-persistent-disk#resize_partitions

So you can use the local-exec or remote-exec provisioner in Terraform to SSH into the GKE nodes and grow the disk partitions.

Update:

Based on your update, you are changing the machine type as well as the disk size, so I would recommend using the lifecycle block to create the replacement first and delete the old resource afterwards.

lifecycle {
  create_before_destroy = true
}
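For context, here is a sketch of where that lifecycle block could sit, assuming a standalone google_container_node_pool resource. The cluster name, zone, variable names and defaults are hypothetical. It uses name_prefix instead of name, on the assumption that the replacement pool needs a different name while the old pool still exists:

```hcl
variable "machine_type" {
  type    = string
  default = "e2-medium" # hypothetical default
}

variable "disk_size_gb" {
  type    = number
  default = 50 # hypothetical default
}

resource "google_container_node_pool" "workers" {
  # name_prefix makes the provider generate a unique pool name, so the
  # replacement pool does not collide with the one it replaces.
  name_prefix = "workers-"
  cluster     = "my-cluster"    # hypothetical cluster name
  location    = "us-central1-a" # hypothetical zone
  node_count  = 3

  node_config {
    machine_type = var.machine_type
    disk_size_gb = var.disk_size_gb
  }

  lifecycle {
    # Create the replacement pool before destroying the old one.
    create_before_destroy = true
  }
}
```

With this layout, changing var.disk_size_gb or var.machine_type should make the plan show the node pool being replaced while the cluster itself stays untouched. Check the plan output before applying.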

Don't forget to cordon and drain the nodes while migrating. If you are running multiple replicas and they are distributed properly across nodes, it should be fine.
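One way to run the cordon and drain step from Terraform is a local-exec provisioner. This is only a rough sketch. It assumes kubectl is installed and already pointed at the cluster on the machine running Terraform, that the null provider is available, and that the old pool is called "old-pool" (a hypothetical name). It selects nodes by the cloud.google.com/gke-nodepool label that GKE puts on them:

```hcl
resource "null_resource" "drain_old_pool" {
  # Re-run the provisioner only if the pool name below changes.
  triggers = {
    pool = "old-pool" # hypothetical name of the pool being retired
  }

  # In a real config you would probably also add a depends_on pointing at
  # the new node pool resource, so draining starts only after it exists.

  provisioner "local-exec" {
    command = <<-EOT
      kubectl cordon -l cloud.google.com/gke-nodepool=old-pool
      kubectl drain -l cloud.google.com/gke-nodepool=old-pool --ignore-daemonsets --delete-emptydir-data
    EOT
  }
}
```

The drain flags shown here evict pods that use emptyDir volumes and skip DaemonSet-managed pods. Adjust them to what your workloads can tolerate.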

Harsh Manvar
  • This is the output I got when I upscaled the cluster: it is killing and recreating the whole node pool, and I don't want that. https://postimg.cc/1g52LsD1 – clustermania Jan 10 '23 at 13:55
  • You can use lifecycle { create_before_destroy = true }. I have updated the answer with details, please check. – Harsh Manvar Jan 10 '23 at 14:09
  • The lifecycle create-before-destroy approach does not work for me, because I have to change the control plane address range; otherwise there is a conflict between the two clusters. My guess is that this could be automated (by generating the IP range or something like that), but I don't think that is a scalable and safe solution. Also, as far as I understand, at the end of the process I have to manually migrate all the workloads (microservices) from the old cluster (or node pool) to the new one. – clustermania Jan 10 '23 at 14:49
  • You don't have to migrate the workloads that way. Once the new nodes are up and running, you can cordon and drain the old nodes with Terraform as well, so the process is automated. – Harsh Manvar Jan 10 '23 at 16:08
0

So, to safely change the specs of a node pool, you have to create a second node pool, move all the workloads there, and then delete the old one. You can do this the way @Enrique Tejeda described. Refer to: https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool#step_4_migrate_the_workloads The tutorial is about machine_type, but the same seems to apply to disk_size. I am currently struggling with this in Terraform: my task is to let my superiors change disk_size and machine_type by editing the values in the source code, without recreating the cluster. It seems that this is not possible.

Here is a similar question: Reduce boot disk size of a GKE cluster

YYx00xZZ