
I am using a GCP VM instance with Container-Optimized OS to run a container image, and cloud-init to initialize storage. When I initialize multiple drives (for example, several local SSDs), initialization takes some time and the Docker container starts before cloud-init has finished preparing the drives, i.e. I have a race condition. In the log below, notice that the Docker container starts before the file system has been initialized:

Apr 27 16:30:44 ch-s01r1 systemd[1]: Started docker-xxxxxx
...
2023-04-27 16:30:52,770 - subp.py[DEBUG]: Running command mkfs.ext4 -L ssd1 -m 0 ...

On Linux such problems are normally solved by adding dependencies between systemd services, but I cannot find any documentation on how to add a systemd dependency from either cloud-init or GCP Container-Optimized OS.
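For reference, the kind of pure-systemd fix I was looking for would be a drop-in that orders the container unit after cloud-init finishes. This is only a sketch: the unit name konlet-startup.service is an assumption on my part, and I could not find a supported hook for this on Container-Optimized OS.

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/konlet-startup.service.d/override.conf
# (the unit name is an assumption; COS may not expose a supported hook for this)
[Unit]
After=cloud-final.service
Wants=cloud-final.service
```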

In Terraform I have:

resource "google_compute_instance" "clickhouse-server" {
...
  metadata = {
    gce-container-declaration = module.gce-container.metadata_value
    user-data                 = data.cloudinit_config.clickhouse_config[count.index].rendered
  }

As I understand it, those two parts race: user-data triggers file system initialization, while gce-container-declaration triggers container startup.

How do I ensure that the container is not started before cloud-init is complete?

My cloud-init file (I use Terraform to expand the macros):

...
fs_setup:
  - label: log
    filesystem: ext4
    device: log
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  - label: data
    filesystem: ext4
    device: data
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  %{~ for n in range(ssd_count) ~}
  - label: ssd${n}
    filesystem: ext4
    device: ssd${n}
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  %{~ endfor ~}
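For reference, with ssd_count = 2 the Terraform loop above renders to:

```yaml
  - label: ssd0
    filesystem: ext4
    device: ssd0
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  - label: ssd1
    filesystem: ext4
    device: ssd1
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
```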
Vadym Chekan

1 Answer


Answering my own question. The solution is to give up on Terraform's gce-container-declaration generation and do everything manually in cloud-init. The hint comes from the GCP documentation, Container-Optimized OS / Running containers on instances. It describes creating a systemd service file that starts the Docker container, and starting Docker from cloud-init itself, as in:

#cloud-config

write_files:
- path: /etc/systemd/system/cloudservice.service
  content: |
    [Service]
    ExecStart=/usr/bin/docker run --rm --name=mycloudservice gcr.io/google-containers/busybox:latest /bin/sleep 3600

With both file system creation and the Docker start driven by cloud-init itself, there is no longer a race between the scripts initiated by gce-container-declaration and user-data.

The final solution (with comments inline) looks like this (my-server.init.yml):

# Executed on every boot: mount file systems and start the container
runcmd:
  - mount /dev/disk/by-id/google-log /mnt/disks/log
  - mount -o discard,defaults,nobarrier /dev/md0 /mnt/disks/ssd
  - systemctl daemon-reload
  - systemctl start my.service
write_files:
  # Executed only the first time the instance is created: create a RAID array of SSD drives and format the file systems
  - path: /var/lib/cloud/scripts/per-instance/fs-prepare.sh
    permissions: 0544
    content: |
      #!/bin/bash
      
      mkfs.ext4 -L log -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/disk/by-id/google-log
      mkdir -p /mnt/disks/log
      
      mdadm --create /dev/md0 --level=0 --raid-devices=${ssd_count} %{ for n in range(ssd_count) } /dev/disk/by-id/google-local-nvme-ssd-${n} %{ endfor }
      mkfs.ext4 -L ssd -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/md0
      mkdir -p /mnt/disks/ssd
  # systemd service unit which will start (and restart) the container
  - path: /etc/systemd/system/my.service
    content: |
      [Unit]
      Description=Start my docker container
      
      [Service]
      ExecStart=/usr/bin/docker run --rm --name=my-server -p 9000:9000 -p 8123:8123 -p 9009:9009 ${mounts} ${my_server_image}  
      ExecStop=/usr/bin/docker stop my-server
      ExecStopPost=/usr/bin/docker rm my-server
      Restart=on-failure
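To make the templating concrete, here is a plain-shell sketch of what the %{ for } loop in fs-prepare.sh expands to (ssd_count = 3 is an arbitrary example; the device paths come from the script above):

```shell
#!/bin/sh
# Reproduce the device list that Terraform's %{ for n in range(ssd_count) }
# loop interpolates into the mdadm command
ssd_count=3
devices=""
n=0
while [ "$n" -lt "$ssd_count" ]; do
  devices="$devices /dev/disk/by-id/google-local-nvme-ssd-$n"
  n=$((n + 1))
done
# Print the fully expanded command (do not run it here)
echo "mdadm --create /dev/md0 --level=0 --raid-devices=$ssd_count$devices"
```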

In Terraform's main.tf the following changes are required (note the commented-out gce-container-declaration):

locals {
  mounts = [
    "-v /var/lib/my-server/config.d:/etc/my-server/config.d:ro",
    "-v /var/lib/my-server/users.d:/etc/my-server/users.d:ro",
    "-v /mnt/disks/ssd:/var/lib/my-data",
    "-v /mnt/disks/log:/var/log/my-server"
  ]
}
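The join(" ", local.mounts) call used later in the cloud-init template simply concatenates this list into the single ${mounts} string of docker -v flags; a quick shell equivalent:

```shell
#!/bin/sh
# Shell equivalent of Terraform's join(" ", local.mounts):
# build the single ${mounts} string substituted into ExecStart
mounts="-v /var/lib/my-server/config.d:/etc/my-server/config.d:ro"
mounts="$mounts -v /var/lib/my-server/users.d:/etc/my-server/users.d:ro"
mounts="$mounts -v /mnt/disks/ssd:/var/lib/my-data"
mounts="$mounts -v /mnt/disks/log:/var/log/my-server"
echo "$mounts"
```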

resource "google_compute_instance" "my-server" {
...
  attached_disk {
    source      = google_compute_disk.my-log-disk[count.index].self_link
    device_name = "log"
  }

  dynamic "scratch_disk" {
    for_each = range(var.ssd_count)
    content {
      interface = "NVME"
    }
  }

  metadata = {
#    gce-container-declaration = module.gce-container.metadata_value
    user-data                 = data.cloudinit_config.my_config[count.index].rendered
  }

}

data "cloudinit_config" "my_config" {
...
  part {
    content_type = "text/cloud-config"
    content = templatefile("${path.module}/my-server.init.yml", {
      my_server_image = var.my_server_image
      ssd_count = var.ssd_count
      mounts = join(" ", local.mounts)
    })
    filename = "my-server.init.yml"
  }
}

# Just to retrieve Container Optimized OS image name,
# DO NOT use to render `google_compute_instance.metadata.gce-container-declaration`
# because it will cause a race between container startup and cloud-init file system
# initialization
module "gce-container" {
  source  = "terraform-google-modules/container-vm/google"
  version = "3.1.0"
}
