
Need some help...

I can't access my VM. I've tried gcloud, browser SSH, and plain ssh.

I have edited the metadata, and enable-oslogin is set to FALSE. I went through all the questions here but couldn't find a solution for the problem I'm having.

At first I thought it was a disk space problem, so I stopped the instance and resized my disk, but even after that I can't SSH into it.

I have tried all suggestions from this post: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]

but nothing worked...

Port 22 is open and the firewall has the default-allow-ssh rule.

I can't lose the data on this instance, so please, any help is more than welcome!

EDIT: Instance is running Ubuntu 20.04

Logs

Jan  5 01:48:21 my-instance GCEGuestAgent[744]: 2022-01-05T01:48:21.1503Z GCEGuestAgent Info: Adding existing user user_name to google-sudoers group.
Jan  5 01:48:21 my-instance GCEGuestAgent[744]: 2022-01-05T01:48:21.1556Z GCEGuestAgent Error non_windows_accounts.go:152: gpasswd: /etc/group.956: No space left on device#012gpasswd: cannot lock /etc/group; try again later.#012.
Jan  5 01:48:21 my-instance GCEGuestAgent[744]: 2022-01-05T01:48:21.1557Z GCEGuestAgent Info: Updating keys for user user_name.
Jan  5 01:48:22 my-instance otelopscol[703]: 2022-01-05T01:48:22.415Z#011info#011exporterhelper/queued_retry.go:215#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup monitoring.googleapis.com on [::1]:53: read udp [::1]:55573->[::1]:53: read: connection refused\"", "interval": "21.556591578s"}
Jan  5 01:48:32 my-instance otelopscol[703]: 2022-01-05T01:48:32.623Z#011info#011exporterhelper/queued_retry.go:215#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup monitoring.googleapis.com on [::1]:53: read udp [::1]:39674->[::1]:53: read: connection refused\"; rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup monitoring.googleapis.com on [::1]:53: read udp [::1]:39674->[::1]:53: read: connection refused\"; rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup monitoring.googleapis.com on [::1]:53: read udp [::1]:39674->[::1]:53: read: connection refused\"]", "interval": "14.527205159s"}
Jan  5 01:48:43 my-instance otelopscol[703]: 2022-01-05T01:48:43.974Z#011info#011exporterhelper/queued_retry.go:215#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup monitoring.googleapis.com on [::1]:53: read udp [::1]:39674->[::1]:53: read: connection refused\"", "interval": "23.912476906s"}
Jan  5 01:48:46 my-instance GCEGuestAgent[744]: 2022-01-05T01:48:46.6336Z GCEGuestAgent Info: Adding existing user user_name to google-sudoers group.
Jan  5 01:48:46 my-instance GCEGuestAgent[744]: 2022-01-05T01:48:46.6352Z GCEGuestAgent Error non_windows_accounts.go:152: gpasswd: /etc/group.972: No space left on device#012gpasswd: cannot lock /etc/group; try again later.#012.
Jan  5 01:48:46 my-instance GCEGuestAgent[744]: 2022-01-05T01:48:46.6353Z GCEGuestAgent Info: Updating keys for user user_name.
Jan  5 01:48:46 my-instance dbus-daemon[561]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' requested by ':1.1' (uid=0 pid=689 comm="/usr/bin/python3 /usr/share/unattended-upgrades/un" label="unconfined")
Jan  5 01:48:46 my-instance systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed to run 'start' task: No space left on device
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed with result 'resources'.
Jan  5 01:48:46 my-instance systemd[1]: Failed to start Login Service.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
Jan  5 01:48:46 my-instance systemd[1]: Stopped Login Service.
Jan  5 01:48:46 my-instance systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed to run 'start' task: No space left on device
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed with result 'resources'.
Jan  5 01:48:46 my-instance systemd[1]: Failed to start Login Service.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 2.
Jan  5 01:48:46 my-instance systemd[1]: Stopped Login Service.
Jan  5 01:48:46 my-instance systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed to run 'start' task: No space left on device
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed with result 'resources'.
Jan  5 01:48:46 my-instance systemd[1]: Failed to start Login Service.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 3.
Jan  5 01:48:46 my-instance systemd[1]: Stopped Login Service.
Jan  5 01:48:46 my-instance systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed to run 'start' task: No space left on device
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed with result 'resources'.
Jan  5 01:48:46 my-instance systemd[1]: Failed to start Login Service.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 4.
Jan  5 01:48:46 my-instance systemd[1]: Stopped Login Service.
Jan  5 01:48:46 my-instance systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed to run 'start' task: No space left on device
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed with result 'resources'.
Jan  5 01:48:46 my-instance systemd[1]: Failed to start Login Service.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 5.
Jan  5 01:48:46 my-instance systemd[1]: Stopped Login Service.
Jan  5 01:48:46 my-instance systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Start request repeated too quickly.
Jan  5 01:48:46 my-instance systemd[1]: systemd-logind.service: Failed with result 'resources'.
Jan  5 01:48:46 my-instance systemd[1]: Failed to start Login Service.

Thanks in advance

  • Your instance is failing to start up because you are out of disk space. – John Hanley Jan 05 '22 at 03:10
  • Hi @JohnHanley, thank you for your comment. The problem here is that I resized the disk but I still can't access the VM. Have you ever run into this problem? Thanks in advance! – Caio César P. Ricciuti Jan 05 '22 at 04:24
  • Yes, I have. There are three common possibilities: a) the OS did not resize the disk partition and the file system (Ubuntu images have this support); b) you have a file system error that requires running **fsck** manually; c) you have a disk format that does not support resizing. Your question does not have the details to know which one. I would attach the disk to another Ubuntu VM to debug the problem. – John Hanley Jan 05 '22 at 04:28
  • If there is no free space, the system did not initialize completely. This means you cannot install packages (no space) and probably the network is not running either. – John Hanley Jan 05 '22 at 04:30

1 Answer

If you already resized the disk in the console and the same error persists, the change was probably not applied at the OS level. Some operating systems automatically resize the partition on reboot, eliminating the need for utilities like fdisk, resize2fs, or xfs_growfs. When that does not happen, you have to grow the partition and the file system yourself. You can try the following commands, and add them to your startup script, to fix this:

For Debian/Ubuntu:

$ sudo apt install -y cloud-utils 
$ sudo apt install -y cloud-guest-utils 
$ sudo growpart /dev/sda 1 
$ sudo resize2fs /dev/sda1

For RedHat/Fedora/CentOS:

$ sudo dnf install -y cloud-utils-growpart
$ sudo growpart /dev/sda 1
$ sudo xfs_growfs -d / 
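
To confirm that the resize actually reached the file system, it may help to check how full the mount point is before and after running the commands above. The following is a minimal sketch, not part of the original answer: `used_percent` is a hypothetical helper name, and it only reads the POSIX-format output of `df`.

```shell
#!/usr/bin/env bash
# Print the percentage of blocks in use on the file system that holds
# the given path (defaults to /). "used_percent" is a hypothetical name.
used_percent() {
  # df -P prints a header line, then one line per file system.
  # Column 5 is the capacity, for example "87%"; strip the percent sign.
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used_percent /
```

If the value is still at or near 100 after growpart and resize2fs (or xfs_growfs) have run, the file system was most likely not grown and the partition layout is worth inspecting with `lsblk`.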

If you only need the data inside the instance, you can also attach the disk to a new instance, mount it there, and copy the data off.
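
Moving the disk to another instance could look roughly like the sketch below. It is not from the original answer: the instance, disk, and zone names are hypothetical placeholders, and `run`, `move_disk_to_rescue`, and `return_disk_to_origin` are helper names made up for illustration. By default it only prints the gcloud commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: move a full boot disk to a second ("rescue") VM and back again.
# Every name below is a hypothetical placeholder; replace with your own.
set -euo pipefail

BROKEN_VM="${BROKEN_VM:-my-instance}"   # VM that no longer accepts SSH
RESCUE_VM="${RESCUE_VM:-rescue-vm}"     # healthy VM in the same zone
DISK="${DISK:-my-instance}"             # name of the full boot disk
ZONE="${ZONE:-us-central1-a}"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop the broken VM, detach its boot disk, attach it to the rescue VM.
move_disk_to_rescue() {
  run gcloud compute instances stop "$BROKEN_VM" --zone="$ZONE"
  run gcloud compute instances detach-disk "$BROKEN_VM" --disk="$DISK" --zone="$ZONE"
  run gcloud compute instances attach-disk "$RESCUE_VM" --disk="$DISK" --zone="$ZONE"
}

# After the repair: give the disk back as the boot disk and start the VM.
return_disk_to_origin() {
  run gcloud compute instances detach-disk "$RESCUE_VM" --disk="$DISK" --zone="$ZONE"
  run gcloud compute instances attach-disk "$BROKEN_VM" --disk="$DISK" --boot --zone="$ZONE"
  run gcloud compute instances start "$BROKEN_VM" --zone="$ZONE"
}

case "${1:-plan}" in
  to-rescue) move_disk_to_rescue ;;
  to-origin) return_disk_to_origin ;;
  *)         DRY_RUN=1; move_disk_to_rescue; return_disk_to_origin ;;
esac
```

Between the two steps, SSH into the rescue VM, find the attached disk with `lsblk` (it will often appear as /dev/sdb, but that is an assumption), and run the growpart and resize2fs commands from above against that device instead of /dev/sda. If resize2fs asks for a file system check first, run `sudo e2fsck -f` on the partition and retry.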

Alex G
  • Thank you very much for your input. I believe the solution you present would work, but I just saw the following in the logs: `GCEMetadataScripts: startup-script mkdir /tmp/metadata-scripts266425768: no space left on device`. It means the script is not running due to lack of disk space. Any ideas? – Caio César P. Ricciuti Jan 05 '22 at 04:30
  • Your current script needs to create a file in order to be executed. You have to put the script directly into the Value. Set 'Key' to 'startup-script' and set 'Value' to (sample) `#! /bin/bash sudo growpart /dev/sda 1 sudo resize2fs /dev/sda1` in your instance configuration. – Alex G Jan 05 '22 at 07:12
  • Yes, it's done but the log says there is no space left on device to create the file for the script... I'm not sure how to proceed... thank you for your input and help – Caio César P. Ricciuti Jan 05 '22 at 12:28
  • Following your instructions and @John Hanley's, it works. I attached the `"broken"` disk to a new instance and ran the commands you suggested. Thank you very much, both of you! – Caio César P. Ricciuti Jan 05 '22 at 18:19