1

I have set up and have been using a Google Cloud Platform virtual machine. The browser ssh (a browser tab imitating a console) used to work fine. Yesterday, the sites and API hosted on the machine became unreachable.

The GCP console dashboard shows the machine as up and running. However, SSH, which used to work, no longer does. On click, it opens a new window, displays messages about transferring SSH keys, and then ultimately shows: "An error occurred while communicating with the SSH server. Check the server and the network configuration."

After numerous such attempts, I restarted the VM instance from the GCP console, and everything started working again. But within a few hours of the restart, the VM became unreachable again. Also, note that (1) ping gets a reply, (2) the console shows the VM running normally, (3) all web pages and APIs hosted on the machine return 408 (Request Timeout), (4) a restart cures the problem for a short while, and (5) SSH through gcloud is also not working; it waits indefinitely for a connection to the server.

Since Google support is only available on a paid basis, I am stuck. Any help would be deeply appreciated.

Deniss T.
Abhishek Prabhat
  • Can you share the [serial port output](https://cloud.google.com/compute/docs/instances/viewing-serial-port-output#viewing_serial_port_output)? There may be an error message on your instance visible there. – rsalinas Dec 04 '19 at 12:28
  • Can you paste your firewall rules configuration? – guillaume blaquiere Dec 04 '19 at 13:24
  • First, stop using the Browser based SSH. This is a great feature, but invest in setting up good quality SSH tools (most are free). For Windows I use OpenSSH and Bitvise. Putty is famous. Plain old `ssh` works just fine on macOS and Linux. For example, with Bitvise I get a good file transfer window, I can easily open multiple terminal sessions, etc. – John Hanley Dec 04 '19 at 15:26
  • There are several things that can prevent you from connecting via ssh. Try using the following command in Cloud Shell (just remember to put your vm details): `gcloud compute ssh example-instance --zone us-central1-a --verbosity=debug`. You could also look at the serial port 1 logs and look for things like: "no space left on device", "fail", "error", etc. – grimmjow_sms Dec 04 '19 at 17:37
  • @all Sorry for the late response. I will have to figure out how to receive email notification from StackOverflow. The serial port output is `Dec 5 05:26:37 staging-1 dockerd[5089]: time="2019-12-05T05:26:37.720999095Z" level=error msg="Failed to log msg \"\" for logger json-file: write /var/lib/docker/containers/cc2...36a3-json.log: no space left on device"` . Apparently, to rule out the space shortage, I did try `df` and it showed ample free space. However, as I am checking now, it shows no available space. Thanks – Abhishek Prabhat Dec 05 '19 at 05:54
  • @JohnHanley I am indeed more comfortable running an ssh session through my OS terminal. However, if I add my pub key to `.ssh/authorized_keys`, the ssh stops working after a while. Apparently, GCP auto resets the keys. If, however, I add the keys through the GCP console, the ssh through my terminal doesn't work. Also, with terminal login I faced the problem of not being able to become super user. Would it be necessary to create a privileged user through browser ssh, and then use that user for terminal ssh? At any rate, the problem of auto key deletion by GCP still remains for me. Thanks. – Abhishek Prabhat Dec 05 '19 at 06:06
  • 1
    @grimmjow_sms Thanks about the `verbosity` flag. I have cleaned up some space in the system already. But it might come handy the next time. – Abhishek Prabhat Dec 05 '19 at 06:11
  • Disk space is cheap, resize the disk drive in the Google Cloud Console. The file system will automatically resize. Running out of disk space can trash your OS. – John Hanley Dec 05 '19 at 06:12
  • I have many keys in `authorized_keys` and I have never seen your problem. I build dozens of instances constantly. And I use the Console SSH, Bitvise, ssh and Putty interchangeably. Something else is going on. Note: Running out of disk space will break networking which will break SSH but that is not related to your key problem. – John Hanley Dec 05 '19 at 06:17

2 Answers

1

Another reason you may be unable to SSH into the machine is that you are connected to a VPN locally. This can prevent SSH connections to a GCP VM instance.

Disable the local VPN connection and try again.

Riyafa Abdul Hameed
0

As your issue seems to be related to the OS, you may want to try connecting through the serial port as described here. However, I would say that a faster and more reliable way to solve this would be to simply increase the disk size of your instance, as what you have assigned right now may not be enough for the operations you are running.

Additionally, you may find more help with this over at this answer which has a very complete rundown on what you may do in these cases.
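Since the free space apparently looked fine at one check and was gone at the next, it may help to check it periodically rather than once. Below is a minimal sketch using Python's standard `shutil.disk_usage`; the function name `disk_report` and the 10% threshold are assumptions for illustration, not anything GCP prescribes.

```python
import shutil


def disk_report(path="/", min_free_fraction=0.10):
    """Return (free_fraction, is_low) for the filesystem containing `path`.

    `min_free_fraction` is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction


if __name__ == "__main__":
    free, low = disk_report("/")
    print(f"free: {free:.1%}" + ("  (LOW)" if low else ""))
```

Run from cron or a similar scheduler, something like this could give an early warning before the disk fills up and SSH stops responding.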

rsalinas
  • The disk size is 80GB, with Ubuntu 18.04, running a few (10+) docker containers and a MongoDB instance, which I thought should not suffer from low disk space. It seems the sudden eating up of space may be due to excessive error logging (a failed Kafka read in a loop) from one of the micro-services. At the next ssh failure I shall try the routes suggested. Thanks. – Abhishek Prabhat Dec 06 '19 at 12:14
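To confirm which files are eating the space, one option is to list the largest files under a directory. The following is a rough sketch in plain Python; `largest_files` is a hypothetical helper name, and pointing it at `/var/lib/docker/containers` is an assumption based on the path in the serial port output quoted above (reading that directory typically needs root).

```python
import os


def largest_files(root, top_n=5):
    """Return up to `top_n` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # Assumed location of the container JSON logs, taken from the error message.
    for size, path in largest_files("/var/lib/docker/containers"):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

If one `*-json.log` file dominates the output, that would point to the looping service as the culprit rather than the disk being too small.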