One of our VM instances in Google Cloud was stopped, and we can't figure out why. The Stackdriver logs list 4 similar compute.instances.stop entries. This is one of them:

jsonPayload: {
  actor: {
   user:  "cloud-cluster-manager@prod.google.com"    
  }
  event_subtype:  "compute.instances.stop"   
  event_timestamp_us:  "1549644158637334"   
  event_type:  "GCE_API_CALL"   
  ip_address:  "",
  ...
}

I found the meaning of event_subtype and event_type in the docs, but I'm having trouble understanding what actually happened. And who is that actor? I can't find it among our IAM users.

Any ideas?

SebSob
  • The actor is `cloud-cluster-manager`, most probably an internal management application that runs in the background, but I presume the stop is related to your application's code. Maybe some error made the manager kill it, or maybe a lack of resources – Nikos M. Feb 11 '19 at 13:37
  • @NikosM. How can I find out if it was due to a lack of resources? The incident (log trace) was on 2019-02-08; the previous log entry was from 2019-01-17 and had nothing to do with why the VM stopped. I can't seem to find any error logs... – SebSob Feb 11 '19 at 14:00
  • I would suggest asking Google support about this; they should have an idea of what killed your VM – Nikos M. Feb 11 '19 at 14:02
  • Thanks, but it seems I can't get technical support by phone because my support package is 'Bronze'; you need Gold or Platinum. And I can't find any support by email or chat. – SebSob Feb 11 '19 at 14:28
  • You can file a defect report, and someone from the GCP team will look at it. You can file the defect report [here](https://issuetracker.google.com/issues/new?component=491456&template=1161077). – Rahi Feb 25 '19 at 03:54
  • Thanks, I just created a defect report. Let's hope I get some clarification. – SebSob Feb 25 '19 at 12:49

1 Answer

I finally found out what happened, and I think it is useful to share.

After an internal investigation, Google confirmed that cloud-cluster-manager@prod.google.com is a GCP-managed service account that acts on instances in response to billing issues.

When I contacted the Cloud Platform Billing team, they told me that cloud-cluster-manager@prod.google.com can stop an instance if the billing account is not in good standing. An account may not be in good standing for any of the following reasons:

  1. There is an unpaid pending balance.
  2. There is ongoing suspicious activity that appears to violate the Terms of Service.
  3. The account was reported as fraudulent.

Finally, the Google Accounting Team confirmed that they had made a manual error, which caused the Billing Account to be closed incorrectly. After 10 minutes, the Google engineer discovered the mistake and restored the account, which allowed us to restart the VM.

This was a one-time, exceptional intervention by Google, and there are steps and processes in place to prevent it from happening again.
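For anyone triaging a similar entry, here is a minimal Python sketch of how the payload from the question could be read. It only assumes the fields shown in that log excerpt; the function name `summarize_stop_event` and the `KNOWN_GOOGLE_MANAGED_ACTORS` set are hypothetical names of mine, not part of any Google API, and the set contains only the one account Google confirmed in my case.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the account confirmed by Google in this answer is listed.
# Extend this set yourself if support confirms other managed accounts.
KNOWN_GOOGLE_MANAGED_ACTORS = {"cloud-cluster-manager@prod.google.com"}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_stop_event(payload):
    """Summarize a compute.instances.stop jsonPayload; return None for other events."""
    if payload.get("event_subtype") != "compute.instances.stop":
        return None
    actor = payload.get("actor", {}).get("user", "")
    # event_timestamp_us is a string holding microseconds since the Unix epoch.
    # Adding a timedelta avoids float rounding on the microsecond part.
    stopped_at = EPOCH + timedelta(microseconds=int(payload["event_timestamp_us"]))
    return {
        "actor": actor,
        "stopped_at": stopped_at,
        "google_managed_actor": actor in KNOWN_GOOGLE_MANAGED_ACTORS,
    }


# The fields below are copied from the log entry in the question.
sample = {
    "actor": {"user": "cloud-cluster-manager@prod.google.com"},
    "event_subtype": "compute.instances.stop",
    "event_timestamp_us": "1549644158637334",
    "event_type": "GCE_API_CALL",
    "ip_address": "",
}

if __name__ == "__main__":
    summary = summarize_stop_event(sample)
    print(summary["actor"], summary["stopped_at"].isoformat())
    if summary["google_managed_actor"]:
        print("Stopped by a Google-managed account: check the billing account status.")
```

The timestamp decodes to 2019-02-08 16:42:38 UTC, which matches the incident date. If the actor turns out to be a Google-managed account rather than one of your IAM users, the billing account status is the first thing I would check.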

SebSob