When I run my Python command to train a model on my TPU VM, it fails while writing files to Cloud Storage:

Traceback (most recent call last):
File "device_train.py", line 302, in <module>
  save(network, step, bucket, model_dir,
File "device_train.py", line 62, in save
  with open(f"gs://{bucket}/{path}/meta.json", "w") as f:
File "/usr/local/lib/python3.8/dist-packages/smart_open/smart_open_lib.py", line 235, in open
  binary = _open_binary_stream(uri, binary_mode, transport_params)
File "/usr/local/lib/python3.8/dist-packages/smart_open/smart_open_lib.py", line 398, in _open_binary_stream
  fobj = submodule.open_uri(uri, mode, transport_params)
File "/usr/local/lib/python3.8/dist-packages/smart_open/gcs.py", line 105, in open_uri
  return open(parsed_uri['bucket_id'], parsed_uri['blob_id'], mode, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/smart_open/gcs.py", line 146, in open
  fileobj = Writer(
File "/usr/local/lib/python3.8/dist-packages/smart_open/gcs.py", line 427, in __init__
  self._resumable_upload_url = self._blob.create_resumable_upload_session()
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2728, in create_resumable_upload_session
  _raise_from_invalid_response(exc)
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 3936, in _raise_from_invalid_response
  raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.Forbidden: 403 POST https://storage.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=resumable: {
"error": {
  "code": 403,
  "message": "Access denied.",
  "errors": [
    {
      "message": "Access denied.",
      "domain": "global",
      "reason": "forbidden"
    }
  ]
}
}
: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.CREATED: 201>)


I have two service accounts: one like project-id-compute@developer.gserviceaccount.com and the other like service-project-id@cloud-tpu.iam.gserviceaccount.com.
I granted the Storage Admin role to both of them, but the error persists. Any help would be appreciated!
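
One thing that may be worth checking (an assumption on my part, not something confirmed in this thread) is which service account the VM actually runs as and which OAuth access scopes it was created with. On Google Cloud VMs a request generally has to be allowed by both the IAM role and the access scopes, so a role grant alone may not be enough. Below is a minimal sketch that reads both values from the VM metadata server using only the standard library. The helper names (`fetch_metadata`, `parse_scopes`, `can_write_to_gcs`, `report`) are hypothetical, and `report()` only works when run on the VM itself.

```python
import urllib.request

# Metadata server path for the VM's default service account.
METADATA_BASE = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/"
)

# OAuth scopes that permit writing objects to Cloud Storage.
WRITE_SCOPES = {
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
}


def fetch_metadata(key, timeout=5.0):
    """Read one value from the metadata server (reachable only from the VM)."""
    request = urllib.request.Request(
        METADATA_BASE + key, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def parse_scopes(text):
    """Turn the newline-separated scope listing into a set of scope URLs."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def can_write_to_gcs(scopes):
    """Return True if at least one scope allows Cloud Storage writes."""
    return bool(WRITE_SCOPES & set(scopes))


def report():
    """Print the account the VM runs as and whether its scopes allow writes."""
    email = fetch_metadata("email")
    scopes = parse_scopes(fetch_metadata("scopes"))
    print("VM runs as:", email)
    for scope in sorted(scopes):
        print("  scope:", scope)
    print("Scopes allow GCS writes:", can_write_to_gcs(scopes))
```

Calling `report()` on the TPU VM shows whether the account listed there is one of the two that received Storage Admin, and whether its scopes are read-only for storage. If the scopes turn out to be the limiting factor, that would explain why an explicit service account key works while the VM's default credentials do not.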

csliu_jia
  • Is your bucket in the same zone as the VM running the training, or is it multi-region? – aman2930 Mar 29 '22 at 18:23
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 01 '22 at 02:20
  • Thanks. The bucket and the VM are in the same zone. – csliu_jia Apr 06 '22 at 06:52
  • Recently, I resolved the problem by adding the service account `key` to my code. But I still do not know why my VM cannot connect to the bucket directly, as I have set the permission for it. – csliu_jia Apr 06 '22 at 06:56

0 Answers