I am running a TensorFlow application on the Google ML Engine with hyper-parameter tuning and I've been running into some strange authentication issues.
My Data and Permissions Setup
My trainer code supports two ways of obtaining input data for my model:
- Getting a table from BigQuery.
- Reading from a .csv file.
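A minimal sketch of what such a two-source setup can look like. This is not the actual trainer code: the flag names (`--input-source`, `--csv-path`, `--bq-table`) and the function names are hypothetical, and the .csv branch assumes a local file path rather than a Cloud Storage URI.

```python
import argparse
import csv


def parse_args(argv=None):
    """Parse hypothetical trainer flags that select the input source."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-source", choices=["bigquery", "csv"], default="csv")
    parser.add_argument("--csv-path", help="Path to a local .csv file")
    parser.add_argument("--bq-table", help="Table ID in the form PROJECT.DATASET.TABLE")
    return parser.parse_args(argv)


def load_rows(args):
    """Return the input data as a list of dicts, one per row."""
    if args.input_source == "csv":
        with open(args.csv_path, newline="") as f:
            return list(csv.DictReader(f))

    # Imported lazily so the .csv path works without the BigQuery client installed.
    from google.cloud import bigquery

    # With no explicit credentials, the client uses whatever default
    # credentials are available in the environment it runs in.
    client = bigquery.Client()
    table = client.get_table(args.bq_table)
    return [dict(row.items()) for row in client.list_rows(table)]
```

Only the BigQuery branch needs credentials that are accepted by the BigQuery API, so the two branches can behave differently under the same identity.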
For my IAM permissions, I have two members set up:
My user account:
- Assigned to the following IAM roles:
  - Project Owner (roles/owner)
  - BigQuery Admin (roles/bigquery.admin)
- Credentials were created automatically when I used gcloud auth application-default login.
A service account:
- Assigned to the following IAM roles:
  - BigQuery Admin (roles/bigquery.admin)
  - Storage Admin (roles/storage.admin)
  - PubSub Admin (roles/pubsub.admin)
- Credentials were downloaded to a .json file when I created it in the Google Cloud Platform interface.
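To tell the two members apart on a local machine, a small sketch like the following can report which credentials file would be picked up by default. It makes several assumptions: the helper name `describe_adc_file` is hypothetical, the service account key is referenced through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, and the user credentials live at the usual Linux/macOS location `~/.config/gcloud/application_default_credentials.json`. It only inspects local files and says nothing about which identity a remote job runs as.

```python
import json
import os


def describe_adc_file(environ=None, home=None):
    """Summarize the local credentials file that would be used by default.

    Returns a dict with the file path, the credential type, and the
    service account email (if any), or None when no file is found.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    # An explicit key file takes precedence over the gcloud-managed file.
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        # Assumed Linux/macOS location written by
        # `gcloud auth application-default login`.
        path = os.path.join(
            home, ".config", "gcloud", "application_default_credentials.json"
        )
    if not os.path.isfile(path):
        return None

    with open(path) as f:
        info = json.load(f)

    return {
        "path": path,
        # "service_account" for a downloaded key, "authorized_user" for
        # credentials created by the gcloud login flow.
        "type": info.get("type"),
        # Present only in service account key files.
        "client_email": info.get("client_email"),
    }
```

Calling `describe_adc_file()` before launching a local run shows whether the user account or the service account key is in effect.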
The Problem
When I run my trainer code on the Google ML Engine using my user account credentials and reading from a .csv file, everything works fine.
However, if I try to get my data from BigQuery, I get the following error:
Forbidden: 403 Insufficient Permission (GET https://www.googleapis.com/bigquery/v2/projects/MY-PROJECT-ID/datasets/MY-DATASET-ID/tables/MY-TABLE-NAME)
This is why I created a service account, but the service account has its own set of issues. When using the service account, I am able to read from both a .csv file and from BigQuery, but in both cases I get the following error at the end of each trial:
Unable to log objective metric due to exception <HttpError 403 when requesting https://pubsub.googleapis.com/v1/projects/MY-PROJECT-ID/topics/ml_MY-JOB-ID:publish?alt=json returned "User not authorized to perform this action.">.
This doesn't cause the job to fail, but it prevents the objective metric from being recorded, so the hyper-parameter tuning does not provide any helpful output.
The Question
I'm not sure why I'm getting these permission errors when, as far as I can tell, my IAM members are assigned the correct roles.
My trainer code works in every case when I run it locally (although PubSub is obviously not being used when running locally), so I'm fairly certain it's not a bug in the code.
Any suggestions?
Notes
There was one point at which my service account was getting the same error as my user account when trying to access BigQuery. The solution I stumbled upon is a strange one. I decided to remove all roles from my service account and add them again, and this fixed the BigQuery permission issue for that member.