I have trained a Natural Language AutoML Text Classification model on Google Cloud Platform. The data I used to train the model can be exported to a CSV file in a bucket. The file has 3 columns (1. TRAIN/VALIDATION/TEST, 2. the gs:// reference to the actual content, 3. the prediction result/label), as in the example below.

TRAIN gs://[bucket_name]/[folder]/uploads/content/RrpGCDwgse0.txt Website

My question is how to get the actual content behind each gs:// reference, so I can go through the rows and check whether the predicted labels are correct. The output should be a CSV file containing the text content itself, not the gs:// reference.

Daisy Yu

1 Answer

Someone would have created this bucket before using AutoML, to store the documents used to train your model. To access the objects in the bucket, someone with the correct permissions needs to grant you access to the bucket or to individual objects within it. The gs:// reference is the URI of the object itself.

There are several options for granting that access:

  1. Cloud Identity and Access Management (Cloud IAM) permissions, to grant access to buckets and bulk access to the objects in a bucket.
  2. Access Control Lists (ACLs), to grant users read or write access to individual buckets or objects.
  3. Signed URLs (query string authentication), to give time-limited read or write access to an object through a URL you generate.
Corinne White
  • I am the owner and have all the access. My goal is to replace the gs:// reference with the string content in my CSV output file. – Daisy Yu Jun 05 '19 at 18:27
  • You could set up a Cloud Function that gets [triggered when a new file](https://cloud.google.com/functions/docs/calling/storage) is created and have it manipulate the CSV file using one of the Storage [client libraries](https://cloud.google.com/storage/docs/reference/libraries), replacing the gs:// reference with the file content and then writing it to Storage again – Corinne White Jun 06 '19 at 08:51
  • Thanks Corinne but I was not able to find the library that can replace the gs:// reference with string content... – Daisy Yu Jun 06 '19 at 15:31
  • The Cloud Function would involve some coding, and the method would depend on whichever programming language you're familiar with. Can I just confirm something, since I'm not that familiar with AutoML: doesn't the UI provide the functionality you need? It's my understanding that you can see the labels applied to the test data in the console itself. – Corinne White Jun 07 '19 at 09:53
  • No, the UI does not provide the functionality I need. The solution I finally settled on was to convert a JSON file into table format using Python, which was quite some work, but it worked. Thanks Corinne. – Daisy Yu Jun 17 '19 at 19:25
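
For readers who want the approach suggested in the comments (reading each object with a Storage client library and writing its text into the CSV), here is a minimal Python sketch. It is not the JSON-based solution the asker ended up with. It assumes the exported file is comma-separated with the three columns described in the question, that the objects are UTF-8 text, and that the `google-cloud-storage` package and credentials are available when the default fetcher is used. The names `parse_gs_uri`, `fetch_text_from_gcs`, `inline_content`, and `rewrite_csv` are hypothetical helpers made up for this example.

```python
import csv
from typing import Callable, Iterable, Iterator, List, Tuple


def parse_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, name = uri[len("gs://"):].partition("/")
    if not bucket or not name:
        raise ValueError(f"missing bucket or object name: {uri!r}")
    return bucket, name


def fetch_text_from_gcs(uri: str, client=None) -> str:
    """Download one object as text (needs google-cloud-storage and credentials)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from google.cloud import storage

    bucket_name, object_name = parse_gs_uri(uri)
    client = client or storage.Client()
    return client.bucket(bucket_name).blob(object_name).download_as_text()


def inline_content(
    rows: Iterable[List[str]],
    fetch_text: Callable[[str], str] = fetch_text_from_gcs,
) -> Iterator[List[str]]:
    """Yield each row with its gs:// reference replaced by the object's text."""
    for row in rows:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        # Assumed layout: split, gs:// reference, then one or more labels.
        yield [row[0], fetch_text(row[1]), *row[2:]]


def rewrite_csv(src_path: str, dst_path: str,
                fetch_text: Callable[[str], str] = fetch_text_from_gcs) -> None:
    """Read the exported CSV and write a copy that contains the text content."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(inline_content(csv.reader(src), fetch_text))
```

With the exported file downloaded locally, a call such as `rewrite_csv("export.csv", "export_with_text.csv")` would produce the new file (both file names are placeholders). The `csv` module quotes commas and newlines inside the content, so each document stays in a single cell. The fetcher is passed in as a parameter so the row logic can be tried with a stand-in function before touching the bucket.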