
I'm trying to read a CSV file from Google Cloud Storage into Google Cloud Datalab, exactly as suggested here.

I keep getting the error: Source object gs://analog-arbor-233411/traissn.csv does not exist. (analog-arbor-233411 is my bucket name, traissn.csv is my CSV file.)

So here I checked that the bucket really exists, and it does.

import google.datalab.storage as storage
mybucket = storage.Bucket('analog-arbor-233411')
mybucket.exists()

Here I iterate through mybucket.objects(), which returns an iterator over the objects in the bucket, to make sure I end up with an object that exists. After the loop, data_csv holds only the last object from the iteration. Then I check whether it exists, and it does!

for i in mybucket.objects():
    data_csv = i
data_csv.exists()

Here is the odd part. When I run the following, I get the error Source object gs://analog-arbor-233411/traissn.csv does not exist (the object held in data_csv is traissn.csv).

uri = data_csv.uri
%gcs read --object $uri --variable data

I have looked everywhere but can't find an answer.

Nazim Kerimbekov
Leonardus

1 Answer


In your current code data_csv.exists() is called outside of the for loop, so it returns the result for only the last data_csv object returned by the bucket iterator, which may or may not be traissn.csv.

So either:

  • inside the for loop add a break statement if data_csv points to traissn.csv, so that data_csv remains unchanged
  • make the gcs call inside the for loop
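The first option can be sketched as a small helper. This is only an illustration: `find_object` is a hypothetical name, and it assumes each object yielded by `mybucket.objects()` exposes a `key` attribute holding its name, which I believe `google.datalab.storage.Object` does. A `namedtuple` stands in for the real storage objects so the sketch runs without Datalab.

```python
from collections import namedtuple


def find_object(objects, key):
    """Return the first object whose .key equals `key`, or None if absent.

    Hypothetical helper: `objects` is any iterable of items with a `.key`
    attribute (assumed to match what mybucket.objects() yields).
    """
    for obj in objects:
        if obj.key == key:
            # Returning here plays the role of `break`: later objects in
            # the listing can no longer overwrite the match.
            return obj
    return None


# Stand-in for real storage objects, used only to make the sketch runnable.
FakeObject = namedtuple("FakeObject", ["key", "uri"])
listing = [
    FakeObject("traissn.csv", "gs://analog-arbor-233411/traissn.csv"),
    FakeObject("other.csv", "gs://analog-arbor-233411/other.csv"),
]

match = find_object(listing, "traissn.csv")
print(match.uri if match else "traissn.csv is not in the listing")
```

In the notebook this would presumably become `data_csv = find_object(mybucket.objects(), 'traissn.csv')`, followed by a check that the result is not `None` before passing `data_csv.uri` to `%gcs read`.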
Dan Cornilescu
  • `data_csv.exists()` is intentionally called outside the `for` loop to make sure that whatever object the bucket iterator returned last (which is what `data_csv` holds after the loop ends) exists. It must exist, because the bucket iterator only returns objects that exist! Notice I never specified `traissn.csv` anywhere in the code; it is simply the last object the bucket iterator returned. It's also worth mentioning that `traissn.csv` really does exist in Google Cloud Storage. Only when I make the `gcs` call do I get the error saying it doesn't exist. – Leonardus Mar 11 '19 at 12:04
  • Is the `traissn.csv` object readable by the script? (often wording in error messages can be misleading) – Dan Cornilescu Mar 11 '19 at 12:10