I am using Google Cloud to train a neural network in the cloud, following along with an example:
To start, I set the following two environment variables:
PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-mlengine
I then uploaded my training and evaluation data, two CSV files named eval_set.csv and train_set.csv, to Google Cloud Storage with the following command:
gsutil cp -r data gs://$BUCKET_NAME
I then verified that these two CSV files were in the polar-terminal-160506-mlengine/data directory of my Google Cloud Storage bucket.
I then made the following environment variable assignments:
# Assign appropriate values.
PROJECT=$(gcloud config list project --format "value(core.project)")
JOB_ID="flowers_${USER}_$(date +%Y%m%d_%H%M%S)"
GCS_PATH="gs://${BUCKET_NAME}/${USER}/${JOB_ID}"
DICT_FILE=gs://cloud-ml-data/img/flower_photos/dict.txt
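Before launching the job, it can help to echo the composed path to catch an empty or misspelled variable silently expanding to nothing. A minimal sketch (the project ID, user, and timestamp below are hypothetical stand-ins for the assignments above):

```shell
# Hypothetical stand-in values, mirroring the assignments above.
PROJECT_ID="polar-terminal-160506"
BUCKET_NAME="${PROJECT_ID}-mlengine"
JOB_ID="flowers_demo_20180101_000000"
GCS_PATH="gs://${BUCKET_NAME}/demo/${JOB_ID}"

# If any variable above were unset, the gap would be visible here.
echo "${GCS_PATH}"
```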
I then tried to preprocess my evaluation data like so:
# Preprocess the eval set.
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
--output_path "${GCS_PATH}/preproc/eval" \
--cloud
Sadly, this runs for a bit and then crashes with the following error:
ValueError: Unable to get the Filesystem for path gs://polar-terminal-160506-mlengine/data/eval_set.csv
This doesn't seem right, as I have confirmed in the Google Cloud Storage console that eval_set.csv is stored at exactly this location. Could this be a permissions issue, or something else I am not seeing?
Edit:
I have traced this runtime error to a particular line in the trainer/preprocess.py file:
read_input_source = beam.io.ReadFromText(
    opt.input_path, strip_trailing_newlines=True)
This seems like a good clue, but I am still not sure what is going on. When I google "beam.io.ReadFromText ValueError: Unable to get the Filesystem for path", nothing relevant comes up, which is a bit odd. Thoughts?
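For context, from what I can tell reading the Beam source, ReadFromText dispatches to a filesystem implementation based on the path's URL scheme (gs, file, etc.), and this ValueError is raised when no filesystem is registered for that scheme. A purely illustrative way to see the scheme it would dispatch on (this mirrors the idea, not Beam's actual code):

```shell
# Strip everything from "://" onward to get the URL scheme.
# Illustrative only; Beam does this lookup internally in Python.
path="gs://polar-terminal-160506-mlengine/data/eval_set.csv"
scheme="${path%%://*}"
echo "${scheme}"
```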