I am trying to build a DNNClassifier that takes categorical inputs using TensorFlow to train a model on the Google Cloud Platform (GCP). I have a few categorical feature columns for which I use a vocabulary.txt file. For example:
tf.feature_column.categorical_column_with_vocabulary_file(
    key="feature_name",
    vocabulary_file=vocab_file,
    vocabulary_size=vocab_size
),
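For completeness, since DNNClassifier only accepts dense columns, I wrap the categorical column above in an indicator column before handing it to the estimator. This is just a sketch; the hidden units and model directory are placeholders, not my exact code:

import tensorflow as tf

cat_col = tf.feature_column.categorical_column_with_vocabulary_file(
    key="feature_name",
    vocabulary_file=vocab_file,
    vocabulary_size=vocab_size)

feature_columns = [
    # DNNClassifier needs dense input, so wrap the sparse categorical column.
    tf.feature_column.indicator_column(cat_col),
    # ...or: tf.feature_column.embedding_column(cat_col, dimension=8)
]

estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 32],              # placeholder architecture
    model_dir='gs://my-bucket/model')   # placeholder path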
I spent several frustrating hours discovering that you can't use open() on GCP because it can't handle gs:// paths. Therefore, I used the following code to read in the vocabulary files:
from tensorflow.python.lib.io import file_io

def read_vocab_file(file_path):
    """Reads a vocab file into memory.

    Args:
        file_path: path to the vocab file in a Cloud Storage bucket.

    Returns:
        The vocab list and the size of the vocabulary.
    """
    with file_io.FileIO(file_path, 'r') as f:
        vocab_lines = f.readlines()
    vocab_size = len(vocab_lines)
    return vocab_lines, vocab_size
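In case it matters, this is roughly how I wire the helper into the feature column definition (the bucket path and feature name here are placeholders):

vocab_file = 'gs://my-bucket/vocab/feature_name.txt'   # placeholder path
vocab_lines, vocab_size = read_vocab_file(vocab_file)

feature_col = tf.feature_column.categorical_column_with_vocabulary_file(
    key="feature_name",
    vocabulary_file=vocab_file,
    vocabulary_size=vocab_size)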
This allows me to submit a training job in which I pass the paths to the vocabulary files as arguments:
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $MODEL_DIR \
    --runtime-version 1.4 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region $REGION \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --vocab-paths $VOCAB \
    --latlon-data-paths $LATLON \
    --train-steps 1000 \
    --eval-steps 100
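For reference, these flags are consumed in trainer/task.py with argparse, roughly like this (the flag names mirror the command above; defaults are placeholders):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train-files', nargs='+', required=True)
parser.add_argument('--eval-files', nargs='+', required=True)
parser.add_argument('--vocab-paths', nargs='+', required=True)
parser.add_argument('--latlon-data-paths', nargs='+', required=True)
parser.add_argument('--train-steps', type=int, default=1000)
parser.add_argument('--eval-steps', type=int, default=100)
parser.add_argument('--job-dir', default='')
args = parser.parse_args()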
This works fine for training, but afterwards I am not able to make predictions. Is there a better way to train a model on Google Cloud ML Engine while using vocab.txt files to create categorical feature columns?
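For context, the export step I have been attempting for prediction looks roughly like this (the feature spec and export path are placeholders, not my exact code):

# Sketch of exporting the trained estimator for serving.
feature_spec = {
    'feature_name': tf.FixedLenFeature([], tf.string),  # placeholder feature
}

serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
estimator.export_savedmodel(
    export_dir_base='gs://my-bucket/model/export',  # placeholder path
    serving_input_receiver_fn=serving_input_fn)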
Any example code that uses categorical features with a tf.estimator.DNNClassifier would be greatly appreciated, especially if it can run on GCP with hyperparameter optimization and make predictions in the cloud.
Thank you