
I spent the last 5 hours or so trying to get the TF 2.0 Keras API working with the tf.lookup API. My training script also uses Databricks and mlflow.keras. MLflow requires that the model be serialized, which I think is what is causing issues for me. The question is: how do I use tf.lookup tables with the TensorFlow 2.0 Keras Model API and MLflow?

I was getting Keras serialization errors when I used the functional Keras API with table.lookup directly:

table = tf.lookup.StaticVocabularyTable(tf.lookup.TextFileInitializer(vocab_path, tf.string, 0, tf.int64, 1, delimiter=","), 1)
categorical_indices = table.lookup(categorical_input)

Wrapping the above call in a tf.keras.layers.Lambda layer didn't help. I was getting errors related to resource handles or missing tf variable...
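
As a rough illustration of what that table is configured to do, here is a plain-Python sketch (no TensorFlow involved). It assumes category_vocab.csv holds the string key in column 0 and the integer id in column 1, matching the TextFileInitializer arguments above. The three categories and the temporary file location are made up for the example, and the `lookup` helper is hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical vocabulary; the real contents of category_vocab.csv are not shown.
rows = [("books", 0), ("games", 1), ("music", 2)]

vocab_path = os.path.join(tempfile.mkdtemp(), "category_vocab.csv")
with open(vocab_path, "w", newline="") as f:
    csv.writer(f, delimiter=",").writerows(rows)

# Plain-Python stand-in for the table: column 0 is the key, column 1 is the id.
with open(vocab_path, newline="") as f:
    vocab = {key: int(value) for key, value in csv.reader(f, delimiter=",")}


def lookup(key):
    # With a single OOV bucket, every unknown key gets the id len(vocab).
    # With more buckets TensorFlow hashes the key to pick one; that is not emulated here.
    return vocab.get(key, len(vocab))


print(lookup("games"), lookup("unknown"))
```

With one out-of-vocabulary bucket, the table produces ids in the range 0 to len(vocab) inclusive, so a downstream embedding would need len(vocab) + 1 rows.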

Nicholas Leonard
This isn't really a question, is it? Consider editing your post to include only a well-formulated question, and feel free to post and accept your own answer. – Mat Oct 22 '19 at 15:37

1 Answer


Sharing the solution here to save somebody else some pain. This is the solution that I found to work:

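import os

import tensorflow as tf
from tensorflow.keras import layers

# mount_point is defined elsewhere in the training script
vocab_path = os.path.join(mount_point, 'category_vocab.csv')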

class VocabLookup(layers.Layer):
  def __init__(self, vocab_path, num_oov_buckets, **kwargs):
    # Initialize the base Layer first, then store the constructor arguments
    super(VocabLookup, self).__init__(**kwargs)
    self.vocab_path = vocab_path
    self.num_oov_buckets = num_oov_buckets

  def build(self, input_shape):

    vocab_initializer = tf.lookup.TextFileInitializer(
      self.vocab_path, tf.string, 0, tf.int64, 1, delimiter=",")
    self.table = tf.lookup.StaticVocabularyTable(vocab_initializer, self.num_oov_buckets)

    self.built = True

  def call(self, inputs):
    return self.table.lookup(inputs)

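  def get_config(self):
    # Merge with the base config so name, dtype, etc. survive serialization
    config = super(VocabLookup, self).get_config()
    config.update({'vocab_path': self.vocab_path, 'num_oov_buckets': self.num_oov_buckets})
    return config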

lookup_table = VocabLookup(vocab_path, 1)

categorical_indices = lookup_table(categorical_input)

Basically, don't use layers.Lambda if the function refers to any outside variables (including the tf or tensorflow module). For example, this doesn't work for me:

def reduce_sum(x):
  return tf.reduce_sum(x, axis=1)

embedding_sum = layers.Lambda(reduce_sum)

categorical_features = embedding_sum(categorical_embeddings)

But this works:

class ReduceSum(layers.Layer):
  def call(self, inputs):
    return tf.reduce_sum(inputs, axis=1)

embedding_sum = ReduceSum()
categorical_features = embedding_sum(categorical_embeddings)

layers.Lambda doesn't seem to like upvalues, i.e. variables that the function captures from its enclosing scope.
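
One way to sanity-check that a subclassed layer like ReduceSum can be rebuilt from its config alone is a get_config/from_config round trip. The snippet below is a minimal sketch, not what MLflow runs internally; the layer name and the input shape are made up for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers


class ReduceSum(layers.Layer):
    """Sums over axis 1; takes no constructor arguments of its own."""

    def call(self, inputs):
        return tf.reduce_sum(inputs, axis=1)


layer = ReduceSum(name="embedding_sum")

# The config is a plain dict; the call() body comes from the class itself.
config = layer.get_config()
restored = ReduceSum.from_config(config)

# Hypothetical input shaped (batch, categories, embedding_dim).
embeddings = tf.ones((2, 3, 4))
summed = restored(embeddings)
print(restored.name, summed.shape)
```

Because the class carries the computation, nothing has to be captured from an enclosing scope, which appears to be where the Lambda version ran into trouble.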

Nicholas Leonard