
I want to one-hot encode a tensor with my own one-hot encoder. To do this, I have to call tf.keras.backend.get_value() inside .map, which is only possible when using tf.py_function:

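import numpy as np
import tensorflow as tf

def one_hot_encode(categories, input):
  # Inside tf.py_function both arguments arrive as eager tensors,
  # so their concrete values can be read here
  data = tf.keras.backend.get_value(input)
  encoded_input = []
  for category in categories.numpy():
    encoded_input.append(data == category)
  # Cast the booleans to float32 so the result matches Tout=tf.float32
  return np.array(encoded_input, dtype=np.float32)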
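The reason get_value() only works through tf.py_function is that tf.data traces the function passed to .map and runs it as a graph, so the tensors it sees are symbolic, while tf.py_function runs its Python body eagerly for each element. A minimal sketch that records tf.executing_eagerly() in both places (the dataset, the `flags` dict and both helper names are made up for illustration; .numpy() is used to read the eager value):

```python
import tensorflow as tf

# Hypothetical bookkeeping dict: records whether each function ran eagerly
flags = {}

def inside_map(x):
    # .map traces this function into a graph, so x is a symbolic tensor here
    flags["map"] = tf.executing_eagerly()
    return x

def inside_py_function(x):
    # tf.py_function runs this body eagerly for every element,
    # so x is an EagerTensor and its value can be read
    flags["py_function"] = tf.executing_eagerly()
    return x.numpy()

ds = tf.data.Dataset.range(3)
ds = ds.map(inside_map)
ds = ds.map(lambda x: tf.py_function(inside_py_function, inp=[x], Tout=tf.int64))
values = list(ds.as_numpy_iterator())
```

Since the py_function body is ordinary Python executed per element, it generally cannot be optimized by tf.data the way built-in ops such as tf.one_hot can.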

The problem is that when I map the dataset and call one_hot_encode:

ds = ds.map(lambda input, target: (input, tf.py_function(one_hot_encode,inp=[[1,2,3,4,5,6,7,8,9,10],target], Tout=tf.float32)))
ds = ds.map(lambda input, target: (input, tf.reshape(target, (10,))))

TensorFlow then takes forever to create an iterator for this dataset, e.g. when I try to access the data in a for loop:

for (input, target) in ds:
  ...
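The second .map with tf.reshape is there because tf.py_function cannot know the shape of what the Python function will return, so its output has an unknown static shape. A minimal sketch with a hypothetical two-element dataset and a hypothetical helper `encode_np`, using tf.ensure_shape (which also checks the shape at runtime) instead of tf.reshape:

```python
import numpy as np
import tensorflow as tf

def encode_np(categories, value):
    # Runs eagerly inside tf.py_function, so .numpy() gives plain NumPy values
    return (categories.numpy() == value.numpy()).astype(np.float32)

ds = tf.data.Dataset.from_tensor_slices(([0.5, 0.7], [3, 10]))

raw = ds.map(lambda x, y: (
    x, tf.py_function(encode_np, inp=[tf.range(1, 11), y], Tout=tf.float32)))
# At this point the static shape of the py_function output is unknown

# Restore the static shape (10,) expected by downstream code
fixed = raw.map(lambda x, y: (x, tf.ensure_shape(y, (10,))))
```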


But if I use TensorFlow's built-in one-hot encoder, everything works fine and TensorFlow is fast:

ds = ds.map(lambda input, target: (input, tf.one_hot(target,10)))
ds = ds.map(lambda input, target: (input, tf.reshape(target, (10,))))
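One detail worth checking when comparing the two versions: tf.one_hot uses each label as a 0-based index, while the custom encoder matches against the categories 1 to 10. Assuming the labels really run from 1 to 10 (as mentioned in the comments), tf.one_hot puts label k at position k rather than k-1, and label 10 is out of range for depth 10, so its row is all zeros. A small sketch with made-up labels:

```python
import tensorflow as tf

labels = tf.constant([1, 5, 10])

# tf.one_hot treats each label as a 0-based index into a vector of length 10;
# label 10 is out of range, so its row is filled with zeros
unshifted = tf.one_hot(labels, 10)

# Shifting by one maps label k to position k-1, the same layout the
# custom encoder produces with categories [1, ..., 10]
shifted = tf.one_hot(labels - 1, 10)
```

Subtracting 1 (or storing the labels 0-based) makes the two encoders produce the same vectors.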

With both approaches, the dataset and all tensors have the same shapes. Does anyone know another way to access the value of a tensor inside .map, or why TensorFlow becomes so slow?
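Another option is to avoid reading the tensor value at all and write the custom encoder with TensorFlow ops, so that .map can trace it just like tf.one_hot. A minimal sketch, assuming integer labels and the categories 1 to 10 used above; `one_hot_encode_tf` is a hypothetical name. It puts the category axis last, whereas the NumPy version puts it first (the two agree for scalar labels).

```python
import tensorflow as tf

def one_hot_encode_tf(categories, values):
    """Hypothetical graph-friendly encoder: compares values with every category."""
    categories = tf.constant(categories, dtype=tf.int64)
    values = tf.cast(values, tf.int64)
    # Broadcasting (..., 1) against (num_categories,) gives (..., num_categories)
    matches = tf.equal(tf.expand_dims(values, -1), categories)
    return tf.cast(matches, tf.float32)

ds = tf.data.Dataset.from_tensor_slices(([0.5, 0.7], [3, 10]))
ds = ds.map(lambda input, target: (input, one_hot_encode_tf(list(range(1, 11)), target)))
```

For a scalar label the result already has the static shape (10,), so the extra tf.reshape should not be needed with this version.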

Quasi
  • What is the shape of your input data (inputs, labels)? And what exactly is your goal? – AloneTogether Nov 13 '21 at 12:27
  • I am using the genomics_ood dataset from TensorFlow. My goal is to one-hot encode the genome sequence. There are 4 characters (A, C, G, T) and each sequence has 250 characters, so the one-hot encoded tensor has the shape (1000,) and the labels have the shape (10,). If I use TensorFlow's built-in one_hot, everything works fine, but if I use my own one-hot encoder with py_function (all values and shapes match in both cases), TensorFlow becomes really slow. And because I want to (or have to) write the one-hot encoder myself, I can't use the built-in function. – Quasi Nov 13 '21 at 16:47
  • Can you show how you implemented your one-hot encoder? – AloneTogether Nov 13 '21 at 16:48
  • I edited the original post. For simplicity, I one-hot encoded the labels, which are numbers from 1 to 10, rather than the sequence. This doesn't change the behaviour, though. – Quasi Nov 13 '21 at 17:35
  • Sorry, I'm currently busy and have limited time for the project. I'll probably get back to it in a few days. I'll let you know if it worked :). Thanks for the answer. – Quasi Nov 20 '21 at 14:51
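For the actual use case described in the comments (sequences of A, C, G, T), the same broadcasting comparison can be applied to string characters, which keeps everything inside the graph. A minimal sketch, assuming each sequence is a scalar string tensor; `one_hot_sequence` is hypothetical and the genomics_ood feature names are not assumed. Characters other than A, C, G, T would produce all-zero rows.

```python
import tensorflow as tf

def one_hot_sequence(seq):
    """Hypothetical helper: one-hot encodes a DNA string with TensorFlow ops only."""
    bases = tf.constant(["A", "C", "G", "T"])
    chars = tf.strings.bytes_split(seq)  # scalar string -> 1-D tensor of characters
    # Compare every character with every base: (seq_len, 1) vs (4,) -> (seq_len, 4)
    matches = tf.equal(tf.expand_dims(chars, -1), bases)
    # Flatten to (seq_len * 4,), i.e. (1000,) for a 250-character sequence
    return tf.reshape(tf.cast(matches, tf.float32), [-1])

seqs = tf.data.Dataset.from_tensor_slices(["ACGT", "GATTACA"])
encoded = seqs.map(one_hot_sequence)
```

Inside .map the sequence length is not known statically, so tf.ensure_shape(x, (1000,)) may still be useful if later layers need the fixed shape.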

0 Answers