TensorFlow: accessing tensor.numpy() in a .map function via py_function slows down iterator generation

Question

I want to one-hot encode a tensor with my own one-hot encoder. To do this, I have to call tf.keras.backend.get_value() inside .map, which is only possible when using tf.py_function:

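import numpy as np
import tensorflow as tf

def one_hot_encode(categories, target):
  # Runs eagerly inside tf.py_function, so the tensor's value can be read.
  data = tf.keras.backend.get_value(target)
  # One entry per category; float32 matches Tout=tf.float32 in the map call.
  encoded = [data == np.asarray(category) for category in categories]
  return np.array(encoded, dtype=np.float32)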

The problem is, when mapping the dataset and calling one_hot_encode:

ds = ds.map(lambda input, target: (input, tf.py_function(one_hot_encode,inp=[[1,2,3,4,5,6,7,8,9,10],target], Tout=tf.float32)))
ds = ds.map(lambda input, target: (input, tf.reshape(target, (10,))))

TensorFlow takes forever to create an iterator for this dataset, e.g. when trying to access the data in a for loop:

for (input, target) in ds:
  ...

But if I use TensorFlow's built-in one-hot encoder, everything works fine and TensorFlow is fast:

ds = ds.map(lambda input, target: (input, tf.one_hot(target,10)))
ds = ds.map(lambda input, target: (input, tf.reshape(target, (10,))))
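One caveat when comparing the two variants, assuming the labels really run from 1 to 10 as in the category list above: tf.one_hot treats its input as a zero-based index, so the shapes match but the values may not. The sketch below uses made-up labels (not taken from the real dataset) to show the difference. Subtracting 1 is only one possible way to line the two encodings up.

```python
import tensorflow as tf

# Illustrative labels only: the smallest and largest value of the 1..10 range.
labels = tf.constant([1, 10])

# tf.one_hot interprets each label as an index in [0, depth).
builtin = tf.one_hot(labels, 10).numpy()

# Shifting to zero-based indices reproduces the "compare against 1..10" layout.
shifted = tf.one_hot(labels - 1, 10).numpy()

print(builtin[0])  # label 1 sets position 1, not position 0
print(builtin[1])  # label 10 is out of range, so the row is all zeros
print(shifted[0])  # label 1 sets position 0
print(shifted[1])  # label 10 sets position 9
```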

In both approaches, the dataset and all tensors have the same shape. Does anyone know of another way to access the value of a tensor in .map, or why TensorFlow becomes so slow?
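For reference, one possible way to keep a custom encoder without reading the tensor's value at all is to express the comparison with TensorFlow ops, so that .map can trace it into the graph and tf.py_function is not needed. This is only a sketch under assumptions: the name one_hot_encode_graph, the CATEGORIES constant, and the toy dataset are hypothetical stand-ins, and whether this removes the slowdown on the real dataset has not been verified here.

```python
import tensorflow as tf

# Hypothetical category list; the question compares against the labels 1..10.
CATEGORIES = tf.constant([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=tf.int64)

def one_hot_encode_graph(target, categories=CATEGORIES):
    """Compare a scalar label against every category using TensorFlow ops only."""
    target = tf.cast(target, categories.dtype)
    # Broadcasting a scalar against shape (10,) yields shape (10,) booleans.
    return tf.cast(tf.equal(categories, target), tf.float32)

# Toy stand-in for the real dataset: (input, target) pairs with labels in 1..10.
inputs = tf.zeros((4, 1000), dtype=tf.float32)
targets = tf.constant([1, 3, 10, 7], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices((inputs, targets))

# No tf.py_function: the lambda consists of graph ops only.
ds = ds.map(lambda x, y: (x, one_hot_encode_graph(y)))

encoded = [y.numpy().tolist() for _, y in ds]
print(encoded[0])
```

Because the output shape is known statically here, the extra tf.reshape step should not be required in this variant.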

What is the shape of your input data (inputs, labels)? And what exactly is your goal? - AloneTogether, Nov 13 '21 at 12:27
I am using the genomics_ood dataset from TensorFlow. My goal is to one-hot encode the genome sequence. There are 4 characters (A, C, G, T) and each sequence has 250 characters, so the one-hot encoded tensor will have the shape (1000,) and the labels have the shape (10,). If I use TensorFlow's built-in one_hot, everything works fine, but if I use my own one_hot (all values and shapes match in both cases) with py_function, TensorFlow becomes really slow. And because I want to/have to write the one-hot encoder myself, I can't use the built-in function. - Quasi, Nov 13 '21 at 16:47
I edited the original post. For simplicity, I didn't one-hot encode the sequence but rather the labels, which can be numbers from 1 to 10. But this doesn't change the output behaviour. - Quasi, Nov 13 '21 at 17:35
Sorry, I'm currently busy and have limited time for the project. I'll probably get back to it in a few days and will let you know if it worked :). Thanks for the answer. - Quasi, Nov 20 '21 at 14:51
