
I'm using the tf.data.Dataset.from_generator() function to create a dataset for ASR containing the audio wav_file, the length of the audio wav_file, the transcript, and the transcript_len. For the ML model I need the audio wav_file and its length to be zero-padded, so I already use .padded_batch() for them. Now I need something other than .batch() for the rest, since .batch() requires all tensors to have the same shape, but I don't want zero padding there.

I want to use the CTC loss function tf.nn.ctc_loss_v2, which needs the transcript and transcript_len tensors batched but not zero-padded. Is there a way to batch a dataset whose tensors have different shapes?


def generate_values():
    for _, row in df.iterrows():
        yield row.wav_filename, row.transcript, len(row.transcript)

def entry_to_features(wav_filename, transcript, transcript_len):
    features, features_len = audiofile_to_features(wav_filename)
    return features, features_len, transcript, transcript_len

def batch_fn(features, features_len, transcripts, transcript_len):
    features = tf.data.Dataset.zip((features, features_len))
    features = features.padded_batch(batch_size,
                                     padded_shapes=([None, Config.n_input], []))
    trans = tf.data.Dataset.zip((transcripts, transcript_len)).batch(batch_size)
    # PROBLEM: only works with batch_size=1
    return tf.data.Dataset.zip((features, trans))

dataset = tf.data.Dataset.from_generator(generate_values,
                                         output_types=(tf.string, tf.int64, tf.int64))
dataset = dataset.map(entry_to_features)
dataset = dataset.window(batch_size, drop_remainder=True)
dataset = dataset.flat_map(batch_fn)

Running this raises:

InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [36] and element 2 had shape [34]
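
For reference, the failure is easy to reproduce in isolation (toy data, assuming eager execution; none of these values come from the post): plain .batch() requires every element to have the same shape, which variable-length transcripts don't, while padded_batch handles them.

import tensorflow as tf

def gen():
    yield [1, 2, 3]  # "transcript" of length 3
    yield [4, 5]     # "transcript" of length 2

ds = tf.data.Dataset.from_generator(gen, output_types=tf.int64)

# list(ds.batch(2))  # raises InvalidArgumentError: Cannot batch tensors
                     # with different shapes in component 0

print(list(ds.padded_batch(2, padded_shapes=[None])))  # works: pads both rows to length 3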

  • You might want to try transforming your wav data so the dimensions line up. A nice way of doing this is a [mel spectrogram](https://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html). Librosa is a python library that has this feature. – o-90 Aug 02 '19 at 15:53
  • I don't understand why you cannot use `padded_batch` there. [`tf.nn.ctc_loss_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss_v2) takes length parameters, so you can pass padded tensors along with the lengths and the calculation will be correct. In fact, the function _assumes_ you will pass a padded tensor (see the sketch after these comments). – jdehesa Aug 02 '19 at 16:44
  • oh wow, sorry! you're right! – Malena Reiners Aug 02 '19 at 19:03
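
To illustrate the comment above, here is a toy sketch (all shapes and values are assumed, not from the post): zero-pad the labels and pass the true lengths to tf.nn.ctc_loss_v2; the loss only reads the first label_length entries of each row, so the padding never affects the result.

import tensorflow as tf

batch_size, max_time, num_classes = 2, 50, 29       # assumed toy dimensions

labels = tf.constant([[1, 2, 3, 0], [4, 5, 0, 0]])  # zero-padded transcripts
label_length = tf.constant([3, 2])                  # their true lengths
logits = tf.random.normal([batch_size, max_time, num_classes])
logit_length = tf.fill([batch_size], max_time)

loss = tf.nn.ctc_loss_v2(labels, logits, label_length, logit_length,
                         logits_time_major=False, blank_index=0)
print(loss)  # one loss value per example, shape [2]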

1 Answer


If you want to train a seq2seq model and use (features, transcript) pairs as training examples, dataset.window is not what you want to use.

dataset = tf.data.Dataset.from_generator(generate_values,
                                         output_types=(tf.string, tf.int64, tf.int64))
dataset = dataset.map(entry_to_features)
dataset = dataset.padded_batch(batch_size,
                               padded_shapes=([None, Config.n_input], [], [None], []))

Later you can use the dataset as follows:

for features, feature_length, labels, label_length in dataset.take(30): 
    logits, logit_length = model(features, feature_length)
    loss = tf.nn.ctc_loss_v2(labels, tf.cast(logits, tf.float32), 
                             label_length, logit_length, logits_time_major=False)
– alexey
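
Beyond the answer itself, a minimal sketch of a full training step built around that snippet (assuming eager execution; `model` is a hypothetical Keras-style callable returning (logits, logit_length), and the optimizer choice is likewise an assumption):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-4)

for features, feature_length, labels, label_length in dataset.take(30):
    with tf.GradientTape() as tape:
        logits, logit_length = model(features, feature_length)
        loss = tf.reduce_mean(
            tf.nn.ctc_loss_v2(labels, tf.cast(logits, tf.float32),
                              label_length, logit_length,
                              logits_time_major=False))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))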