
I am trying to implement the following paper: https://arxiv.org/abs/1904.08779 in order to achieve better speech-to-text results.
I am implementing it on top of the Mozilla DeepSpeech repo, which uses the TensorFlow Dataset API to load the data.

dataset = (tf.data.Dataset.from_generator(generate_values,
                                              output_types=(tf.string, (tf.int64, tf.int32, tf.int64),tf.int64))
                              .map(entry_to_features, num_parallel_calls=tf.data.experimental.AUTOTUNE)
                              .cache(cache_path)
                              .map(augment_spec, num_parallel_calls=tf.data.experimental.AUTOTUNE)
                              .window(batch_size, drop_remainder=True).flat_map(batch_fn)
                              .prefetch(num_gpus))

The audio is converted to a spectrogram and MFCCs are calculated, so when the data arrives at the augment_spec function it has a shape of (?, 26), where ? comes from reshaping the variable-length audio. I am trying to mask certain parts of these spectrogram "images"; to do that I thought of multiplying two tensors, one of them being a mask of ones and zeros, using some code like this:

def augment_spec(features, features_len, transcript):
    # print("\n\n\n\n duration", duration.eval())
    sample_rate = 8000

    mask = np.ones_like(features)

    temp = tf.Variable(tf.ones_like(features))
    print(temp)

    time_len = features_len.shape[0]

    n_time_masks = np.random.randint(0, 4)
    n_freq_masks = np.random.randint(0, 3)

    for _ in range(n_time_masks):
        time_delta = np.random.randint(int(sample_rate / 10), int(sample_rate / 2))
        time_start = np.random.randint(0, time_len - time_delta)
        print(time_start, time_delta)
        mask[time_start:time_start + time_delta] = 0

    for _ in range(n_freq_masks):
        freq_delta = np.random.randint(1, 4)
        freq_start = np.random.randint(0, features_len - freq_delta)
        print(freq_start, freq_delta)
        mask[:, freq_start:freq_start + freq_delta] = 0

    mask = tf.convert_to_tensor(mask, dtype=tf.float32)
    return tf.math.multiply(features, mask),  features_len, transcript

The problem is that these lines:

    mask = np.ones_like(features)  

    time_len = features_len.shape[0]  

do not work, because when the graph is being built the tensors do not have a defined shape, so I do not know how to implement this. Could you help me with this? Thanks a lot!

UPDATE: Following @kempy's answer, my code now looks like this:

def augment_spec(features, features_len, transcript):

    # print("\n\n\n\n duration", duration.eval())
    sample_rate = 8000

    mask = tf.Variable(tf.ones_like(features),validate_shape=False)

    time_len = tf.shape(features)[0]

    n_time_masks = np.random.randint(0, 4)
    n_freq_masks = np.random.randint(0, 3)
    # n_time_masks = tf.random.uniform(
    #         shape=(), minval=0, maxval=4, dtype=tf.int32)
    # n_freq_masks = tf.random.uniform(
    #         shape=(), minval=0, maxval=3, dtype=tf.int32)

    for _ in range(n_time_masks):

        time_delta = tf.random.uniform(
            shape=(), minval=int(sample_rate / 10), maxval=int(sample_rate / 2), dtype=tf.int32)
        time_start = tf.random.uniform(
            shape=(), minval=0, maxval=time_len-time_delta, dtype=tf.int32)

        # indexes = list(range(time_start,time_start+time_delta))
        indexes = tf.range(time_start, time_start+time_delta, delta=1, dtype=tf.int32, name='range')

        tf.scatter_update(mask, indexes, 0)

    mask = tf.transpose(mask,(1,0))
    for _ in range(n_freq_masks):
        # freq_delta = np.random.randint(1, 4)
        # freq_start = np.random.randint(0, features_len - freq_delta)

        freq_delta = tf.random.uniform(
            shape=(), minval=1, maxval=4, dtype=tf.int32)
        freq_start = tf.random.uniform(
            shape=(), minval=0, maxval=(features_len - freq_delta), dtype=tf.int32)


        # indexes = list(range(freq_start,freq_start+freq_delta))
        indexes = tf.range(freq_start, freq_start+freq_delta, delta=1, dtype=tf.int32, name='range')

        tf.scatter_update(mask, indexes, 0)


    mask = tf.transpose(mask,(1,0))
    mask = tf.convert_to_tensor(mask, dtype=tf.float32)
    masked = tf.multiply(features, mask)
    return masked,  features_len, transcript

But now I am getting this error:

ValueError: Tensor("Variable:0", dtype=float32_ref) must be from the same graph as Tensor("tower_0/Mean:0", shape=(), dtype=float32, device=/device:GPU:0).

I do not know how to solve this. Thank you for your help.

Kailegh

1 Answer


The short answer

Use the tf versions of these functions instead of the np ones. tf.ones_like should work fine with an input of shape (?, 26), and you can use tf.shape(features)[0] to dynamically get the shape of features. Further down you should use something like tf.random.uniform instead of np.random.randint.
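
Mapped onto the argument names in your augment_spec, those replacements would look roughly like this (just a sketch of the substitutions, not a full implementation):

# np.ones_like(features)       ->  tf.ones_like(features)
mask = tf.ones_like(features)

# features_len.shape[0]        ->  tf.shape(features)[0]
time_len = tf.shape(features)[0]

# np.random.randint(0, high)   ->  tf.random.uniform with dtype=tf.int32
time_start = tf.random.uniform(
    shape=(), minval=0, maxval=time_len, dtype=tf.int32)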

The long answer

When running TF in graph mode (which is the default in TF 1.X), you can't have Python code depend on the output of a tensor, since it hasn't been executed yet, so you should use TF ops instead of Python numpy code.

We can build a graph with dynamic first dimension:

import numpy as np
import tensorflow as tf

# Feature dimensions
unknown_size = 3
feature_dim = 26

tf.reset_default_graph()

# features_input has dynamic first dimension
features_input = tf.placeholder(tf.int32, shape=(None, feature_dim))

# ones_like should work fine with argument of shape (?, 26)
batched_ones = tf.ones_like(features_input)

# dynamically get the shape of the features_input
time_len = tf.shape(features_input)[0]
time_start = tf.random.uniform(
    shape=(), minval=0, maxval=time_len, dtype=tf.int32)

And print the following:

print('features_input.shape:')
print(features_input.shape)
print('batched_ones.shape:')
print(batched_ones.shape)
print('time_start.shape:')
print(time_start.shape)

The output we see is:

features_input.shape:
(?, 26)
batched_ones.shape:
(?, 26)
time_start.shape:
()

If we then try to execute the graph:

with tf.Session() as sess:
  # Create some input data
  features = np.arange(feature_dim)
  batched_features = np.tile(features, (unknown_size, 1))

  # Evaluate the tensors
  features_out, ones_out, time_start_out = sess.run(
      [features_input, batched_ones, time_start],
      feed_dict={features_input: batched_features})

And print the output:

# Print out what the output looks like
print('\nOutput:')
print('\nFeatures:')

print(features_out)
print('shape:', features_out.shape)

print('\nOnes:')
print(ones_out)
print('shape:', ones_out.shape)

print('\nRandom between 0 and unknown_size:')
print(time_start_out)
print('shape:', time_start_out.shape)

We can see that it works!

Output:

Features:
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25]
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25]
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25]]
shape: (3, 26)

Ones:
[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
shape: (3, 26)

Random between 0 and unknown_size:
0
shape: ()
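
Putting this together, one way to write the masking itself entirely with TF ops is to skip the tf.Variable and the in-place updates altogether, and instead build the mask by comparing tf.range against random start/width scalars and multiplying the resulting strips into a tensor of ones. The sketch below is only an illustration under a couple of assumptions, not a drop-in DeepSpeech implementation: the number of masks is a Python constant fixed at graph-construction time (so the loops can be unrolled), and the time-mask width is chosen relative to the number of frames rather than the sample rate.

import tensorflow as tf

def augment_spec(features, features_len, transcript):
    # features has shape (?, 26): dynamic time dimension, 26 MFCC channels
    time_len = tf.shape(features)[0]
    freq_len = tf.shape(features)[1]

    mask = tf.ones_like(features)

    # Loop counts are Python ints so the graph can be unrolled; the position
    # and width of each mask are still drawn randomly for every element.
    n_time_masks = 2  # assumption: tune for your data
    n_freq_masks = 1  # assumption: tune for your data

    for _ in range(n_time_masks):
        t = tf.random.uniform(shape=(), minval=1,
                              maxval=tf.maximum(time_len // 10, 2), dtype=tf.int32)
        t0 = tf.random.uniform(shape=(), minval=0,
                               maxval=tf.maximum(time_len - t, 1), dtype=tf.int32)
        rows = tf.range(time_len)
        # keep is 1 outside [t0, t0 + t) and 0 inside it
        keep = tf.cast(tf.logical_or(rows < t0, rows >= t0 + t), features.dtype)
        mask *= keep[:, tf.newaxis]

    for _ in range(n_freq_masks):
        f = tf.random.uniform(shape=(), minval=1, maxval=4, dtype=tf.int32)
        f0 = tf.random.uniform(shape=(), minval=0,
                               maxval=freq_len - f, dtype=tf.int32)
        cols = tf.range(freq_len)
        keep = tf.cast(tf.logical_or(cols < f0, cols >= f0 + f), features.dtype)
        mask *= keep[tf.newaxis, :]

    return features * mask, features_len, transcript

Because the mask here is an ordinary tensor rather than a tf.Variable, nothing has to be created on (or updated in) a particular graph, which also sidesteps the "must be from the same graph" error from the update to the question.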
kempy
  • By doing it like so you can get time_len and time_start; the problem is that I cannot assign 0 values to the ones tensor, because it is a TensorFlow tensor and it does not allow item assignment – Kailegh Apr 26 '19 at 07:37
  • You can use `tf.scatter_update` to update by index. You should generate a list of indices to update and then update them all to 0 with `tf.scatter_update` – kempy Apr 26 '19 at 09:54
  • I have edited the question with the new code. Now I am getting a new error saying tensors come from different graphs – Kailegh Apr 28 '19 at 15:19
  • It looks like the error has something to do with adding ops to different graphs. It references some tensor called `tower_0/Mean` which looks like it's part of the library that you're using. You should make sure your code and the library code that adds ops to the graph run on the same graph - so make sure you don't run `tf.reset_default_graph()` in between – kempy Apr 29 '19 at 09:33