How to get a spectrogram offline with the right shape as an input to recognize()?

Question

I am trying to perform offline recognition with my own trained model according to this doc: https://github.com/tensorflow/tfjs-models/tree/master/speech-commands

I hit the same issue described in https://github.com/tensorflow/tfjs/issues/3820 and tried every solution suggested there, including the Colab notebook for the preprocessing model: https://colab.research.google.com/github/tensorflow/tfjs-models/blob/master/speech-commands/training/browser-fft/training_custom_audio_model_in_python.ipynb#scrollTo=1AjdTru5NnQQ. The notebook works fine with the WAV files it provides, but with my own WAV files I get an array of NaN values:

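import tensorflow as tf

# TARGET_SAMPLE_RATE, EXPECTED_WAVEFORM_LEN and preproc_model are defined as in the Colab notebook.
filepath = '/my/own/file.wav'
file_contents = tf.io.read_file(filepath)
# Decode the WAV file, drop the channel axis, and add a batch axis.
waveform = tf.expand_dims(tf.squeeze(tf.audio.decode_wav(
    file_contents,
    desired_channels=-1,
    desired_samples=TARGET_SAMPLE_RATE).audio, axis=-1), 0)
# Crop to the length the preprocessing model expects.
cropped_waveform = tf.slice(waveform, begin=[0, 0], size=[1, EXPECTED_WAVEFORM_LEN])
spectrogram = tf.squeeze(preproc_model(cropped_waveform), axis=0)
print(spectrogram)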


Output:

tf.Tensor(
[[[nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
   ...
   [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]]], shape=(43, 232, 1), dtype=float32)

Is there a way to solve this problem?

For instance, should I modify my WAV files to match the provided ones, and if so, how? Did I miss an important preprocessing step when handling my own WAV files? Or is there a simpler way to do this in JavaScript instead of Python?

score 1 · Accepted Answer · answered Sep 08 '21 at 04:00

Your problem looks identical to the GitHub issue https://github.com/tensorflow/tfjs/issues/3820.

Can you check whether the input tensor you pass to preproc_model() contains a lot of zero entries? I think those zero entries are what cause the NaN output.
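
One way to run that check, as a minimal sketch: it assumes the waveform has already been converted to a NumPy array (for example with cropped_waveform.numpy()). The helper name zero_fraction is made up for illustration and is not part of TensorFlow or the colab.

```python
import numpy as np


def zero_fraction(waveform):
    """Return the fraction of samples that are exactly zero."""
    samples = np.asarray(waveform, dtype=np.float32)
    return float(np.mean(samples == 0.0))


# Stand-in data (an assumption, not a real recording):
# 100 silent samples followed by 100 non-zero samples.
example = np.concatenate([
    np.zeros(100, dtype=np.float32),
    np.full(100, 0.25, dtype=np.float32),
])
print(zero_fraction(example))  # prints 0.5
```

A value close to 1.0 for your own file would suggest the recording is mostly digital silence, which fits the hypothesis above.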

EchoShao


Thanks! I've already solved the problem but haven't had time to write up a detailed explanation and solution. – Jia Li Sep 08 '21 at 08:26
That's great! preproc_model() seems to have trouble with input data that contains a lot of zero entries, which can happen if the recorder has some sort of latency. If that is your case, adding small random noise to the input may help. (Just leaving a comment here in case someone needs the information.) – EchoShao Sep 09 '21 at 03:26
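
For anyone who wants to try the noise suggestion, here is a rough sketch of what it could look like. It assumes the waveform is available as a NumPy array; the helper name add_small_noise and the noise scale of 1e-6 are arbitrary choices for illustration, not values taken from the colab or the GitHub issue.

```python
import numpy as np


def add_small_noise(waveform, scale=1e-6, seed=None):
    """Add low-amplitude Gaussian noise so that no sample stays exactly zero.

    `scale` is an assumed standard deviation; tune it for your own data.
    """
    samples = np.asarray(waveform, dtype=np.float32)
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, scale, size=samples.shape).astype(np.float32)
    return samples + noise


# Stand-in for a fully silent recording with a batch axis (an assumption).
silent = np.zeros((1, 1000), dtype=np.float32)
noisy = add_small_noise(silent, seed=0)
print(noisy.shape, noisy.dtype)
```

The result keeps the original shape and float32 dtype, so it should be possible to pass it to preproc_model() in place of the original waveform (wrapped with tf.convert_to_tensor if needed). Whether this removes the NaN values for a given file still has to be checked by running it.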
