Audio resampling layer for tensorflow

Question

I need to resample audio signals inside a custom model structure. This resampling is not a pre/post-processing operation that can live outside the model; it is part of the model's internal design, so a gradient operation has to be defined for the layer as well. For the resampling operation itself, I plan to use TensorFlow I/O:

tfio.audio.resample

The operation works perfectly and is easy to use as a pre/post-processing unit; however, implementing it as a custom layer embedded within the model is challenging, as I don't know how to implement the backward path.

How should the backward path be implemented for such a 1D signal resampling layer?
Is there any other open-source 1D signal resampling layer that can be employed?
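
One way to reason about the backward path: a fixed-ratio resampler is a linear map y = W x, so the gradient with respect to the input is the transpose of W applied to the upstream gradient. The following is a minimal NumPy sketch of that idea, assuming plain linear interpolation as the resampling method (this is an illustration, not what tfio does internally; `linear_resample_matrix` is a hypothetical helper name):

```python
import numpy as np


def linear_resample_matrix(n_in, n_out):
    """Hypothetical helper: W (n_out x n_in) such that W @ x linearly
    interpolates x onto n_out evenly spaced points."""
    # Positions of the output samples on the input index axis.
    pos = np.linspace(0.0, n_in - 1, n_out)
    left = np.floor(pos).astype(int)
    right = np.minimum(left + 1, n_in - 1)
    frac = pos - left
    W = np.zeros((n_out, n_in))
    rows = np.arange(n_out)
    # np.add.at accumulates, which matters when left == right (last sample).
    np.add.at(W, (rows, left), 1.0 - frac)
    np.add.at(W, (rows, right), frac)
    return W


rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W = linear_resample_matrix(16, 8)

y = W @ x                           # forward pass: 16 samples -> 8 samples
upstream = rng.standard_normal(8)   # stands in for dL/dy from later layers
grad_x = W.T @ upstream             # backward pass: dL/dx = W^T dL/dy

# Central finite-difference check of dL/dx for L = upstream . (W @ x).
eps = 1e-6
numeric = np.array([
    (upstream @ (W @ (x + eps * e)) - upstream @ (W @ (x - eps * e))) / (2 * eps)
    for e in np.eye(16)
])
print(np.max(np.abs(grad_x - numeric)))
```

The same reasoning should carry over to windowed-sinc or FFT-based resampling, since those are also linear in the input; only W changes.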

P.S. I tried conventional upsampling/pooling-style layers, but they are not accurate enough compared to tfio, which implements other resampling methods such as FFT-based ones.

For more context, please have a look at: another question
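
To make the accuracy gap concrete, here is a small NumPy sketch comparing average pooling with a simple FFT-based downsampler on a pure tone. The signal length, the tone, and the 2:1 ratio are assumptions chosen for illustration (1024 samples at 16 kHz with 40 cycles is a 625 Hz tone), and the FFT variant here is a bare spectrum truncation, not the tfio implementation:

```python
import numpy as np

n_in, n_out, cycles = 1024, 512, 40           # e.g. 16 kHz -> 8 kHz, 625 Hz tone
t_in = np.arange(n_in) / n_in
t_out = np.arange(n_out) / n_out
x = np.sin(2 * np.pi * cycles * t_in)
target = np.sin(2 * np.pi * cycles * t_out)   # same tone sampled at the lower rate

# Average pooling with window 2 and stride 2.
pooled = x.reshape(n_out, 2).mean(axis=1)

# FFT-based: keep the low-frequency bins and invert at the new length.
# The factor n_out / n_in compensates for irfft's 1/n normalisation.
spectrum = np.fft.rfft(x)
fft_resampled = np.fft.irfft(spectrum[: n_out // 2 + 1], n=n_out) * (n_out / n_in)

err_pool = np.max(np.abs(pooled - target))
err_fft = np.max(np.abs(fft_resampled - target))
print(f"pooling error: {err_pool:.3e}, FFT error: {err_fft:.3e}")
```

Pooling both attenuates the tone and shifts it by half an input sample, whereas spectrum truncation reproduces a band-limited periodic tone up to rounding error.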

score 0 · Answer 1 · edited May 12 '22 at 08:06

You need to state the objective of the resampling. It can be done in many ways, including concatenating sine signals, which you can then represent with fewer sine values.

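By changing the sampling rate you can save data space. The sample below first plots 0.05 * tf.math.sin(audio[:5 * 22050]).numpy(), then builds two sections,

sec_1 = np.zeros((2750, 1)) * tf.math.sin(audio[0:2750]).numpy() and

sec_2 = np.ones((2750, 1)) * tf.math.sin(audio[2750:5500]).numpy()

and plots their concatenation.

[ Sample ]:

import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt

# Read and decode the WAV file; audio has shape (samples, channels).
contents = tf.io.read_file("F:\\temp\\Python\\Speech\\temple_of_love-sisters_of_mercy.wav")
audio, sample_rate = tf.audio.decode_wav(
    contents, desired_channels=-1, desired_samples=-1, name=None
)

print(audio)
print(sample_rate)

# First 5 seconds of the raw signal (the file is sampled at 22050 Hz).
plt.plot(audio[:5 * 22050])
plt.show()
plt.close()

# Scaled sine of the same 5 seconds.
plt.plot(0.05 * tf.math.sin(audio[:5 * 22050]).numpy())
plt.show()
plt.close()

# Masks are shaped (2750, 1) so they multiply element-wise with the
# (2750, channels) slices; a 1-D mask of shape (2750,) would broadcast
# against a mono (2750, 1) slice to a (2750, 2750) array instead.
sec_1 = np.zeros((2750, 1)) * tf.math.sin(audio[0:2750]).numpy()    # zeroed section
sec_2 = np.ones((2750, 1)) * tf.math.sin(audio[2750:5500]).numpy()  # kept section

# Concatenate the two sections along the time axis and plot.
plt.plot(0.05 * tf.concat([sec_1, sec_2], 0).numpy())
plt.show()
plt.close()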

[ Output ]:

array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]], dtype=float32)>, sample_rate=<tf.Tensor: shape=(), dtype=int32, numpy=22050>)

tf.Tensor(22050, shape=(), dtype=int32)

Thank you very much. The input is a batch of 1D audio signals containing speech data, and resampling here means changing the "sample rate" inside the network (e.g., 16k to 8k). However, the principal question is the gradient of such an operation (in the form of a tf layer), so that the error can propagate back to its input. The question links to a pooling layer where the mathematics behind the propagation is well addressed. In summary, I'm looking for a layer that changes the sampling rate dynamically within the network structure while respecting the back-propagation rules. - ir0098, Mar 29 '22 at 19:53
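
For what the comment describes, one possible approach is to build the resampler from ops that TensorFlow already knows how to differentiate, so no hand-written backward pass is needed. The following is a minimal sketch under stated assumptions: `FFTResample` is a hypothetical layer name, the input is a float32 batch of shape (batch, samples) with a static length, only downsampling is handled, and the method is a bare spectrum truncation with `tf.signal.rfft` / `tf.signal.irfft` rather than the algorithm behind `tfio.audio.resample`:

```python
import numpy as np
import tensorflow as tf


class FFTResample(tf.keras.layers.Layer):
    """Hypothetical layer: downsample (batch, samples) signals by
    truncating the real FFT spectrum and inverting at the new length."""

    def __init__(self, rate_in, rate_out, **kwargs):
        super().__init__(**kwargs)
        if rate_out > rate_in:
            raise ValueError("this sketch only handles downsampling")
        self.rate_in = rate_in
        self.rate_out = rate_out

    def call(self, x):
        n_in = x.shape[-1]                           # assumes a static length
        n_out = (n_in * self.rate_out) // self.rate_in
        spectrum = tf.signal.rfft(x)                 # (batch, n_in // 2 + 1)
        spectrum = spectrum[..., : n_out // 2 + 1]   # drop the high-frequency bins
        y = tf.signal.irfft(spectrum, fft_length=[n_out])
        # irfft divides by n_out, so rescale to keep the original amplitude.
        return y * (n_out / n_in)


# Usage: a 625 Hz tone, 1024 samples at 16 kHz, resampled to 8 kHz.
t = np.arange(1024) / 1024.0
x = tf.constant(np.sin(2 * np.pi * 40 * t)[np.newaxis, :], dtype=tf.float32)
layer = FFTResample(rate_in=16000, rate_out=8000)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = layer(x)
    loss = tf.reduce_sum(tf.square(y))

grad = tape.gradient(loss, x)   # gradient reaches the layer's input
print(y.shape, grad.shape)
```

Because every op inside `call` has a registered gradient, automatic differentiation handles the backward path; `tf.custom_gradient` would only be needed when wrapping an op that has none.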
