
I have the log mel spectrograms of a few audio clips and I am trying to augment the spectrograms using tfa.image.sparse_image_warp so that time warping can be achieved as done in Google's SpecAugment.

But I am confused about how to achieve time warping, as the documentation does not specify how to initialize the arguments to sparse_image_warp.

The method declaration is like this:

tfa.image.sparse_image_warp(
    image: tfa.types.TensorLike,
    source_control_point_locations: tfa.types.TensorLike,
    dest_control_point_locations: tfa.types.TensorLike,
    interpolation_order: int = 2,
    regularization_weight: tfa.types.FloatTensorLike = 0.0,
    num_boundary_points: int = 0,
    name: str = 'sparse_image_warp'
) -> tf.Tensor

Can someone point out how to initialize source_control_point_locations, dest_control_point_locations and num_boundary_points?

VITTHAL BHANDARI

1 Answer


I think I can answer this because we are reading the same paper.
Please note that although I managed to make the code work, I do not fully understand the theory of warping.


As far as my shallow understanding goes, warping moves a pixel from one position to another.

Therefore, source_control_point_locations specifies the source pixels, while dest_control_point_locations specifies the corresponding target positions. Both are coordinates, hence the shape [batch_size, num_control_points, 2].
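To make the shape concrete, here is a minimal sketch in plain Python lists (which could then be converted to tensors). It assumes the (row, column) ordering used in the example in this answer, where the first coordinate is the frequency bin and the second is the time step. The specific coordinates and the helper nested_shape are made up for illustration.

```python
# One batch entry with two control points; each point is (freq_bin, time_step).
# src[b][i] is the pixel that should end up at dst[b][i].
src = [[[64.0, 100.0], [64.0, 300.0]]]
dst = [[[64.0, 120.0], [64.0, 280.0]]]


def nested_shape(x):
    """Return the shape of a regular nested list (hypothetical helper)."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return shape


# Shift along the time axis for every control point in every batch entry.
time_shifts = [
    [d[1] - s[1] for s, d in zip(src_points, dst_points)]
    for src_points, dst_points in zip(src, dst)
]

print(nested_shape(src))  # [batch_size, num_control_points, 2]
print(time_shifts)
```

Here the first point is pushed 20 steps later in time and the second 20 steps earlier, while both stay on the same frequency row.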

I don't know exactly how num_boundary_points works. What I know is that if you don't fix the boundary points, you might get the error "Input matrix is not invertible. [Op:MatrixSolve]." (It seems that after transformation, some sort of interpolation will be performed, hence the matrix operation.)


import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

mspec = np.random.randn(128, 501).astype("float32")   # spectrogram, shape [freq, time]
# Control points have shape [batch_size, num_control_points, 2]
src = tf.constant([[[64, 1]]], dtype=tf.float32)      # 64 because center freq
dst = tf.constant([[[64, 50]]], dtype=tf.float32)     # move the pixel from timestep 1 to 50
# sparse_image_warp returns the warped image together with the dense flow field
warped, flow = tfa.image.sparse_image_warp(mspec, src, dst, num_boundary_points=2)
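For a SpecAugment-style time warp, the fixed coordinates above can be replaced by a randomly chosen point on the center frequency row. The following is only a sketch of my reading of the paper: a point at a time step between W and n_time - W is shifted left or right by at most W steps. The function name time_warp_points and its parameters are hypothetical, and W = 80 is just an example value.

```python
import random


def time_warp_points(n_freq, n_time, W, rng=random):
    """Return (src, dst) as nested lists of shape [1, 1, 2] in (freq, time) order."""
    if n_time <= 2 * W:
        raise ValueError("n_time must be larger than 2 * W")
    center = n_freq // 2                  # center frequency row
    t = rng.randint(W, n_time - W - 1)    # source time step, away from both edges
    w = rng.randint(-W, W)                # signed shift of at most W steps
    src = [[[float(center), float(t)]]]
    dst = [[[float(center), float(t + w)]]]
    return src, dst


# Seeded generator so that the example is reproducible.
src, dst = time_warp_points(128, 501, W=80, rng=random.Random(0))
print(src, dst)

# Assuming tensorflow and tensorflow_addons are installed, the lists can be
# passed on in the same way as in the example above:
# warped, _ = tfa.image.sparse_image_warp(
#     mspec, tf.constant(src), tf.constant(dst), num_boundary_points=2)
```

Because the source point stays at least W steps away from both ends of the time axis, the destination always remains inside the spectrogram.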
ChrisQIU