Using Python, I am trying to plot the signal amplitude of a WAV file, but I am getting the error "ValueError: x and y must have same first dimension". Here is my code:

import wave
import matplotlib.pyplot as plt
import numpy as np

wav_obj = wave.open("loop.wav", "rb")

sample_freq = wav_obj.getframerate()
n_samples = wav_obj.getnframes()
t_audio = n_samples/sample_freq
n_channels = wav_obj.getnchannels()
signal_wave = wav_obj.readframes(n_samples)

signal_array = np.frombuffer(signal_wave, dtype=np.int16)
l_channel = signal_array[0::2]
r_channel = signal_array[1::2]

times = np.linspace(0, t_audio, num=n_samples)

plt.figure(figsize=(15, 5))
plt.plot(times, l_channel)
plt.title('Left Channel')
plt.ylabel("Signal Value")
plt.xlabel("Time in seconds")
plt.xlim(0, t_audio)
plt.show()

I expected signal_array to have (n_samples * n_channels) elements, but that is not the case and I don't know why. Right now signal_array has 1076340 elements, while (n_samples * n_channels) is 717560.

I tried using a different wav file and I got the same error.

UPDATE: I have some more insight. My WAV file is stereo, so it has 2 channels, and its sample width is 3 bytes. Read as int16, "signal_array" therefore has (n_samples * 3) elements: n_samples * 2 channels * 3 bytes, divided by 2 bytes per int16. As a result, "l_channel" is 1.5 times as long as "times".

So my question now is, how do I take into account that my sample width is 3? What should I do to my arrays so that they end up being equal to the shape of my "times" array?

Jay

1 Answer

This is actually a bit complicated, because a 24-bit file (3 bytes per sample) has no matching NumPy data type.

If you want to do this yourself, you would need to do something like:

  1. Reshape the NumPy array of single bytes into 3-byte batches.
  2. Add an extra column of zeros to pad to 32-bit.
  3. Reinterpret the batches as 32-bit integers.

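This snippet shows how it could be done for big-endian data (pad at the start, and use > in the data type). Note that WAV PCM data is little-endian, so for a WAV file you would pad at the end and use < instead: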

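sample_width = wav_obj.getsampwidth()  # 3 bytes for 24-bit audio
# View the raw frames as single bytes, then one row per sample
data = np.frombuffer(signal_wave, dtype=np.uint8)
samples = data.reshape((n_samples * n_channels, sample_width))
# Prepend a zero byte so each row is 4 bytes wide (big-endian padding)
padding = np.zeros((n_samples * n_channels, 1), dtype=np.uint8)
padded_samples = np.hstack((padding, samples))
# Reinterpret each 4-byte row as a big-endian unsigned 32-bit integer
int_samples = padded_samples.view('>u4').flatten()
samples_l = int_samples[::2]  # every other sample belongs to the left channel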

See this notebook for details. This still isn't a complete solution because it doesn't handle signed integers, and 24-bit PCM data is signed. Apparently there's a trick you can do with NumPy's as_strided, but it's dangerous (it allows access to memory outside the array).
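
For a WAV file specifically, the padding and sign handling could look roughly like the sketch below. It assumes the data is little-endian signed 24-bit PCM with interleaved channels; decode_pcm24 is a hypothetical helper name, not part of NumPy or the wave module.

```python
import numpy as np

def decode_pcm24(raw, n_channels):
    """Decode little-endian signed 24-bit PCM bytes (assumed layout).

    Returns an int32 array of shape (n_frames, n_channels).
    """
    data = np.frombuffer(raw, dtype=np.uint8)
    if data.size % (3 * n_channels) != 0:
        raise ValueError("byte count is not a whole number of 24-bit frames")
    triples = data.reshape(-1, 3)
    # Little-endian stores the least significant byte first, so the
    # zero padding goes at the end of each row (the high byte).
    padded = np.zeros((triples.shape[0], 4), dtype=np.uint8)
    padded[:, :3] = triples
    values = padded.view('<u4').reshape(-1).astype(np.int32)
    # Sign-extend: values with bit 23 set are negative in two's complement.
    values[values >= (1 << 23)] -= (1 << 24)
    return values.reshape(-1, n_channels)
```

With the variables from the question, frames = decode_pcm24(signal_wave, n_channels) should give one row per frame, so frames[:, 0] would be the left channel and would have the same length as times.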

So if you can, use scipy.io.wavfile.read instead.
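
A minimal sketch of that route, assuming a SciPy version that can read 24-bit PCM (as far as I know, recent versions return such data as int32, left-justified, so the values are scaled up by 256). load_channels is a hypothetical helper name; the time axis is built from the number of frames actually read, so it always matches the channel length.

```python
import numpy as np
from scipy.io import wavfile

def load_channels(path):
    """Read a WAV file and return (times, channels).

    channels has shape (n_frames, n_channels); times has length n_frames.
    """
    sample_rate, data = wavfile.read(path)
    if data.ndim == 1:
        # Mono files come back as a 1-D array; make them 2-D for consistency.
        data = data[:, np.newaxis]
    times = np.arange(data.shape[0]) / sample_rate
    return times, data
```

The left channel could then be plotted with plt.plot(times, channels[:, 0]) after times, channels = load_channels("loop.wav").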

z0r
  • Alternatively, is there a way that I could change the sample width of the wav file to 2 before processing the data? – Jay Apr 11 '23 at 15:18
  • You could use the `batched` recipe from itertools perhaps, but you'd still need to deal with the sign bit. Look for a code snippet to parse a 24-bit int. – z0r Apr 12 '23 at 11:29
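
A pure-Python take on the idea in the comments might look like this. It is only a sketch, assuming little-endian signed 24-bit samples; parse_pcm24 and to_16bit are hypothetical helper names, and plain slicing stands in for the `batched` recipe. int.from_bytes with signed=True takes care of the sign bit, and dropping the least significant byte reduces each sample to the 16-bit range.

```python
def parse_pcm24(raw):
    """Parse little-endian signed 24-bit samples from bytes (assumed layout)."""
    return [
        int.from_bytes(raw[i:i + 3], "little", signed=True)
        for i in range(0, len(raw), 3)
    ]

def to_16bit(samples):
    """Drop the least significant byte so each value fits in 16 bits."""
    # Python's >> on negative ints is an arithmetic shift, so the sign is kept.
    return [s >> 8 for s in samples]
```

This is much slower than a NumPy approach on long files, and for stereo data the resulting list is still interleaved, so the left channel would be every other element.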