This is the Python code I wrote to read data from a .wav file, apply pre-emphasis, divide the signal into 25 ms frames (winlen = 0.025 s) with a 10 ms stride (stride = 0.010 s), and apply a Hamming window:
import scipy.io.wavfile as wavfile
import numpy as np

winlen = 0.025   # frame length in seconds
stride = 0.010   # frame stride in seconds

# filename is the path to the .wav file (set elsewhere)
samplerate, data = wavfile.read(filename)
window = np.hamming(int(winlen*samplerate))

# Pre-Emphasis (in place, on the array returned by wavfile.read)
for i in range(1, len(data)):
    data[i] = data[i] - 0.97 * data[i-1]

# Framing
frames = []
dlen = int((len(data)/samplerate - winlen)/stride)+1
for i in range(dlen):
    stpt = int(samplerate*stride)*i
    datalen = int(winlen*samplerate)
    framedata = data[stpt:stpt+datalen].copy()
    frames.append(framedata)

# Apply Window
for i in range(len(frames)):
    for j in range(int(winlen*samplerate)):
        frames[i][j] *= window[j]
Pretty standard, nothing wrong here imo.
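As a sanity check on the framing and windowing steps, the same thing can be sketched in vectorized form. This is only a sketch under assumptions: it needs NumPy 1.20 or newer for sliding_window_view, a random int16 signal stands in for the real file, and converting to float64 before multiplying is a choice of this sketch, not something the loop version above does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

samplerate = 16000               # Hz, as in the conditions of this post
winlen, stride = 0.025, 0.010    # seconds

# Hypothetical stand-in for the samples returned by wavfile.read (1 second, int16).
rng = np.random.default_rng(0)
data = rng.integers(-500, 500, size=samplerate).astype(np.int16)

framelen = int(winlen * samplerate)   # samples per frame
step = int(samplerate * stride)       # samples between frame starts

# Take every length-framelen window, then keep one window per stride.
frames = sliding_window_view(data, framelen)[::step]

# Assumption of this sketch: convert to float64 before windowing, so the
# product is stored as a float instead of being written back into an integer array.
windowed = frames.astype(np.float64) * np.hamming(framelen)

print(frames.shape, windowed.dtype)
```

Comparing frames against the list built by the loop version is a quick way to confirm that both framings pick the same samples.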
Here is the code I wrote in C (strictly speaking C++, since it uses new and <iostream>) to do the exact same operation:
#include <iostream>
#include <cmath>
#include <cstdlib>  // calloc

// Note: parameters, PI, framelen and Declare2DArray are defined elsewhere (not shown).
double *hammingwindow = (double*)calloc(framelen, sizeof(double));
short *data = new short[10000000];
double **data_frames = Declare2DArray(parameters.numframes, parameters.framelen);

// declare hamming window
for (int i=0; i<framelen; i++) {
    hammingwindow[i] = double(0.54) - double(0.46) * cos(2 * PI * double(i) / double(framelen - 1));
}

// apply pre-emphasis (both C and python set to 0.97)
for (int i=1; i<parameters.datasize; i++) {
    data[i] = data[i] - parameters.preemphasis * data[i-1];
}

// Divide into frames
int startpoint = parameters.stride * parameters.samplefreq;
for (int i=0; i<parameters.numframes; i++) {
    for (int j=0; j<parameters.framelen; j++) {
        data_frames[i][j] = data[startpoint*i + j];
    }
}

// Multiply hamming window to frames
for (int i=0; i<parameters.numframes; i++) {
    for (int j=0; j<parameters.framelen; j++) {
        data_frames[i][j] *= parameters.hammingwindow[j];
    }
}
(Conditions)
- sample rate: 16000
- sample .wav file is 1 second long
- The raw data read from the file and the framed data (before windowing) show no difference between C and Python (checked separately)
- Data from C is passed to Python through a binary file, which is written in C and read in Python
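For reference, the binary hand-off can be sketched roughly like this. It is a minimal sketch, assuming the C side writes the frames as raw native-endian doubles in row-major order; the file name and the stand-in array are hypothetical, and the 98 x 400 shape is what the framing formula above gives for 1 second at 16 kHz.

```python
import os
import tempfile
import numpy as np

numframes, framelen = 98, 400   # 1 s at 16 kHz, 25 ms frames, 10 ms stride

# Hypothetical stand-in for the array the C side holds in data_frames.
c_side = np.arange(numframes * framelen, dtype=np.float64).reshape(numframes, framelen)

# Write raw bytes, row by row, the way repeated fwrite calls on each row would.
path = os.path.join(tempfile.mkdtemp(), "frames.bin")   # hypothetical file name
c_side.tofile(path)

# Read back: dtype and shape must match what was written,
# otherwise the bytes are reinterpreted and the values are meaningless.
py_side = np.fromfile(path, dtype=np.float64).reshape(numframes, framelen)

print(py_side.dtype, py_side.shape, np.array_equal(py_side, c_side))
```

The dtype passed to np.fromfile has to match the C element type exactly (for example np.int16 for short, np.float64 for double).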
(Problem) The Python window function and the C window function show a very marginal difference, fluctuating between +4e-7 and -4e-7.
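To put that window difference in context, here is a small sketch that evaluates the same explicit formula as the C loop in float64 and compares it against np.hamming. The frame length of 400 is int(0.025 * 16000); everything else is plain NumPy. If this comparison comes out far smaller than 4e-7, it might be worth checking the precision of PI or of the values written to the binary file, though that is only a guess.

```python
import numpy as np

framelen = 400   # int(0.025 * 16000)

# Explicit formula, matching the expression in the C loop, evaluated in float64.
i = np.arange(framelen)
explicit = 0.54 - 0.46 * np.cos(2 * np.pi * i / (framelen - 1))

# NumPy's built-in Hamming window of the same length.
reference = np.hamming(framelen)

# Largest absolute deviation between the two windows.
max_diff = np.max(np.abs(explicit - reference))
print(max_diff)
```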
However, the windowed data shows a rather consistent abs(difference) of 0.99~1.0 at around the 180th~220th samples out of the 400 in each frame. That's around the dead center of the frame. I don't get how this happens, because the sample magnitudes range from single digits to the hundreds. The magnitudes fluctuate a lot, but the difference in the central values AFTER the window function has been applied is CONSISTENT. How??
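One way to narrow this down is to look at the element types of both arrays and at the mean absolute difference per sample index. The helper below, summarize_difference, is a hypothetical name introduced for this sketch. The stand-in data is purely illustrative: one copy is cast to int16 only so that the two arrays differ, and whether anything like that happens in the real pipeline is an assumption that would need checking.

```python
import numpy as np

def summarize_difference(py_frames, c_frames):
    """Return both dtypes and the mean absolute difference per sample index."""
    py_arr = np.asarray(py_frames)
    c_arr = np.asarray(c_frames)
    diff = c_arr.astype(np.float64) - py_arr.astype(np.float64)
    return {
        "py_dtype": str(py_arr.dtype),
        "c_dtype": str(c_arr.dtype),
        "per_index": np.abs(diff).mean(axis=0),   # one value per position in the frame
    }

# Purely illustrative stand-in data (not the real frames): one copy keeps the
# float64 products, the other is cast to int16 just so the arrays differ.
rng = np.random.default_rng(0)
raw = rng.integers(1, 500, size=(98, 400))
c_frames = raw * np.hamming(400)
py_frames = c_frames.astype(np.int16)

report = summarize_difference(py_frames, c_frames)
print(report["py_dtype"], report["c_dtype"])
print(report["per_index"][195:205])   # values near the center of the frame
```

Running the same helper on the real Python and C frames would show whether the two sides store the windowed values with the same element type, and where in the frame the difference concentrates.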
Could someone give me an explanation of how this might happen, or any idea they might have?