Currently I am working on a project that requires me to pick out audio clips and compare them based on their FFT results (i.e. spectrogram). All of my audio clips are 0.200 s long, but after I run them through the transform, the results are no longer the same length. The code I am using for the transform uses the numpy and librosa libraries:
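import librosa as lb
import numpy as np

def extractFFT(audioArr):
    # Returns the real part of each clip's FFT, one array per file path.
    fourierArr = []
    for path in audioArr:
        y, sr = lb.load(path)  # y: audio samples, sr: sample rate
        fourier = np.fft.fft(y)  # one output point per input sample
        fourierArr.append(fourier.real)
    return fourierArr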
I am only taking the real part of the transform because I also want to pass it through a PCA, which does not allow complex numbers. Regardless, I can perform neither LDA (linear discriminant analysis) nor PCA on this FFT array of audio clips, since some of the entries have different lengths.
The code I have for the LDA is as follows, where the labels are given for a frequencyArr of length 4:
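from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def LDA(frequencyArr):
    # Use the first 80% of the clips for training and the rest for validation.
    splitMark = int(len(frequencyArr) * 0.8)
    trainingData = frequencyArr[:splitMark]
    validationData = frequencyArr[splitMark:]
    labels = [1, 1, 2, 2]
    lda = LinearDiscriminantAnalysis()
    lda.fit(trainingData, labels[:splitMark])
    print(f"prediction: {lda.predict(validationData)}")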
This throws the following ValueError, coming from the lda.fit(trainingData, labels[:splitMark]) line:
ValueError: setting an array element with a sequence.
I know this error stems from the array not having a fixed two-dimensional shape: when the FFT elements are all of equal length, I don't receive the error and the code works as intended.
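A minimal sketch of that shape problem, using made-up zero arrays of lengths 4410 and 4409 in place of the real FFT output. The helper name `can_stack` is hypothetical and only exists for this illustration; it reports whether a list of rows can be turned into a 2-D float array.

```python
import numpy as np

def can_stack(rows):
    """Return True if rows can be converted into a 2-D float array."""
    try:
        arr = np.asarray(rows, dtype=float)
    except ValueError:
        # Rows of unequal length cannot form a rectangular array.
        return False
    return arr.ndim == 2

# Placeholder data standing in for the FFT results (assumed lengths).
equal = [np.zeros(4410), np.zeros(4410)]
ragged = [np.zeros(4410), np.zeros(4409)]

print(can_stack(equal))
print(can_stack(ragged))
```

Only the equal-length list converts cleanly, which matches the behaviour described above.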
Does this have something to do with the audio clips themselves? After the transform, some audio clips are of equal length and others are not. If someone could explain why audio clips of the same duration can return FFTs of different lengths, that would be great!
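One check that may help narrow this down: np.fft.fft returns exactly one output point per input sample, so a difference in FFT length should already be visible in the length of y. The sketch below uses placeholder zero signals rather than real clips, and assumes the clips are loaded at 22050 Hz, which is librosa.load's default when sr is not passed. The helper name `fft_length` is hypothetical.

```python
import numpy as np

def fft_length(n_samples):
    """Length of np.fft.fft's output for a signal of n_samples samples."""
    y = np.zeros(n_samples)  # placeholder signal; only its length matters here
    return len(np.fft.fft(y))

SR = 22050  # assumed: librosa.load's default sample rate when sr is not passed

for n_samples in (4410, 4409):
    # Print sample count, FFT length, and the duration implied by the sample count.
    print(n_samples, fft_length(n_samples), n_samples / SR)
```

If that assumption holds, printing len(y) for each clip right after lb.load should show the same mismatch before any transform is applied.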
Note that the lengths normally differ by only a few points: for example, for 3 of the audio clips the FFT length is 4410, but for the 4th it is 4409. I know I can probably just trim everything down to the smallest length in the group, but I'd prefer a cleaner method that won't leave out any values.
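For illustration, here is a rough sketch of one alternative to trimming: np.fft.fft accepts an `n` argument that zero-pads (or crops) the input to a fixed length before transforming. The function name `fft_fixed_length` is hypothetical, the signals are random placeholders, and whether zero-padding is acceptable for this comparison is an assumption rather than something established here.

```python
import numpy as np

def fft_fixed_length(signals, n=None):
    """Real part of each signal's FFT, all computed at the same length n."""
    if n is None:
        # Default to the longest signal so shorter ones are padded, not trimmed.
        n = max(len(y) for y in signals)
    # np.fft.fft zero-pads inputs shorter than n and crops longer ones.
    return np.stack([np.fft.fft(y, n=n).real for y in signals])

# Placeholder signals with the lengths mentioned above (assumed data).
rng = np.random.default_rng(0)
signals = [rng.standard_normal(4410) for _ in range(3)]
signals.append(rng.standard_normal(4409))

features = fft_fixed_length(signals)
print(features.shape)
```

The result is a regular 2-D array with one row per clip, which is the shape that fit and predict expect. Padding a clip by one sample shifts its frequency bins very slightly relative to an unpadded transform, so this is a trade-off rather than a lossless fix.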