
I did a PCA in Python on audio spectrograms and face the following problem: I have a matrix where each row consists of flattened song features. After applying PCA it's clear to me that the dimensions are reduced, but I can't find the resulting components anywhere in the original dataset.

import sys
import glob

from scipy.io.wavfile import read
from scipy import signal
from scipy.fftpack import fft
import numpy as np
import matplotlib.pyplot as plt
import pylab

# Read file to get samplerate and numpy array containing the signal 

files = glob.glob('../some/*.wav')

song_list = []

for wav in files:

    (fs, x) = read(wav)

    channels = [
        np.array(x[:, 0]),
        np.array(x[:, 1])
    ]

    # Combine channels to make a mono signal out of stereo
    channel = np.mean(channels, axis=0)
    channel = channel[0:1024]
    # Generate spectrogram 
    ## Freqs is the same with different songs, t differs slightly
    Pxx, freqs, t, plot = pylab.specgram(
        channel,
        NFFT=128,
        Fs=fs,  # sample rate read from the file
        detrend=pylab.detrend_none,
        window=pylab.window_hanning,
        noverlap=int(128 * 0.5))
    # Magnitude Spectrum to use
    Pxx = Pxx[0:2]
    X_flat = Pxx.flatten()
    song_list.append(X_flat)

song_matrix = np.vstack(song_list)

If I now apply PCA to the song_matrix...

import matplotlib
from matplotlib.mlab import PCA
from sklearn import decomposition


#test = matplotlib.mlab.PCA(song_matrix.T)

pca = decomposition.PCA(n_components=2)
song_matrix_pca = pca.fit_transform(song_matrix.T)


pca.components_  # These components should be the most helpful for discriminating between the songs, due to their high variance

...the final 2 components are the following (see the linked plot: "Final components - two dimensions from 15 wav-files"). The problem is that I can't find those two vectors anywhere in the original dataset with all of its dimensions. What am I doing wrong, or am I misinterpreting the whole thing?


2 Answers


PCA doesn't give you the vectors in your dataset. From Wikipedia:

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
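
In other words, pca.components_ holds the directions of a new basis built as linear combinations of your original features; you shouldn't expect to find them as rows of song_matrix. A minimal sketch of the shapes involved (the 15x6 random matrix is just a stand-in for your real song_matrix):

import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
song_matrix = rng.rand(15, 6)   # stand-in: 15 songs x 6 flattened spectrogram values

pca = decomposition.PCA(n_components=2)
reduced = pca.fit_transform(song_matrix)

print(pca.components_.shape)    # (2, 6): two new basis directions in feature space
print(reduced.shape)            # (15, 2): each song re-expressed in that new basis

# Each component is a linear combination of all the original features,
# so in general it matches no single row of song_matrix.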

  • You have two components. You have a 15x2 matrix which transforms your original 15 inputs into 2 outputs. You can see which ones contribute most heavily by examining the magnitudes of the vector coefficients. For instance, input #8 contributes a huge amount to final #1, 6 times as much as the second-place factor. Final #2 is driven mostly by inputs 5, 11, and 8. Does that help clear things up? – Prune Oct 20 '15 at 00:01
  • That's what I don't understand. The documentation says that "components_ : array, [n_components, n_features] Components with maximum variance." (http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) That sounds to me like those components should also appear in the original data. I know that PCA in general doesn't have those values as its output. – Jamona Oct 20 '15 at 07:27
  • they're referring to the linearly uncorrelated variables called principal components – user2867432 Oct 21 '15 at 01:00
  • Does that make sense now ? – user2867432 Oct 28 '15 at 03:52
  • 1
    Would have been nice for you to follow up as simple courtesy since we took the time to answer your question – user2867432 Nov 03 '15 at 08:12
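
Picking up on the earlier comment about examining the magnitudes of the vector coefficients: a rough, hypothetical sketch of how you could rank the inputs that drive each component (the random 15x6 matrix again stands in for the real song_matrix, and the transposed fit mirrors the code in the question):

import numpy as np
from sklearn import decomposition

rng = np.random.RandomState(0)
song_matrix = rng.rand(15, 6)        # stand-in for the real song_matrix

pca = decomposition.PCA(n_components=2)
pca.fit(song_matrix.T)               # transposed fit, as in the question: 15 inputs -> 2 outputs

# For each principal component, rank the 15 inputs by the magnitude of their
# coefficient; the largest ones dominate that component.
for i, component in enumerate(pca.components_):
    order = np.argsort(np.abs(component))[::-1]
    print("Component %d is driven mostly by inputs %s" % (i + 1, order[:3]))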

Say you have a column vector V containing ONE flattened spectrogram. PCA will find a matrix M whose columns are orthogonal vectors (think of them as being at right angles to every other column in M).

Multiplying the transpose of M by V gives you a vector of "scores" T, which can be used to determine how much of the variance in the original data each column of M captures; each successive column of M captures progressively less variance.

Multiplying the transpose of M' (the first 2 columns of M) by V produces a 2x1 vector T' representing the "dimension-reduced spectrogram". You can reconstruct an approximation of V by multiplying M' by T'. This works if you have a matrix of spectrograms, too. Keeping only two principal components produces an extremely lossy compression of your data.
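
As a numerical sketch of that projection and reconstruction (using NumPy's SVD to obtain M; the 15x8 toy matrix is made up purely for illustration):

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(15, 8)           # 15 toy "spectrograms", 8 values each
Xc = X - X.mean(axis=0)       # PCA operates on mean-centred data

# Columns of M are the orthonormal principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
M = Vt.T                      # shape (8, 8)

V = Xc[0]                     # one centred "spectrogram" as a vector
M2 = M[:, :2]                 # M': the first two columns of M

T2 = M2.T.dot(V)              # the 2-element score vector T'
V_approx = M2.dot(T2)         # lossy reconstruction of V from only two scores

print(np.linalg.norm(V - V_approx))   # non-zero: the reconstruction is approximate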

But what if you want to add a new song to your dataset? Unless it is very much like the original songs (meaning it introduces little variance beyond the original data set), there's no reason to think the columns of M will describe the new song well. For that matter, even multiplying all the elements of V by a constant would render M useless. PCA is quite data-specific, which is why it isn't used for general-purpose image/audio compression.

The good news? You can use a Discrete Cosine Transform (DCT) to compress your training data. Instead of directions derived from the data, it uses a fixed set of cosines as its basis, so it doesn't suffer from the data-specific limitation. The DCT is used in JPEG, MP3 and other compression schemes.
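
A rough sketch of that idea with SciPy's DCT (the toy sine signal and the choice to keep 64 of 1024 coefficients are arbitrary, just to illustrate the lossy compression):

import numpy as np
from scipy.fftpack import dct, idct

t = np.linspace(0, 1, 1024)
channel = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)   # toy mono signal

coeffs = dct(channel, norm='ortho')   # fixed cosine basis, independent of the data
coeffs[64:] = 0                       # keep only the first 64 coefficients

reconstructed = idct(coeffs, norm='ortho')
print(np.linalg.norm(channel - reconstructed))   # small reconstruction error for a smooth signal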
