10

Trying to compute SVD in Python to find the most significant elements of a spectrum and created a matrix just containing the most significant parts.

In python I have:

u,s,v = linalg.svd(Pxx, full_matrices=True)

This gives 3 matrices back; where "s" contains the magnitudes that corresponds to u, v.

In order to construct a new matrix, containing all of the significant parts of the signal, I need to capture the highest values in "s" and match them with the columns in "u" and "v" and the resulting matrix should give me the most significant part of the data.

The problem is I don't know how I would do this in Python, for example, how do I find the highest numbers in "s" and select the columns in "u" and "v" in order to create a new matrix?

(I'm new to Python and numpy) so any help would be greatly appreciated

Edit:

import wave, struct, numpy as np, matplotlib.mlab as mlab, pylab as pl
from scipy import linalg, mat, dot;
def wavToArr(wavefile):
    w = wave.open(wavefile,"rb")
    p = w.getparams()
    s = w.readframes(p[3])
    w.close()
    sd = np.fromstring(s, np.int16)
    return sd,p

def wavToSpec(wavefile,log=False,norm=False):
    wavArr,wavParams = wavToArr(wavefile)
    print wavParams
    return  mlab.specgram(wavArr, NFFT=256,Fs=wavParams[2],detrend=mlab.detrend_mean,window=mlab.window_hanning,noverlap=128,sides='onesided',scale_by_freq=True)

wavArr,wavParams = wavToArr("wavBat1.wav")

Pxx, freqs, bins = wavToSpec("wavBat1.wav")
Pxx += 0.0001

U, s, Vh = linalg.svd(Pxx, full_matrices=True)
assert np.allclose(Pxx, np.dot(U, np.dot(np.diag(s), Vh)))

s[2:] = 0
new_a = np.dot(U, np.dot(np.diag(s), Vh))
print(new_a)
Phorce
  • 4,424
  • 13
  • 57
  • 107
  • Change `full_matrices=True` to `full_matrices=False`. – unutbu Feb 02 '14 at 19:15
  • @unutbu - I've changed it. This segments. If I change `full_matrices=True` then I get the following error: ValueError: objects are not aligned.. Any ideas, sorry? – Phorce Feb 02 '14 at 19:17
  • When you say it "segments" do you mean there is a segmentation fault, and no error message? Or does it leave a stack trace and error message such as `wave.Error: unknown format: -2`? – unutbu Feb 02 '14 at 19:33
  • @unutbu Just get the following error: `Segmentation fault (core dumped)` The shape of Pxx: (129, 146) – Phorce Feb 02 '14 at 19:36
  • I've run your code (with `full_matrices=False`) on a number of .wavs and was unable to produce a Segmentation fault. Can you post PXX? Happily, it's not very large. – unutbu Feb 02 '14 at 19:43
  • @unutbu This is the raw data: http://pastebin.com/mBtasJLD – Phorce Feb 02 '14 at 19:49
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/46641/discussion-between-user1326876-and-unutbu) – Phorce Feb 02 '14 at 19:50

1 Answers1

13

linalg.svd returns s in descending order. So to select the n highest numbers in s, you'd simply form

s[:n]

If you set the smaller values of s to zero,

s[n:] = 0

then matrix multiplication would take care of "selecting" the appropriate columns of U and V.

For example,

import numpy as np
LA = np.linalg

a = np.array([[1, 3, 4], [5, 6, 9], [1, 2, 3], [7, 6, 8]])
print(a)
# [[1 3 4]
#  [5 6 9]
#  [1 2 3]
#  [7 6 8]]
U, s, Vh = LA.svd(a, full_matrices=False)
assert np.allclose(a, np.dot(U, np.dot(np.diag(s), Vh)))

s[2:] = 0
new_a = np.dot(U, np.dot(np.diag(s), Vh))
print(new_a)
# [[ 1.02206755  2.77276308  4.14651336]
#  [ 4.9803474   6.20236935  8.86952026]
#  [ 0.99786077  2.02202837  2.98579698]
#  [ 7.01104783  5.88623677  8.07335002]]

Given the data here,

import numpy as np
import scipy.linalg as SL
import matplotlib.pyplot as plt

Pxx = np.genfromtxt('mBtasJLD.txt')
U, s, Vh = SL.svd(Pxx, full_matrices=False)
assert np.allclose(Pxx, np.dot(U, np.dot(np.diag(s), Vh)))

s[2:] = 0
new_a = np.dot(U, np.dot(np.diag(s), Vh))
print(new_a)
plt.plot(new_a)
plt.show()

produces

enter image description here

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677