
I am detecting multicollinearity using eigenvalues and eigenvectors for the longley data. When I compute the eigenvalues in SPSS, I get different values than in R, and I don't know why. I computed them for both the standardized X matrix and the actual X matrix, but the results don't match.

data(longley)
x<-as.matrix(longley[,-7])
e<-eigen(t(x)%*%x)

The following is the result from R Language

$values
[1] 6.665299e+07 2.090730e+05 1.053550e+05 1.803976e+04 2.455730e+01
[6] 2.015117e+00

Following is the result from SPSS

6.861392768154346
0.08210250361264278
0.04568078445788493
0.01068846567618869
1.29228130384155E-4
6.2463047077443345E-6
3.663846498908749E-9

What could be wrong with my command? Also, please guide me on how to compute the proportion of variance explained.

  • Why do R and SPSS give a different number of eigenvalues? – Warren Weckesser Jun 14 '13 at 04:55
  • This is the question: why are R and SPSS giving different results for the same data? – itfeature.com Jun 14 '13 at 05:03
  • Why don't you show exactly what you did in SPSS, like you did for R? That might help. – Hong Ooi Jun 14 '13 at 07:20
  • I don't know why everyone tries to mark questions from low-reputation users as not useful or votes them down. This is a critical difference between two software packages that should be discussed and resolved. – itfeature.com Jun 14 '13 at 07:23
  • I guess the downvote comes because you do not mention your calculation steps in SPSS. How can anyone verify that you did the right thing there? A reference to the differing literature result would also help resolve whether there is a critical difference between the software packages or in their usage. – Daniel Fischer Jun 14 '13 at 08:08
  • SPSS is menu-driven software, so there is no need to mention the calculations, as SPSS does them itself from the menus. Downvotes usually come from high-reputation users showing their worth on Stack Overflow; much of the time this proves true. – itfeature.com Jun 14 '13 at 09:17
  • I am not too familiar with SPSS, but isn't there something like a syntax editor? Maybe you could check the SPSS syntax for calculating eigenvalues and compare that. If you extract the eigenvalues from some PCA or similar, have you checked that SPSS doesn't calculate them from a standardized version of the matrix, or apply some other 'pre-steps'? – Daniel Fischer Jun 14 '13 at 09:35
  • SPSS has syntax; just paste the syntax from any command. The note about different variables is pertinent: your R code returns only 6 eigenvalues, while SPSS returns 7. Either you used different variables (which, without the SPSS code, is impossible for anyone to know), or the matrix is ill-conditioned (in which case differences between programs would not be unexpected). – Andy W Jun 14 '13 at 12:22
  • What did you do in SPSS to get those numbers? If it is menu-driven, can you point to any online documentation of the actions that you took, and what those actions are supposed to compute? – Warren Weckesser Jun 14 '13 at 14:16
  • I did not use different variables. The problem is that the R tutorials for collinearity or ill-conditioning give the same command as I posted, but the results differ not only from already-published literature but also from other software. Here is the SPSS syntax: `DATASET ACTIVATE DataSet1. REGRESSION /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA COLLIN TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT Y /METHOD=ENTER X2 X3 X4 X5 X6 X1.` – itfeature.com Jun 14 '13 at 17:03

2 Answers


This "answer" is really just a long comment.

Here's longley[,-7].

> longley[,-7]
     GNP.deflator     GNP Unemployed Armed.Forces Population Year
1947         83.0 234.289      235.6        159.0    107.608 1947
1948         88.5 259.426      232.5        145.6    108.632 1948
1949         88.2 258.054      368.2        161.6    109.773 1949
1950         89.5 284.599      335.1        165.0    110.929 1950
1951         96.2 328.975      209.9        309.9    112.075 1951
1952         98.1 346.999      193.2        359.4    113.270 1952
1953         99.0 365.385      187.0        354.7    115.094 1953
1954        100.0 363.112      357.8        335.0    116.219 1954
1955        101.2 397.469      290.4        304.8    117.388 1955
1956        104.6 419.180      282.2        285.7    118.734 1956
1957        108.4 442.769      293.6        279.8    120.445 1957
1958        110.8 444.546      468.1        263.7    121.950 1958
1959        112.6 482.704      381.3        255.2    123.366 1959
1960        114.2 502.601      393.1        251.4    125.368 1960
1961        115.7 518.173      480.6        257.2    127.852 1961
1962        116.9 554.894      400.7        282.7    130.081 1962

The printed output shows seven columns of numbers, but the last column just duplicates the row index shown in the first column. I suspect that in SPSS you processed all 7 columns, while in R you processed 6 columns.

This is just a guess--I don't have SPSS, so I can't even try to reproduce your result.

The calculation that you've done in R just computes the eigenvalues of XᵀX, and those values are correct. Here's the same calculation in Python, using numpy:

In [5]: x
Out[5]: 
array([[   83.   ,   234.289,   235.6  ,   159.   ,   107.608,  1947.   ],
       [   88.5  ,   259.426,   232.5  ,   145.6  ,   108.632,  1948.   ],
       [   88.2  ,   258.054,   368.2  ,   161.6  ,   109.773,  1949.   ],
       [   89.5  ,   284.599,   335.1  ,   165.   ,   110.929,  1950.   ],
       [   96.2  ,   328.975,   209.9  ,   309.9  ,   112.075,  1951.   ],
       [   98.1  ,   346.999,   193.2  ,   359.4  ,   113.27 ,  1952.   ],
       [   99.   ,   365.385,   187.   ,   354.7  ,   115.094,  1953.   ],
       [  100.   ,   363.112,   357.8  ,   335.   ,   116.219,  1954.   ],
       [  101.2  ,   397.469,   290.4  ,   304.8  ,   117.388,  1955.   ],
       [  104.6  ,   419.18 ,   282.2  ,   285.7  ,   118.734,  1956.   ],
       [  108.4  ,   442.769,   293.6  ,   279.8  ,   120.445,  1957.   ],
       [  110.8  ,   444.546,   468.1  ,   263.7  ,   121.95 ,  1958.   ],
       [  112.6  ,   482.704,   381.3  ,   255.2  ,   123.366,  1959.   ],
       [  114.2  ,   502.601,   393.1  ,   251.4  ,   125.368,  1960.   ],
       [  115.7  ,   518.173,   480.6  ,   257.2  ,   127.852,  1961.   ],
       [  116.9  ,   554.894,   400.7  ,   282.7  ,   130.081,  1962.   ]])

In [6]: eigvals(x.T.dot(x))
Out[6]: 
array([  6.66529929e+07,   2.09072969e+05,   1.05355048e+05,
         1.80397602e+04,   2.45572970e+01,   2.01511742e+00])
Warren Weckesser
  • Not sure what you mean: `longley[ , -7]` has six columns. When you print it, you see an extra column, which are the rownames. – zelite Jun 14 '13 at 07:13
  • In SPSS the collinearity diagnostic is computed. The eigenvalues here involve only the X matrix, not Y and X. The results also mismatch the literature results already available. On the other hand, using all 7 variables also produces different results, as given here: 6.672135e+07 2.091251e+05 1.053712e+05 1.805698e+04 2.465533e+01 3.196123e+00 1.414497e+00 – itfeature.com Jun 14 '13 at 07:13
  • @zelite: I mean the text display shows seven columns of numbers, that's all. Yes, the first column is just the row names. – Warren Weckesser Jun 14 '13 at 14:13
  • Following is the result from the Faraway PRA ebook: `data(longley); x<-as.matrix(longley[,-7]); e<-eigen(t(x)%*%x); e$values` gives 6.6653e+07 2.0907e+05 1.0536e+05 1.8040e+04 2.4557e+01 2.0151e+00 – itfeature.com Jun 15 '13 at 06:27

For collinearity diagnostics by eigenvalues, one should rescale the X matrix, including the intercept. The rescaled matrix is "obtained by dividing each original value by the square root of the sum of squared original values for that column in the original matrix, including those for the intercept". After that, compute the eigenvalues.

The R code is:

data(longley)
X <- as.matrix(cbind(1, longley[,-7]))
X <- apply(X, 2, function(x) x / sqrt(sum(x^2)))
eigen(t(X) %*% X)

The values obtained now match not only the literature but also the other software.
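The "proportion of variance explained" asked about in the question corresponds to the variance-decomposition proportions from the standard collinearity diagnostics (Belsley, Kuh and Welsch), which is also what SPSS's /COLLIN output tabulates. A minimal sketch in Python/numpy (the function name and example matrix are my own, not from the thread): scale each column of the intercept-augmented X to unit length as above, take the SVD, and for each coefficient split its variance across the singular values.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition indices and variance-decomposition proportions for a
    design matrix X that already includes the intercept column."""
    # Scale each column to unit length, as in the answer above.
    Z = X / np.sqrt((X ** 2).sum(axis=0))
    # The eigenvalues of Z'Z are the squared singular values of Z.
    _, d, Vt = np.linalg.svd(Z, full_matrices=False)
    cond_idx = d.max() / d                 # condition indices
    # phi[k, j] = V[j, k]^2 / d[k]^2; normalize over k for each j.
    phi = (Vt ** 2) / d[:, None] ** 2
    props = phi / phi.sum(axis=0)          # each column sums to 1
    return cond_idx, props
```

Each column of `props` shows how much of that coefficient's variance is attributable to each eigenvalue; two or more large proportions on a row with a high condition index flag a collinear relation. Note that the eigenvalues of the unit-scaled ZᵀZ sum to the number of columns, which is presumably why the SPSS values in the question sum to roughly 7.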

itfeature.com
  • this is a reasonable answer, but I can't help thinking your original answer was pretty unclear -- you didn't actually tell us that SPSS was scaling the X matrix, just that the eigenvalues were different (and you presented 7 rather than 6 eigenvalues). That makes this more a "reading the documentation" question than a programming question ... – Ben Bolker Jun 15 '13 at 15:02
  • @Ben Bolker: Actually the problem is that in R tutorials the collinearity diagnostic is described by the command eigen(t(X)%*%X). In the ebook "Practical Regression and Anova using R" by Faraway it is discussed as I performed it first; see page 110 of that book. There are 6 eigenvalues reported, all of which are wrong compared to the literature on collinearity diagnostic techniques. Thanks for your patience. – itfeature.com Jun 15 '13 at 15:43