Find the eigenvalues of a subset of Dataframe in Python

Question

I have a matrix in the form of a DataFrame:

   df=     6M         1Y         2Y         4Y         5Y        10Y        30Y
      6M   n/a        n/a        n/a        n/a        n/a        n/a        n/a
      1Y   n/a          1  0.9465095   0.869504  0.8124711    0.64687  0.5089244
      2Y   n/a  0.9465095          1  0.9343177  0.8880676  0.7423546  0.6048189
      4Y   n/a   0.869504  0.9343177          1  0.9762842  0.8803984  0.7760753
      5Y   n/a  0.8124711  0.8880676  0.9762842          1  0.9117788  0.8404656
      10Y  n/a    0.64687  0.7423546  0.8803984  0.9117788          1  0.9514033
      30Y  n/a  0.5089244  0.6048189  0.7760753  0.8404656  0.9514033          1

I read the values from a matrix (real numbers), and wherever there is no data I insert 'n/a' (I need to maintain this format for other reasons). I would like to compute the eigenvalues of the subset of the DataFrame that contains float values (essentially the subset from '1Y' to '30Y').

I can extract the subset using iloc

tmp = df.iloc[1:df.shape[0], 1:df.shape[1]]

and this extracts the correct values (I checked the types and they are float). But when I try to compute the eigenvalues of tmp using np.linalg.eigvalsh, I get an error:

TypeError: No loop matching the specified signature and casting
was found for ufunc eigvalsh_lo

The strange thing is that when I start from a DataFrame where 'n/a' is replaced by 0.0, the whole process works with no problem (it needs to be initialized with 0.0 and not, for instance, 0). It seems that if some part of the DataFrame is not real, the subset extraction does not turn the values into real numbers.
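A minimal sketch of what is probably going on, using a hypothetical 3x3 stand-in for the matrix above (the names `labels`, `df` and `tmp` are only for illustration). A column that mixes 'n/a' strings with floats is stored with dtype object, and slicing with iloc keeps that dtype even when every remaining cell is a Python float. The NumPy linalg routines need a real floating-point array, which would explain the error.

```python
import pandas as pd

# Hypothetical 3x3 stand-in for the correlation matrix in the question.
labels = ["6M", "1Y", "2Y"]
df = pd.DataFrame(
    [["n/a", "n/a", "n/a"],
     ["n/a", 1.0, 0.9465095],
     ["n/a", 0.9465095, 1.0]],
    index=labels, columns=labels,
)

# Same kind of slice as in the question: drop the first row and column.
tmp = df.iloc[1:, 1:]

# Each individual cell is a float ...
print(type(tmp.iloc[0, 0]))
# ... but the columns are still stored as generic Python objects.
print(tmp.dtypes)

# An explicit cast gives the float64 columns that NumPy expects.
print(tmp.astype(float).dtypes)
```

So checking the type of single elements is not enough here; the column dtype is what matters.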

Is there a way to overcome this problem?

Anton Protopopov · Accepted Answer · 2016-01-15T10:52:40.257

IIUC, you could convert your columns to numeric with pd.to_numeric, which replaces non-numeric values with NaN when errors='coerce' is passed; then with fillna() you could fill them with 0 and use np.linalg.eigvals:

In [348]: df.apply(pd.to_numeric, errors='coerce')
Out[348]:
     6M        1Y        2Y        4Y        5Y       10Y       30Y
6M  NaN       NaN       NaN       NaN       NaN       NaN       NaN
1Y  NaN  1.000000  0.946509  0.869504  0.812471  0.646870  0.508924
2Y  NaN  0.946509  1.000000  0.934318  0.888068  0.742355  0.604819
4Y  NaN  0.869504  0.934318  1.000000  0.976284  0.880398  0.776075
5Y  NaN  0.812471  0.888068  0.976284  1.000000  0.911779  0.840466
10Y NaN  0.646870  0.742355  0.880398  0.911779  1.000000  0.951403
30Y NaN  0.508924  0.604819  0.776075  0.840466  0.951403  1.000000

In [350]: df.apply(pd.to_numeric, errors='coerce').fillna(0)
Out[350]:
     6M        1Y        2Y        4Y        5Y       10Y       30Y
6M    0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
1Y    0  1.000000  0.946509  0.869504  0.812471  0.646870  0.508924
2Y    0  0.946509  1.000000  0.934318  0.888068  0.742355  0.604819
4Y    0  0.869504  0.934318  1.000000  0.976284  0.880398  0.776075
5Y    0  0.812471  0.888068  0.976284  1.000000  0.911779  0.840466
10Y   0  0.646870  0.742355  0.880398  0.911779  1.000000  0.951403
30Y   0  0.508924  0.604819  0.776075  0.840466  0.951403  1.000000

In [351]: np.linalg.eigvals(df.apply(pd.to_numeric, errors='coerce').fillna(0))
Out[351]:
array([ 5.11329285,  0.7269089 ,  0.07770957,  0.01334893,  0.02909796,
        0.03964179,  0.        ])
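Note that filling the empty row and column with 0 pads the matrix, which is where the extra 0 eigenvalue in the output comes from. If only the populated block is of interest, one option is to drop the all-NaN row and column instead of filling them. A small sketch, assuming a hypothetical 3x3 stand-in for the frame (`numeric`, `core`, `padded_vals` and `core_vals` are illustrative names) and a pandas version that has pd.to_numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 stand-in for the frame in the question.
labels = ["6M", "1Y", "2Y"]
df = pd.DataFrame(
    [["n/a", "n/a", "n/a"],
     ["n/a", 1.0, 0.9465095],
     ["n/a", 0.9465095, 1.0]],
    index=labels, columns=labels,
)

# 'n/a' becomes NaN, everything else becomes float64.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop rows, then columns, that contain no data at all.
core = numeric.dropna(how="all").dropna(axis=1, how="all")

# Zero-filled matrix: one eigenvalue per row, including the padded one.
padded_vals = np.linalg.eigvalsh(numeric.fillna(0).values)

# Populated block only: no padding, so no spurious zero.
core_vals = np.linalg.eigvalsh(core.values)

print(padded_vals)
print(core_vals)
```

eigvalsh is used here because the matrix is symmetric; it returns real eigenvalues in ascending order.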

After applying pd.to_numeric, all values become float:

In [352]: df.apply(pd.to_numeric, errors='coerce').dtypes
Out[352]:
6M     float64
1Y     float64
2Y     float64
4Y     float64
5Y     float64
10Y    float64
30Y    float64
dtype: object

Note: pd.to_numeric is available only in pandas version >= 0.17.0.

If 'n/a' is the only non-numeric value in the frame, you could use replace and astype(float):

df.replace('n/a', 0).astype(float)

In [364]: df.replace('n/a', 0).astype(float)
Out[364]:
     6M        1Y        2Y        4Y        5Y       10Y       30Y
6M    0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
1Y    0  1.000000  0.946510  0.869504  0.812471  0.646870  0.508924
2Y    0  0.946510  1.000000  0.934318  0.888068  0.742355  0.604819
4Y    0  0.869504  0.934318  1.000000  0.976284  0.880398  0.776075
5Y    0  0.812471  0.888068  0.976284  1.000000  0.911779  0.840466
10Y   0  0.646870  0.742355  0.880398  0.911779  1.000000  0.951403
30Y   0  0.508924  0.604819  0.776075  0.840466  0.951403  1.000000

In [365]: np.linalg.eigvals(df.replace('n/a', 0).astype(float))
Out[365]:
array([ 5.11329285,  0.7269089 ,  0.07770957,  0.01334893,  0.02909796,
        0.03964179,  0.        ])
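For the subset asked about in the question, the cast can also be applied to the iloc slice directly, which avoids both pd.to_numeric and the zero padding. This is only a sketch: the frame is rebuilt from the values in the question through a helper array (`corr` and `data` are illustrative names), and `.values` is used because it also exists in older pandas versions.

```python
import numpy as np
import pandas as pd

labels = ["6M", "1Y", "2Y", "4Y", "5Y", "10Y", "30Y"]
corr = np.array([
    [1, 0.9465095, 0.869504, 0.8124711, 0.64687, 0.5089244],
    [0.9465095, 1, 0.9343177, 0.8880676, 0.7423546, 0.6048189],
    [0.869504, 0.9343177, 1, 0.9762842, 0.8803984, 0.7760753],
    [0.8124711, 0.8880676, 0.9762842, 1, 0.9117788, 0.8404656],
    [0.64687, 0.7423546, 0.8803984, 0.9117788, 1, 0.9514033],
    [0.5089244, 0.6048189, 0.7760753, 0.8404656, 0.9514033, 1],
])

# Rebuild the frame from the question: 'n/a' in the 6M row and column.
data = np.full((7, 7), "n/a", dtype=object)
data[1:, 1:] = corr
df = pd.DataFrame(data, index=labels, columns=labels)

# Slice out 1Y..30Y, then cast the object columns to float64.
tmp = df.iloc[1:, 1:].astype(float)

# The matrix is symmetric, so eigvalsh applies; values come back ascending.
vals = np.linalg.eigvalsh(tmp.values)
print(vals)
```

The six values should agree with the non-zero entries of Out[365], up to ordering and rounding.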

Thanks Anton for your comment. I tried your solution but `to_numeric` does not seem to be working. I get the following error `AttributeError: 'module' object has no attribute 'to_numeric'` — Hamed, Jan 15 '16 at 10:45
Do you have only `n/a` values apart from the numeric values? Then you could use `replace` — Anton Protopopov, Jan 15 '16 at 10:50
hmmm, I have `0.16.2` for Pandas. Yes, I have only `n/a` except for numeric values. — Hamed, Jan 15 '16 at 12:38
