I have a pandas dataframe which looks vaguely like this:

Out[130]: 
     xvar            yvar                   meanRsquared
0    filled_water    precip                 0.119730
1    filled_water    snow                   0.113214
2    filled_water    filled_wetland         0.119529
3    filled_wetland  precip                 0.104826
4    filled_wetland  snow                   0.121540
5    filled_wetland  filled_water           0.121540
[676 rows x 3 columns]

I would like to transform its shape into a more traditional correlation matrix, where the columns and the index are the variables, and the values are the meanRsquared.

Is there any easy way to do this? I've been playing around for an hour and can't figure out how I could do this.

DISCLAIMER: Yes, I know pandas has a built in function for creating a correlation matrix. However my current df is the average of hundreds of correlation matrices over many watersheds, so I cannot use that.

This is my best attempt, but obviously the logic failed towards the end.

listOfdicts = []
for xvar in df['xvar'].unique():
    for yvar in df['yvar'].unique():
        adict = {}
        adict['index'] = xvar 
        adict[yvar] = yvar
        adict['r'] = df['insert r value here']
        listOfdicts.append(adict)
answer = pd.Dataframe.from_dict(listOfdicts)

I don't expect this to work, but this was my best shot.

yeet_man
  • But what are those `meanRsquared`? Coefficients of a correlation matrix? So what you want is to number the possible values of `xvar` (and `yvar`; they are the same, I assume), and then create a matrix `M` such that `M[idx(xvar), idx(yvar)] = df[(df.xvar==xvar) & (df.yvar==yvar)]['meanRsquared']`? – chrslg Jul 06 '23 at 23:06
  • In other words, you just want to rearrange the values in a matrix. Is that so? – chrslg Jul 06 '23 at 23:06

1 Answer

You need to look at the pivot method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html).

import pandas as pd

df = pd.DataFrame(
    data={
        'xvar': ['filled_water', 'filled_water', 'filled_water',
                 'filled_wetland', 'filled_wetland', 'filled_wetland'],
        'yvar': ['precip', 'snow', 'filled_wetland',
                 'precip', 'snow', 'filled_water'],
        'meanRsquared': [1, 2, 3, 4, 5, 6],
    },
    index=range(6),
)

df.pivot(index='xvar', columns='yvar', values='meanRsquared')

Output:

yvar            filled_water  filled_wetland  precip  snow
xvar                                                      
filled_water             NaN             3.0     1.0   2.0
filled_wetland           6.0             NaN     4.0   5.0
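
In the toy output above, the rows and columns do not carry the same labels, so the result is not yet square. If a square matrix is wanted, one possible follow-up is to reindex both axes to the union of the variable names. This is only a sketch on the same placeholder data; the names `wide`, `labels` and `square` are made up for illustration, and it assumes each (xvar, yvar) pair occurs at most once (pivot raises a ValueError on duplicate pairs).

```python
import pandas as pd

# Toy data mirroring the example above; the values are placeholders.
df = pd.DataFrame({
    'xvar': ['filled_water', 'filled_water', 'filled_water',
             'filled_wetland', 'filled_wetland', 'filled_wetland'],
    'yvar': ['precip', 'snow', 'filled_wetland',
             'precip', 'snow', 'filled_water'],
    'meanRsquared': [1, 2, 3, 4, 5, 6],
})

# Long-to-wide reshape, as in the answer.
wide = df.pivot(index='xvar', columns='yvar', values='meanRsquared')

# Every variable that appears on either axis, in sorted order.
labels = sorted(set(df['xvar']) | set(df['yvar']))

# Give both axes the same labels; pairs absent from df become NaN.
square = wide.reindex(index=labels, columns=labels)
print(square)
```

With the full 676-row frame, where every variable presumably appears in both columns, the plain pivot may already be square and the reindex would only fix the ordering.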
Mikhail Genkin