How to populate a square, symmetric DataFrame with elements from a Series based on pair names?

Question

In the following empty DataFrame, which is a square, symmetric matrix,

df = pd.DataFrame(np.zeros((6,6)), index=names, columns=names)

df looks like

    MCD   VZ  JPM   PG  WBA  NKE
MCD  0.0  0.0  0.0  0.0  0.0  0.0
VZ   0.0  0.0  0.0  0.0  0.0  0.0
JPM  0.0  0.0  0.0  0.0  0.0  0.0
PG   0.0  0.0  0.0  0.0  0.0  0.0
WBA  0.0  0.0  0.0  0.0  0.0  0.0
NKE  0.0  0.0  0.0  0.0  0.0  0.0

how to receive the corresponding elements from the last column of a different DataFrame, shown below, df2.mi, according to pair-names like ('MCD','VZ'), as well as place the same values for (i,j) in those for (j,i) (because of matrix symmetry)? For example, 0.263357 should appear in df for the two places ('MCD','VZ') and ('VZ','MCD'). A toy answer with only 3 names would be fine.

Alternatively, is there a common algorithm for populating a square, symmetric matrix by for loop, according to the (i,j) rules described?

perl · Accepted Answer · 2021-01-11T15:29:06.963

1

You can pivot and add values + transposed values (.T) to df.

Assuming df2 has the structure of the DataFrame posted as a picture (values below are random):

z = df2.pivot('asset1', 'asset2', 'mi')
df.add(z, fill_value=0).add(z.T, fill_value=0)

Output:

          JPM       MCD       NKE        PG        VZ       WBA
JPM  0.000000  0.532617  0.322079  0.377284  0.242471  0.020071
MCD  0.532617  0.000000  0.504874  1.494942  0.737827  0.036683
NKE  0.322079  0.504874  0.000000  0.726165  0.255161  0.645917
PG   0.377284  1.494942  0.726165  0.000000  0.654995  0.593592
VZ   0.242471  0.737827  0.255161  0.654995  0.000000  0.107172
WBA  0.020071  0.036683  0.645917  0.593592  0.107172  0.000000

edited Jan 11 '21 at 15:29

answered Jan 11 '21 at 15:17

perl

9,826
1
10
22

great answer. the final result is exactly what I want. looking at `z` by itself though, it's all jumbled up unexpectedly and not sure what I'm looking at, or how its transpose, also randomly jumbled up, helps create the final result – develarist Jan 12 '21 at 06:10
If `df2` was changed from what is shown in the second image to having two indexes (which were the first two columns): `df2.set_index(['asset1', 'asset2'], inplace=True)`, how can I make `z = df2.pivot('asset1', 'asset2', 'mi')` work again? – develarist Jan 12 '21 at 14:44
`z` is basically just a pivoted version of `df2` (same thing you would get with pivot table in Excel), and when we `add` it to `df` the values are aligned by index and column, so we get the same order of indexes and columns as in `df` – perl Jan 12 '21 at 14:48
Regarding your question with `set_index`, I think the most straightforward way would be to `reset_index` before the `pivot`, i.e. `df2.reset_index().pivot('asset1', 'asset2', 'mi')` – perl Jan 12 '21 at 14:53
Right now, that's what I'm doing due to the two indexes placed in `df2`, however, the line `reset_index` has to be repeated twice because there are two indexes. any way to reset two indexes at the same time? – develarist Jan 12 '21 at 14:54
Or, as an alternative, if 'asset1' and 'asset2' are now in the index, we can `unstack` 'asset2' to get the same thing: `df2['mi'].unstack('asset2')` – perl Jan 12 '21 at 14:54
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/227206/discussion-between-perl-and-develarist). – perl Jan 12 '21 at 14:55

How to populate a square, symmetric DataFrame with elements from a Series based on pair names?

1 Answers1