Item-wise weighted average of multiple DataFrames containing strings

Question

I am relatively new to Python and Pandas.

I have multiple DataFrames in the same format containing float and string values (in each column and row). The location of string/float items is the same across DataFrames. I would like to take a weighted average of the individual items across the several DataFrames with fixed weights.

Example:

df1 = pd.DataFrame([['A', '1', '2', 'B'], ['3', '4', 'C', '5'], ['6', 'D', 'E', '7']], columns=['Col1', 'Col2', 'Col3', 'Col4'])
df2 = pd.DataFrame([['A', '11', '12', 'B'], ['13', '14', 'C', '15'], ['16', 'D', 'E', '17']], columns=['Col1', 'Col2', 'Col3', 'Col4'])
df3 = pd.DataFrame([['A', '21', '22', 'B'], ['23', '24', 'C', '25'], ['26', 'D', 'E', '27']], columns=['Col1', 'Col2', 'Col3', 'Col4'])

df1
  Col1 Col2 Col3 Col4
0    A    1    2    B
1    3    4    C    5
2    6    D    E    7

df2
Col1 Col2 Col3 Col4
0    A   11   12    B
1   13   14    C   15
2   16    D    E   17

df3
  Col1 Col2 Col3 Col4
0    A   21   22    B
1   23   24    C   25
2   26    D    E   27

Using as an example weights of 0.2, 0.3 and 0.5 for the first item in each df (later, each item should receive a specific weight), I would like to achieve the following:

df4
  Col1 Col2 Col3 Col4
0    A   14   15    B
1   16   17    C   18
2   19    D    E   20

where the first item would be 0.2*1+0.3*11+0.5*21=14, the second item is 0.2*2+0.3*12+0.5*22=15, and so on.

The above resulting template is just for an easy understanding of the intended averaging across templates. In reality, I would like to apply different weights that sum to 1 to each item across the DataFrames. So the next item would not receive weights 0.2, 0.3 and 0.5 again, but different ones (e.g. 0.1, 0.8 and 0.1).

I am not sure how to accomplish this and was not able to find anything on the item-wise averaging with the added difficulty of string values)

The DataFrames are quite large, so an efficient way would be very much appreciated.

Many thanks!

EDIT:

To make it more clear: I would like to apply different weights to each item across the DataFrames. Therefore, I would need to average item-wise across the DataFrames. So, while the first item in each df would be averaged using the weights 0.2, 0.3 and 0.5, the next one would receive different weights (e.g. 0.1, 0.8 and 0.1). In the example above and using the weights 0.1, 0.8 and 0.1 as an example, the second item of the resulting df should then be 0.1*2+0.8*12+0.1*22=12. The next item would, again, receive different weights that sum to 1.

In the end, I would like to apply random weights (e.g. using random.uniform) that add to 1 to each item across the DataFrames.

score 2 · Answer 1 · answered Jul 05 '21 at 10:25

2

Try:

weights = [0.2, 0.3, 0.5]

df1x = df1.apply(lambda x: pd.to_numeric(x, errors="coerce"))
df2x = df2.apply(lambda x: pd.to_numeric(x, errors="coerce"))
df3x = df3.apply(lambda x: pd.to_numeric(x, errors="coerce"))

out = (df1x * weights[0] + df2x * weights[1] + df3x * weights[2]).fillna(df1)
print(out)

Prints:

   Col1  Col2  Col3  Col4
0     A  14.0  15.0     B
1  16.0  17.0     C  18.0
2  19.0     D     E  20.0

answered Jul 05 '21 at 10:25

Andrej Kesely

168,389
15
48
91

That worked great! I did not think at all about just removing the str values and then pasting them in afterwards. Could this be easily adjusted if I now wanted to give each "item" a specific weight? So not 0.2 for all items in df1, 0.3 for df2 and 0,5 for df3, but applying the weights item-wise? So e.g. for the first item have 0.2, 0.3 and 0.5 but for the next one have completely different weights (e.g. 0.1, 0.8, 0.1) and for the next one new weights again, and so on. Thanks a million! – Fhtsm Jul 05 '21 at 10:40
1

@Fhtsm I'd suggest to open a new question here on StackOverflow. I'll try to look at it. Also - for the new question you can omit the string values (you can remove them and paste them afterwards like in this example). It will be much easier to answer it for others. – Andrej Kesely Jul 05 '21 at 10:45
Many thanks! Would it work if I just edit the question here to make it more understandable? – Fhtsm Jul 05 '21 at 10:49
1

@Fhtsm You can try it. – Andrej Kesely Jul 05 '21 at 10:50

Item-wise weighted average of multiple DataFrames containing strings

1 Answers1