I have two dataframes: the both have 5 columns, but the first one has 100 rows, and the second one just one row. I should multiply every row of the first dataframe by this single row of the second, and than summarize the value of columns in each row and this value in the 6th new column 'sum of multipliations". I've seen "np.dot" operation, but I'm not sure that I could apply it to dataframes. Also I'm looking for the pythonic/pandas operation or method, if it's possible to replace a little bit heavy numpy code from scratch? Thank you in advance for your advice.
Asked
Active
Viewed 61 times
2 Answers
1
I think you can convert DataFrames
to numpy arrays
by values
, multiple them and last sum
:
import pandas as pd
import numpy as np
np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1,5)))
df1.columns = list('ABCDE')
print df1
A B C D E
0 5 8 9 5 0
np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10,size=(10,5)))
df2.columns = list('ABCDE')
print df2
A B C D E
0 5 0 3 3 7
1 9 3 5 2 4
2 7 6 8 8 1
3 6 7 7 8 1
4 5 9 8 9 4
5 3 0 3 5 0
6 2 3 8 1 3
7 3 3 7 0 1
8 9 9 0 4 7
9 3 2 7 2 0
print df2.values * df1.values
[[25 0 27 15 0]
[45 24 45 10 0]
[35 48 72 40 0]
[30 56 63 40 0]
[25 72 72 45 0]
[15 0 27 25 0]
[10 24 72 5 0]
[15 24 63 0 0]
[45 72 0 20 0]
[15 16 63 10 0]]
df = pd.DataFrame(df2.values * df1.values)
df['sum'] = df.sum(axis=1)
print df
0 1 2 3 4 sum
0 25 0 27 15 0 67
1 45 24 45 10 0 124
2 35 48 72 40 0 195
3 30 56 63 40 0 189
4 25 72 72 45 0 214
5 15 0 27 25 0 67
6 10 24 72 5 0 111
7 15 24 63 0 0 102
8 45 72 0 20 0 137
9 15 16 63 10 0 104
Timing:
In [1185]: %timeit df2.mul(df1.ix[0], axis=1)
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 287 µs per loop
In [1186]: %timeit pd.DataFrame(df2.values * df1.values)
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 98 µs per loop

jezrael
- 822,522
- 95
- 1,334
- 1,252
0
You are probably looking for something like this:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({ 'A' : [1.1,2.7, 3.4],
'B' : [-1.,-2.5, -3.9]})
df1['sum of multipliations']=df1.sum(axis = 1)
df2 = pd.DataFrame({ 'A' : [2.],
'B' : [3.],
'sum of multipliations' : [1.]})
print df1
print df2
row = df2.ix[0]
df5=df1.mul(row, axis=1)
df5.loc['Total']= df5.sum()
print df5

tfv
- 6,016
- 4
- 36
- 67