
I am struggling with a basic conditional element-wise multiplication between two DataFrames. Suppose I have the following two DataFrames:

df1 = pd.DataFrame({'A': [-0.1, 0.3, -0.4, 0.8, -0.5, -0.1, 0.3, -0.4, 0.8, -1.2],
                    'B': [-0.2, 0.5, 0.3, -0.5, 0.1, -0.2, 0.5, 0.3, -0.5, 0.9]},
                   index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
df2 = pd.DataFrame({'C': [-0.003, 0.03848, -0.04404, 0.018, -0.1515, -0.02181, 0.233, -0.0044, 0.01458, -0.015],
                    'D': [-0.0152, 0.0155, 0.03, -0.0155, 0.0151, -0.012, 0.035, 0.0013, -0.0005, 0.009]},
                   index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The idea is to multiply df1 and df2.shift(-1) element-wise (not matrix multiplication), depending on the values of df1: where df1 >= 0.50 or df1 <= -0.50, the result is df1 * df2.shift(-1); otherwise, the result is 0.
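For reference, here is a minimal sketch of what df2.shift(-1) does, using a hypothetical three-row frame `small` (not from the question). Each row receives the value from the row below it, and the last row has no successor, so it becomes NaN. That is where the NaN in the last row of the desired result comes from.

```python
import pandas as pd

# Hypothetical toy frame, not the question's data
small = pd.DataFrame({'C': [1.0, 2.0, 3.0]})

# shift(-1): row i now holds the value from row i + 1; the last row becomes NaN
shifted = small.shift(-1)
print(shifted)
```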

The desired result for this example is the following (column names and index taken from df1):

df3 = pd.DataFrame({'A': [0, 0, 0, -0.1212, 0.010905, 0, 0, 0, -0.012, np.nan],
                    'B': [0, 0.015, 0, -0.00755, 0, 0, 0.00065, 0, -0.0045, np.nan]},
                   index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

I tried the following code:

import numpy as np
import pandas as pd
df3=np.where((df1>=0.50 or df1 <=-0.50),df1*df2.shift(-1),0)

And I get the error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Thank you.
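Why the error appears: Python's `or` has to turn its left operand into a single True/False, so it calls bool() on a whole DataFrame, and pandas refuses to guess. The element-wise operator `|` combines two boolean DataFrames cell by cell instead. A small sketch with a hypothetical one-column frame `df`:

```python
import pandas as pd

# Hypothetical frame, not the question's data
df = pd.DataFrame({'A': [-0.6, 0.2, 0.7]})

# `or` calls bool() on the first boolean DataFrame, which raises ValueError
try:
    (df >= 0.5) or (df <= -0.5)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# `|` works element by element; the parentheses are needed because | binds tighter than >= and <=
mask = (df >= 0.5) | (df <= -0.5)
print(mask)
```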

CTXR

1 Answer

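Use | (element-wise OR) instead of `or`, and wrap each comparison in parentheses, because | binds more tightly than >= and <=. Then pass the NumPy result to the DataFrame constructor: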

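# .values multiplies the raw arrays, so df1's A/B labels are not aligned against df2's C/D labels
arr = np.where((df1>=0.50) | (df1 <=-0.50), df1.values*df2.shift(-1).values, 0)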
df3 = pd.DataFrame(arr, index=df1.index, columns=df1.columns)
print (df3)
          A        B
0  0.000000  0.00000
1  0.000000  0.01500
2  0.000000  0.00000
3 -0.121200 -0.00755
4  0.010905  0.00000
5  0.000000  0.00000
6  0.000000  0.00065
7  0.000000  0.00000
8 -0.012000 -0.00450
9       NaN      NaN
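If you prefer to stay in pandas, one possible variant (a sketch, not part of the original answer) uses DataFrame.where: keep the product where |df1| >= 0.50 and write 0 elsewhere. It assumes `df1.abs() >= 0.50` is an acceptable stand-in for the two-sided condition in the question. `.to_numpy()` strips df2's C/D labels so pandas does not try to align them with A/B.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [-0.1, 0.3, -0.4, 0.8, -0.5, -0.1, 0.3, -0.4, 0.8, -1.2],
                    'B': [-0.2, 0.5, 0.3, -0.5, 0.1, -0.2, 0.5, 0.3, -0.5, 0.9]})
df2 = pd.DataFrame({'C': [-0.003, 0.03848, -0.04404, 0.018, -0.1515,
                          -0.02181, 0.233, -0.0044, 0.01458, -0.015],
                    'D': [-0.0152, 0.0155, 0.03, -0.0155, 0.0151,
                          -0.012, 0.035, 0.0013, -0.0005, 0.009]})

# Multiply by the raw shifted array; the result keeps df1's index and columns
product = df1 * df2.shift(-1).to_numpy()

# Keep the product where |df1| >= 0.50, otherwise write 0 (NaN in the last row is kept)
df3 = product.where(df1.abs() >= 0.50, 0)
print(df3)
```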

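A pure NumPy solution should be faster, because it skips pandas index alignment. Drop the first row of df2 and append a row of NaN to reproduce shift(-1):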

arr2 = np.concatenate([df2.values[1:, ], 
                       np.repeat(np.nan, len(df2.columns))[None, :]])

arr = np.where((df1.values>=0.50) | (df1.values <=-0.50),df1.values*arr2,0)
df3 = pd.DataFrame(arr, index=df1.index, columns=df1.columns)
print (df3)
          A        B
0  0.000000  0.00000
1  0.000000  0.01500
2  0.000000  0.00000
3 -0.121200 -0.00755
4  0.010905  0.00000
5  0.000000  0.00000
6  0.000000  0.00065
7  0.000000  0.00000
8 -0.012000 -0.00450
9       NaN      NaN
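To convince yourself that the concatenate step reproduces df2.shift(-1), here is a quick check on a hypothetical two-column toy frame. It builds the padding row exactly as in the answer and compares with pandas; `equal_nan=True` makes np.allclose treat the NaN row as equal.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the question's data
df2 = pd.DataFrame({'C': [1.0, 2.0, 3.0], 'D': [4.0, 5.0, 6.0]})

# Drop the first row and append one NaN row: the NumPy equivalent of shift(-1)
pad = np.repeat(np.nan, len(df2.columns))[None, :]
arr2 = np.concatenate([df2.values[1:], pad])

# Compare with the pandas result, treating NaN == NaN
same = np.allclose(arr2, df2.shift(-1).values, equal_nan=True)
print(same)
```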
jezrael
  • It works, thank you for the explanation. One last thing: I tried it on the real DataFrames and got the error: operands could not be broadcast together with shapes (3951,1056) (7902,2112). I checked the dimensions of the two DataFrames before the computation, and both are (3951,1056). I am not sure why the size of the second DataFrame doubled. Thanks – CTXR Nov 23 '18 at 12:24
  • Maybe the problem is a different index or different column names; try changing `df1*df2.shift(-1)` to `df1.values*df2.shift(-1).values` – jezrael Nov 23 '18 at 12:25
  • It is working now with this. I also tried your second solution with NumPy, and it works as well. Thank you! – CTXR Nov 23 '18 at 12:36
  • First solution fails. `ValueError: operands could not be broadcast together with shapes (10,2) (10,4) ()` –  Nov 23 '18 at 15:35
  • Yes. This has been discussed in the comments; it works if you add `.values` – CTXR Nov 23 '18 at 16:44
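The broadcast error discussed in these comments comes from label alignment. df1 has columns A, B and df2 has C, D, so `df1 * df2.shift(-1)` returns the union of the columns (A, B, C, D), all NaN, which np.where cannot broadcast against the two-column mask. If the row labels also differ, the row count grows as well, which would explain the doubled shape reported above. A minimal reproduction with hypothetical two-row frames (shift omitted for brevity):

```python
import numpy as np
import pandas as pd

# Hypothetical frames with different column labels, as in the question
df1 = pd.DataFrame({'A': [0.8, -0.1], 'B': [0.5, 0.2]})
df2 = pd.DataFrame({'C': [0.1, 0.2], 'D': [0.3, 0.4]})

# Label-aligned product: columns become the union A, B, C, D and every cell is NaN
aligned = df1 * df2
print(aligned.shape)

# Raw arrays skip alignment, so the product has the same shape as the mask
mask = (df1 >= 0.50) | (df1 <= -0.50)
arr = np.where(mask, df1.values * df2.values, 0)
print(arr.shape)
```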