
I have a DataFrame of shape (4, 3) as follows:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: x = pd.DataFrame(np.random.randn(4, 3), index=np.arange(4))

In [4]: x
Out[4]: 
          0         1         2
0  0.959322  0.099360  1.116337
1 -0.211405 -2.563658 -0.561851
2  0.616312 -1.643927 -0.483673
3  0.235971  0.023823  1.146727

I want to multiply each column of the DataFrame element-wise by a NumPy array y of shape (4,), so that row i is scaled by y[i]:

In [9]: y = np.random.randn(4)

In [10]: y
Out[10]: array([-0.34125522,  1.21567883, -0.12909408,  0.64727577])

With plain NumPy arrays, the following broadcasting trick works:

In [12]: x.values * y[:, None]
Out[12]: 
array([[-0.32737369, -0.03390716, -0.38095588],
       [-0.25700028, -3.11658448, -0.68303043],
       [-0.07956223,  0.21222123,  0.06243928],
       [ 0.15273815,  0.01541983,  0.74224861]])
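A minimal sketch of why the NumPy trick works, using only NumPy and pandas (the random seed and the names `col`, `arr` and `result` are illustrative, not from the question). Indexing with `None` adds a length-1 axis, so `y[:, None]` has shape (4, 1), and broadcasting stretches it across the 3 columns. The product is a bare ndarray, so the sketch wraps it back into a DataFrame to keep the original index and column labels.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # illustrative seed; any data works
x = pd.DataFrame(rng.standard_normal((4, 3)), index=np.arange(4))
y = rng.standard_normal(4)

col = y[:, None]              # shape (4,) -> (4, 1)
print(col.shape)              # (4, 1)

# (4, 3) * (4, 1) broadcasts to (4, 3): row i is scaled by y[i]
arr = x.values * col

# Wrap the plain ndarray back into a DataFrame to keep x's labels
result = pd.DataFrame(arr, index=x.index, columns=x.columns)
print(result)
```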

However, it doesn't work with a pandas DataFrame; I get the following error:

In [13]: x * y[:, None]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-21d033742c49> in <module>()
----> 1 x * y[:, None]
...
ValueError: Shape of passed values is (1, 4), indices imply (3, 4)

Any suggestions?

Muhammad Dyas Yaskur
Wei Li
  • Your code works fine on my end. Perhaps a version difference? `pd.__version__: '0.16.1' np.__version__: '1.9.2'` – EelkeSpaak Aug 12 '15 at 16:18
  • This works for me if I write `x.values * y[:, None]` instead of `x * y[:, None]` which is what you have in your line `In [13]`. – xnx Aug 12 '15 at 16:19
  • I am using the following versions: pd.__version__: '0.16.2', np.__version__: '1.9.2'. – Wei Li Aug 12 '15 at 17:03
  • I just tried this on a machine with pandas version 0.15.0 and numpy version 1.8.0. This operation (x * y[:, None]) still doesn't work. I suspect this is an issue caused by the version of pandas or numpy. – Wei Li Aug 12 '15 at 17:07

2 Answers


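I found an alternative way to multiply a pandas DataFrame by a NumPy array row-wise: pass axis=0 so that y is matched against the index (rows) rather than the columns. (The values in Out[14] do not correspond to the x and y printed above; they appear to come from a different random draw.)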

In [14]: x.multiply(y, axis=0)
Out[14]: 
          0         1         2
0  0.195346  0.443061  1.219465
1  0.194664  0.242829  0.180010
2  0.803349  0.091412  0.098843
3  0.365711 -0.388115  0.018941
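A sketch comparing `multiply(y, axis=0)` with the NumPy broadcast, assuming a recent pandas in which `DataFrame.multiply` (alias `mul`) accepts a 1-D array together with `axis=0`; the random data is illustrative. With `axis=0` the array is aligned with the row index, so each row is scaled by the corresponding element of `y`. With the default `axis='columns'`, pandas tries to match the length-4 array against the 3 columns and raises a `ValueError`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # illustrative data only
x = pd.DataFrame(rng.standard_normal((4, 3)), index=np.arange(4))
y = rng.standard_normal(4)

# axis=0: align y with the row index, so row i is multiplied by y[i]
by_rows = x.multiply(y, axis=0)

# Same numbers as the NumPy broadcasting trick from the question
expected = x.values * y[:, None]
print(np.allclose(by_rows.values, expected))  # True

# Default axis='columns': y (length 4) cannot be aligned with 3 columns
try:
    x.multiply(y)
except ValueError as err:
    print("axis='columns' fails:", err)
```

Because `multiply` stays inside pandas, the result keeps the index and column labels without any manual re-wrapping.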
Wei Li

I think you are better off using the DataFrame.apply() method, which (with the default axis=0) calls a function on each column. In your case:

x.apply(lambda x: x * y)
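A sketch of what the `apply` call does, with the lambda parameter renamed to `col` (an illustrative name) to avoid shadowing the outer `x`; the random data is also illustrative. By default `DataFrame.apply` passes each column to the function as a length-4 Series, and multiplying that Series by the length-4 array `y` is element-wise, so row i is again scaled by y[i].

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # illustrative data only
x = pd.DataFrame(rng.standard_normal((4, 3)), index=np.arange(4))
y = rng.standard_normal(4)

# apply() calls the lambda once per column (axis=0 is the default);
# each `col` is a length-4 Series, and `col * y` multiplies element-wise.
via_apply = x.apply(lambda col: col * y)

# Matches the row-wise multiply and the NumPy broadcast
print(np.allclose(via_apply.values, x.multiply(y, axis=0).values))  # True
print(np.allclose(via_apply.values, x.values * y[:, None]))         # True
```

Since `apply` makes one Python-level call per column, the vectorized `x.multiply(y, axis=0)` is likely to be faster on wide frames; both give the same numbers.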
dagrha
  • Thanks, I tried this: x.apply(lambda x: x * y), and it works for me. – Wei Li Aug 12 '15 at 17:09
  • Yes, that is exactly right, Wei Li. Sorry, I used the conventional 'df' in my original answer because I thought using 'x' could lead to confusion between the variables in the inner and outer scopes. – dagrha Aug 12 '15 at 17:26