I have 2 dataframes:

>>> type(c)
Out[118]: pandas.core.frame.DataFrame
>>> type(N)
Out[119]: pandas.core.frame.DataFrame

>>> c
Out[114]: 
                       t
2017-06-01 01:06:00 1.00
2017-06-01 01:13:00 1.00
2017-06-01 02:09:00 1.00
2017-06-26 22:47:00 1.00

>>> N
Out[115]: 
                       0    1
2017-06-01 01:06:00 1.00 1.00
2017-06-01 01:13:00 1.00 1.00
2017-06-01 02:09:00 1.00 1.00
2017-06-26 22:47:00 1.00 1.00

I need to multiply them to get a 4×2 DataFrame in which each column of N is multiplied elementwise by c. I tried the following four approaches, with no luck:

>>> N.multiply(c, axis='index')
Out[116]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:]*N
Out[98]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c*N
Out[99]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:, None]*N
Traceback (most recent call last):

  File "C:\...pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\...core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\...core\generic.py", line 1082, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type

Is there an easy way to do this, with or without broadcasting?

dayum
  • Note: the `c[:, None]` notation for adding a new axis is for NumPy arrays; it won't work with DataFrames. If you want to add a new axis, you first need to convert to a NumPy array with `c.values[:, None]`. – ayhan Aug 06 '17 at 22:17

1 Answer


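The problem is that you are passing a DataFrame, so pandas tries to align the column names as well as the index. N has columns 0 and 1 while c has the column t, so no labels match and every result is NaN. If you select the column t, it becomes a Series, which is aligned on the index and broadcast across the columns: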

N.mul(c['t'], axis=0)
Out: 
                       0    1
2017-06-01 01:06:00  1.0  1.0
2017-06-01 01:13:00  1.0  1.0
2017-06-01 02:09:00  1.0  1.0
2017-06-26 22:47:00  1.0  1.0
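
For anyone who wants to reproduce this, here is a minimal sketch. The frames are rebuilt by hand from the printed output in the question (the constructor calls and the names `idx`, `misaligned` and `scaled` are my own assumptions, not the asker's code):

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question's printed output (assumed construction).
idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame(np.ones((4, 2)), index=idx, columns=[0, 1])
c = pd.DataFrame({"t": np.ones(4)}, index=idx)

# DataFrame x DataFrame: aligned on index AND columns.
# Columns 0, 1 and "t" share no label, so every cell comes out NaN.
misaligned = N.mul(c, axis="index")

# DataFrame x Series with axis=0: the Series is aligned on the row index
# and applied to every column of N.
scaled = N.mul(c["t"], axis=0)

print(misaligned.isna().all().all())
print(scaled)
```

The only difference between the two calls is whether the right-hand operand still carries column labels.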

With NumPy arrays, you don't need to specify anything. Given shapes (4, 2) and (4, 1), NumPy matches the axes of equal length, stretches the length-1 axis, and broadcasts accordingly.
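
A small sketch of that shape rule, using made-up arrays (`a`, `col` and `flat` are illustrative names, not from the question). It also shows where the `[:, None]` indexing mentioned in the comments is useful, namely on a flat 1-D array:

```python
import numpy as np

a = np.arange(8, dtype=float).reshape(4, 2)    # shape (4, 2)
col = np.array([[1.0], [2.0], [3.0], [4.0]])   # shape (4, 1)

# (4, 2) * (4, 1): the length-1 axis is stretched across both columns.
prod = a * col

# A flat (4,) array does not broadcast against (4, 2), because shapes are
# compared from the trailing axis (2 vs 4). Adding an axis makes it (4, 1).
flat = col.ravel()                  # shape (4,)
prod_from_flat = a * flat[:, None]  # same result as a * col

print(prod.shape)
print(np.array_equal(prod, prod_from_flat))
```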

Consider the following DataFrames:

N
Out: 
                       0    1
2017-06-01 01:06:00  1.0  2.0
2017-06-01 01:13:00  6.0  5.0
2017-06-01 02:09:00  4.0  3.0
2017-06-26 22:47:00  4.0  7.0


c
Out: 
                       t
2017-06-01 01:06:00  6.0
2017-06-01 01:13:00  2.0
2017-06-01 02:09:00  8.0
2017-06-26 22:47:00  2.0

You can access the underlying array with the .values attribute, so

N.values * c.values
Out: 
array([[  6.,  12.],
       [ 12.,  10.],
       [ 32.,  24.],
       [  8.,  14.]])

will give you the same result as

N.mul(c['t'], axis=0)
Out: 
                        0     1
2017-06-01 01:06:00   6.0  12.0
2017-06-01 01:13:00  12.0  10.0
2017-06-01 02:09:00  32.0  24.0
2017-06-26 22:47:00   8.0  14.0

But since the whole operation is in numpy, you will lose the labels.
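
If you do go through NumPy and still want the labels, one option is to wrap the result back into a DataFrame yourself. This is only a sketch: the frames are rebuilt from the example output above, `raw` and `relabeled` are names I made up, and it assumes the rows of N and c are already in the same order, because NumPy multiplies by position and does no index alignment:

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame({0: [1.0, 6.0, 4.0, 4.0], 1: [2.0, 5.0, 3.0, 7.0]}, index=idx)
c = pd.DataFrame({"t": [6.0, 2.0, 8.0, 2.0]}, index=idx)

# Plain NumPy product: (4, 2) * (4, 1), positional, labels are dropped.
raw = N.values * c.values

# Reattach N's index and columns (assumes identical row order in N and c).
relabeled = pd.DataFrame(raw, index=N.index, columns=N.columns)

print(relabeled)
```

When the row order is not guaranteed to match, `N.mul(c['t'], axis=0)` is the safer choice, since it aligns on the index.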

ayhan
  • Thanks, this is helpful. Just to understand the concept fully, what do you think would be the right way if 'N' was an ndarray (4X2) and 'c' a dataframe? Would I need to convert N to a dataframe first? I tried doing N*c[:] and N*c['t'] in that case but didn't work. – dayum Aug 06 '17 at 23:09
  • @ayhan Hi, when I try `N * c['t']` it doesn't work, do you know why? – malioboro Mar 21 '19 at 10:09