
I know this must be easy but I can't figure it out or find an existing answer on this...

Say I have this dataframe...

>>> import pandas as pd
>>> import numpy as np
>>> dates = pd.date_range('20130101', periods=6)
>>> df = pd.DataFrame(np.nan, index=dates, columns=list('ABCD'))
>>> df
             A   B   C   D
2013-01-01 NaN NaN NaN NaN
2013-01-02 NaN NaN NaN NaN
2013-01-03 NaN NaN NaN NaN
2013-01-04 NaN NaN NaN NaN
2013-01-05 NaN NaN NaN NaN
2013-01-06 NaN NaN NaN NaN

It's easy to set the values of one series...

>>> df.loc[:, 'A'] = pd.Series([1,2,3,4,5,6], index=dates)
>>> df
            A   B   C   D
2013-01-01  1 NaN NaN NaN
2013-01-02  2 NaN NaN NaN
2013-01-03  3 NaN NaN NaN
2013-01-04  4 NaN NaN NaN
2013-01-05  5 NaN NaN NaN
2013-01-06  6 NaN NaN NaN

But how do I set the values of all columns using broadcasting?

>>> default_values = pd.Series([1,2,3,4,5,6], index=dates)
>>> df.loc[:, :] = default_values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/billtubbs/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py", line 189, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/Users/billtubbs/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py", line 651, in _setitem_with_indexer
    value=value)
  File "/Users/billtubbs/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 3693, in setitem
    return self.apply('setitem', **kwargs)
  File "/Users/billtubbs/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 3581, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/billtubbs/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py", line 940, in setitem
    values[indexer] = value
ValueError: could not broadcast input array from shape (6) into shape (6,4)

Other than these ways:

>>> for s in df:
...     df.loc[:, s] = default_values
... 

Or:

>>> df.loc[:, :] = np.vstack([default_values]*4).T

UPDATE:

Or:

>>> df.loc[:, :] = default_values.values.reshape(6,1)
CT Zhu
Bill

4 Answers


Use numpy broadcasting

s =  pd.Series([1,2,3,4,5,6], index=dates)
df.loc[:,:] = s.values[:,None]
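
To see why the extra axis matters, here is a minimal NumPy-only sketch (the names `values`, `target_shape`, `column` and `stretched` are just for illustration). NumPy compares shapes from the trailing dimension, so (6,) against (6, 4) pairs 6 with 4 and fails, while (6, 1) stretches across the 4 columns:

```python
import numpy as np

values = np.arange(1, 7)      # shape (6,)
target_shape = (6, 4)         # same shape as the 6x4 dataframe

# Trailing dimensions are compared first: 6 vs 4 do not match.
try:
    np.broadcast_to(values, target_shape)
    flat_ok = True
except ValueError:
    flat_ok = False

# A length-1 trailing axis gives shape (6, 1), which stretches to (6, 4).
column = values[:, None]
stretched = np.broadcast_to(column, target_shape)

print(flat_ok)            # False
print(stretched.shape)    # (6, 4)
```

Note that this only shows the shape rule; it pastes values by position and ignores the index.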

Using index matching

df.loc[:] = pd.concat([s]*df.columns.size, axis=1)
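
A possible variation, sketched here as an assumption rather than as part of the original answer: passing `keys=df.columns` to `pd.concat` labels each copy of the series with one of the dataframe's column names, so the resulting frame matches `df` on both index and columns. The name `filled` is only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.nan, index=dates, columns=list('ABCD'))
s = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# keys= gives each copy of s one of df's column labels
filled = pd.concat([s] * len(df.columns), axis=1, keys=df.columns)

# Reindex on df's index so the row order follows df, not s
filled = filled.reindex(df.index)

print(filled)
```

This builds a new frame instead of assigning into `df`, which avoids relying on how `.loc` assignment aligns a DataFrame value.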
rafaelc

The most straightforward way is already provided by pandas: call the .add method and specify the direction (axis) along which the new values should be added.

In [7]: df.fillna(0).add(default_values, axis=0)
Out[7]:
              A    B    C    D
2013-01-01  1.0  1.0  1.0  1.0
2013-01-02  2.0  2.0  2.0  2.0
2013-01-03  3.0  3.0  3.0  3.0
2013-01-04  4.0  4.0  4.0  4.0
2013-01-05  5.0  5.0  5.0  5.0
2013-01-06  6.0  6.0  6.0  6.0

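Note: in newer pandas versions you may be able to write df.add(default_values, axis=0, fill_value=0) instead of chaining methods. Check this on your version, though: as far as I know, many releases raise NotImplementedError when fill_value is passed together with a Series.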

Note that the index-alignment behaviour of pandas applies here. Consider this case, where the new values cover only 4 of the 5 rows of the target dataframe:

In [37]: default_values = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

In [38]: df = pd.DataFrame(np.ones(shape=(5,5)) + np.nan, index=['a', 'b', 'c', 'd', 'e'])

In [39]: df.fillna(0).add(default_values, axis=0)
Out[39]:
     0    1    2    3    4
a  1.0  1.0  1.0  1.0  1.0
b  2.0  2.0  2.0  2.0  2.0
c  3.0  3.0  3.0  3.0  3.0
d  4.0  4.0  4.0  4.0  4.0
e  NaN  NaN  NaN  NaN  NaN

Row e, which is not found in the new-value Series, becomes NaN.

CT Zhu
  • That makes sense, but I think OP wants to 'overwrite values'. This case would work for `NaN`s only, and other values would get *old_values+new_values*. But good approach anyway :) – rafaelc Sep 06 '18 at 04:45
  • That's good. Works in most cases - when values are addable for example. One exception is when default_values contains values for indices that are not in df. Then you get unwanted `NaN`s! – Bill Sep 06 '18 at 04:45
  • @Bill, very good point. In my opinion that is intentional, and follows the general index-alignment design in pandas. I will add an example shortly. – CT Zhu Sep 06 '18 at 04:52
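
For the two concerns raised in these comments (overwriting existing values, and default values whose labels are not in df), one possible sketch is to assign only on the labels both objects share. The example data and the name `common` are made up for illustration; this is not the answer's method.

```python
import pandas as pd

# A frame that already holds values, and defaults with an extra label 'z'
df = pd.DataFrame(10.0, index=list('abcde'), columns=list('WXYZ'))
default_values = pd.Series([1.0, 2.0, 3.0, 4.0, 9.0],
                           index=['a', 'b', 'c', 'd', 'z'])

# Only labels present in both df and default_values are touched
common = df.index.intersection(default_values.index)

# Column-by-column assignment aligns the Series on the index
for col in df.columns:
    df.loc[common, col] = default_values.loc[common]

print(df)
```

Rows a to d are overwritten, row e keeps its old value, and the extra label z is ignored rather than producing a row of NaNs.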

I landed here looking for a solution to both creating new columns and assigning a single default value per column (not per row). While this isn't exactly what the OP requested, I found this solution works well. Please comment and redirect to a specific thread for this if appropriate:

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.nan, index=dates, columns=list('ABCD'))
default_values = pd.Series([1,2,3,4], index=['A','B','C','D'] ).to_dict()
df = df.assign( **default_values )   # note use of ** notation (kwargs)
In [97]: df
Out[97]: 
            A  B  C  D
2013-01-01  1  2  3  4
2013-01-02  1  2  3  4
2013-01-03  1  2  3  4
2013-01-04  1  2  3  4
2013-01-05  1  2  3  4
2013-01-06  1  2  3  4
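
Since the goal here includes creating new columns, a small sketch of that case may help. It assumes a hypothetical extra column 'E' and the illustrative names `defaults` and `out`; `assign` returns a new frame, fills each named column with its scalar, and leaves unnamed columns alone.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.nan, index=dates, columns=list('ABCD'))

# 'E' does not exist yet; 'C' and 'D' are deliberately left out
defaults = {'A': 1, 'B': 2, 'E': 5}
out = df.assign(**defaults)   # keys must be valid keyword names

print(out)
```

Column E is appended after D, columns C and D stay NaN, and `df` itself is unchanged.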
kpickrell

You can solve this with NumPy:

nvalues = 6
ncolumns = 4
# Repeat each of the values 1..6 across its row, giving shape (6, 4)
default_values = np.repeat(np.arange(1, nvalues + 1), ncolumns).reshape(nvalues, ncolumns)

df.loc[:, :] = default_values

However this doesn't address your hope for broadcasting on the Pandas side. I don't know of any tricks to achieve that.

Andrey Portnoy
  • Thanks, that's similar to the second solution in the question. I just can't believe that the broadcasting doesn't work in pandas... – Bill Sep 06 '18 at 03:48
  • The downside with using only numpy is that the order of `default_values` might not be the same and therefore the index of default_values should be used ideally. – Bill Sep 06 '18 at 03:49
  • @Bill I don't think I understand. What do you mean by "order might not be the same"? – Andrey Portnoy Sep 06 '18 at 03:57
  • Say if default_values is a series that has been reversed (sorted descending) then pandas would still assign the values to the correct rows by matching the index values first. Whereas numpy 'blindly' pastes in the values in the order given. – Bill Sep 06 '18 at 04:01
  • @Bill I see. Thanks for taking the time to explain. – Andrey Portnoy Sep 06 '18 at 04:15
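
The ordering point from these comments can be shown with a short sketch (the names `rev`, `by_label`, `by_position`, `aligned`, `block` and `filled` are illustrative). Assigning a Series aligns on the index, assigning its raw array does not, and reindexing first makes the NumPy route label-safe:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.nan, index=dates, columns=list('ABCD'))
default_values = pd.Series([1.0, 2, 3, 4, 5, 6], index=dates)
rev = default_values.sort_values(ascending=False)   # same labels, reversed order

by_label = df.copy()
by_label['A'] = rev               # Series: matched to rows by index label

by_position = df.copy()
by_position['A'] = rev.to_numpy() # array: pasted in the order given

# Put the values in df's row order first, then repeat across the columns
aligned = rev.reindex(df.index).to_numpy()
block = np.repeat(aligned[:, None], df.shape[1], axis=1)
filled = pd.DataFrame(block, index=df.index, columns=df.columns)

print(by_label['A'].tolist())     # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(by_position['A'].tolist())  # [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
```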