
I am trying the following code

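import numpy as np
import pandas as pd
df = pd.DataFrame([[23, 52], [36, 49], [52, 61], [75, 82], [97, 12]], columns=['A', 'B'])
# Raises KeyError: column 'C' does not exist yet when the right-hand side is evaluated
df['C'] = np.where(df['A'] > df['C'].shift(), df['A'], df['C'].shift())
print(df)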

The assumption is that the first `df['C'].shift()` should be treated as 0 (since `df['C']` does not exist yet).
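A minimal sketch of the intended row-by-row rule, assuming (as stated above) that the "previous C" for the first row is 0. It uses a plain Python loop only to spell out the recurrence; the `A` values are taken from the expected output, and the helper name `running_c` is hypothetical.

```python
import pandas as pd


def running_c(a_values):
    """Hypothetical helper: C[i] = A[i] if A[i] > C[i-1] else C[i-1], with C[-1] = 0."""
    result = []
    prev_c = 0  # stands in for the non-existent first df['C'].shift() value
    for a in a_values:
        c = a if a > prev_c else prev_c
        result.append(c)
        prev_c = c  # the next row compares against the value just computed
    return result


# A values as they appear in the expected output
df = pd.DataFrame([[23, 52], [36, 49], [12, 61], [75, 82], [70, 12]], columns=['A', 'B'])
df['C'] = running_c(df['A'])
print(df)
```

Each row depends on the value computed for the previous row, which is the feedback a single vectorized `np.where` over `df['C'].shift()` cannot express.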

Expected output

    A   B   C
0  23  52  23
1  36  49  36
2  12  61  36
3  75  82  75
4  70  12  75

But I'm getting a `KeyError` exception:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'C'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Development\workspace\TestPython\TestPython.py", line 6, in <module>
    df['C'] = np.where(df['A'] > df['C'].shift(), df['B'].shift(), df['A'])
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'C'

As I understand it, this happens because column `C` does not exist yet, so looking up `df['C']` fails before `shift()` is even called.
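A small reproduction of where the error comes from, using the frame from the question. The right-hand side of `df['C'] = ...` is evaluated in full before the new column is created, so the lookup `df['C']` raises. Also note that `np.where` works on whole columns at once rather than row by row, so even a pre-created `C` column would not feed newly computed values back into `shift()`.

```python
import pandas as pd

df = pd.DataFrame([[23, 52], [36, 49], [52, 61], [75, 82], [97, 12]], columns=['A', 'B'])

# Only A and B exist at this point.
print('C' in df.columns)  # False

raised = False
try:
    # This lookup is what fails in the question: the right-hand side of
    # df['C'] = ... is evaluated before the new column is created.
    df['C'].shift()
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```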

My question: is there an alternative way of solving this problem?

Petter Friberg
arkochhar
  • Er.. you created a df with no 'C' column, so you get a `KeyError` unsurprisingly. What is the expected output? Shouldn't you be using column 'B'? – EdChum Aug 25 '17 at 10:12
  • Did you mean `np.where(df['A'] > df['B'].shift(), df['B'].shift(), df['A'])`? – cs95 Aug 25 '17 at 10:15
  • I need it this way. So, `df.loc[0, 'C']` could be `0` or `df['A']`, and then from `df.loc[1, 'C']` onward `df['C'] = np.where(df['A'] > df['C'].shift(), df['C'].shift(), df['A'])` takes over – arkochhar Aug 25 '17 at 10:21
  • For the df above, please show us some desired output. Shift may not be the way. – cs95 Aug 25 '17 at 10:23
  • @cᴏʟᴅsᴘᴇᴇᴅ I have updated the original post to show what the expected output should be. If `shift` is not the way, then another suggestion would be appreciated. Thanks – arkochhar Aug 25 '17 at 10:36
  • Your A and B columns in your output don't match your input... – cs95 Aug 25 '17 at 10:41
  • @cᴏʟᴅsᴘᴇᴇᴅ actually col `df['B']` has no role. It is just placed there to signify that we have a `DataFrame` and not a `Series`. Only col `df['A']` is used in the computations while we build col `df['C']` with `shift`. – arkochhar Aug 25 '17 at 10:46
  • @arkochhar No problem. I've replicated your data as in your output. – cs95 Aug 25 '17 at 10:47
  • @cᴏʟᴅsᴘᴇᴇᴅ the assumption is that the first `df['C'].shift()` operation would be treated as 0 (since col `df['C']` does not exist for the first operation) – arkochhar Aug 25 '17 at 10:48
  • @cᴏʟᴅsᴘᴇᴇᴅ, I have a related issue, if you could look into it at [link](https://stackoverflow.com/questions/45896726/access-previous-value-in-same-dataframe-column). Thanks – arkochhar Aug 26 '17 at 15:04

1 Answer


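You need `cummax`, which gives the running maximum of `A`. That is the same rule as `C[i] = max(A[i], C[i-1])`, which your `np.where` expression is trying to apply row by row: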

df['C'] = df.A.cummax()
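A quick check, using the `A` values from the question's expected output (the input frame in the question has different values, as pointed out in the comments):

```python
import pandas as pd

# A values as shown in the question's expected output
df = pd.DataFrame([[23, 52], [36, 49], [12, 61], [75, 82], [70, 12]], columns=['A', 'B'])

# Running maximum of A: each row keeps the largest A seen so far
df['C'] = df['A'].cummax()
print(df)
```

This reproduces the expected `C` column. One caveat: `cummax` starts from `A[0]` rather than from 0, so it only matches the "first shift is 0" assumption when `A` is non-negative. If negative values are possible, `df['A'].cummax().clip(lower=0)` gives the running maximum with 0 as the starting value.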
acushner