Determining the cumulative maximum of a column

Question

I am trying the following code:

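import numpy as np
import pandas as pd
df = pd.DataFrame([[23, 52], [36, 49], [52, 61], [75, 82], [97, 12]], columns=['A', 'B'])
# Raises KeyError: column 'C' does not exist yet when df['C'].shift() is evaluated
df['C'] = np.where(df['A'] > df['C'].shift(), df['A'], df['C'].shift())
print(df)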

The assumption is that the first `df['C'].shift()` operation should treat the missing value as 0 (since `df['C']` does not exist yet).

Expected output

    A   B   C
0  23  52  23
1  36  49  36
2  12  61  36
3  75  82  75
4  70  12  75

but I'm getting a `KeyError` exception:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'C'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Development\workspace\TestPython\TestPython.py", line 6, in <module>
    df['C'] = np.where(df['A'] > df['C'].shift(), df['B'].shift(), df['A'])
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'C'

As I understand it, this happens because column C does not exist yet on the first pass, so `df['C']` cannot be looked up in order to shift it, and that lookup throws this exception.
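Creating the column first avoids the `KeyError`, but a single `np.where` pass still does not produce the expected output: `shift()` is evaluated once on the *old* column, so each row only sees the placeholder value, never the result just computed for the previous row. A minimal sketch, assuming C is pre-filled with 0 (an assumption, not part of the original code) and using the data from the expected-output table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [23, 36, 12, 75, 70], 'B': [52, 49, 61, 82, 12]})

# Assumption: pre-create C with zeros so that df['C'] exists
df['C'] = 0
# The old C shifted down one row: all zeros, not a running value
prev = df['C'].shift().fillna(0)
df['C'] = np.where(df['A'] > prev, df['A'], prev)
print(df['C'].tolist())  # each row just copies A; row 2 is 12 rather than 36
```

The vectorised expression has no way to feed row i's result into row i+1, which is why a different tool is needed for this kind of running calculation.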

My question: is there an alternative way of solving this problem?
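The behaviour being asked for is a recurrence: each row of C is the larger of that row's A and the previous row's C, with the value before the first row taken as 0. A plain loop spells out those semantics explicitly. This is only a sketch; the helper name `running_max` and its `seed` parameter are hypothetical, and the data is taken from the expected-output table.

```python
import pandas as pd

def running_max(values, seed=0):
    """Hypothetical helper: C[i] = max(A[i], C[i-1]), with C[-1] = seed."""
    out = []
    prev = seed
    for v in values:
        # Mirrors the intended np.where(A > C.shift(), A, C.shift()), one row at a time
        prev = v if v > prev else prev
        out.append(prev)
    return out

df = pd.DataFrame({'A': [23, 36, 12, 75, 70], 'B': [52, 49, 61, 82, 12]})
df['C'] = running_max(df['A'])
print(df)
```

A Python-level loop like this is easy to adapt to other recurrences but will be slow on large frames.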

Er.. you created a df with no 'C' column, so you get a `KeyError` unsurprisingly. What is the expected output? Shouldn't you be using column 'B'? — EdChum, Aug 25 '17 at 10:12
Did you mean `np.where(df['A'] > df['B'].shift(), df['B'].shift(), df['A'])`? — cs95, Aug 25 '17 at 10:15
I need it this way. So, `df.loc[0, 'C']` could be `0` or `df['A']`, and then from `df.loc[1, 'C']` onward the `df['C'] = np.where(df['A'] > df['C'].shift(), df['C'].shift(), df['A'])` takes over — arkochhar, Aug 25 '17 at 10:21
For the df above, please show us some desired output. Shift may not be the way. — cs95, Aug 25 '17 at 10:23
@cᴏʟᴅsᴘᴇᴇᴅ I have updated the original post to show what the expected output should be. If `shift` is not the way, then another suggestion would be appreciated. Thanks — arkochhar, Aug 25 '17 at 10:36
Your A and B columns in your output don't match your input... — cs95, Aug 25 '17 at 10:41
@cᴏʟᴅsᴘᴇᴇᴅ actually col `df['B']` has no role. It is just placed there to signify that we have a `DataFrame` and not a `Series`. Only col `df['A']` is used in the computation while we build col `df['C']` with `shift`. — arkochhar, Aug 25 '17 at 10:46
@arkochhar No problem. I've replicated your data as in your output. — cs95, Aug 25 '17 at 10:47
@cᴏʟᴅsᴘᴇᴇᴅ the assumption is that the first `df['C'].shift()` operation would treat the missing value as 0 (since col `df['C']` does not exist for the first operation) — arkochhar, Aug 25 '17 at 10:48
@cᴏʟᴅsᴘᴇᴇᴅ, I have a related issue, if you could look into it at [link](https://stackoverflow.com/questions/45896726/access-previous-value-in-same-dataframe-column). Thanks — arkochhar, Aug 26 '17 at 15:04

score 6 · Answer 1 · answered Aug 25 '17 at 10:47


You need `cummax`:

df['C'] = df.A.cummax()
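Applied to the data from the expected output, `Series.cummax()` reproduces column C directly. One subtlety, offered as a hedged note: `cummax` starts from the first value rather than from 0, so it matches the question's "treat the first shift as 0" assumption only when A has no negative values; clipping at 0 with `clip(lower=0)` covers the general case. For recurrences with no built-in equivalent, `itertools.accumulate` with a two-argument function is one alternative (a swapped-in technique, not part of the answer above).

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'A': [23, 36, 12, 75, 70], 'B': [52, 49, 61, 82, 12]})

# Running maximum of A: C[i] = max(A[0], ..., A[i])
df['C'] = df['A'].cummax()
print(df)

# Same recurrence with an explicit starting value of 0, as the question assumes
seeded = df['A'].cummax().clip(lower=0)

# General-purpose alternative: fold any two-argument function down the column
via_accumulate = list(accumulate(df['A'], max))
```

`cummax` runs in vectorised code, so it should be preferred over a loop or `accumulate` whenever the recurrence really is a running maximum.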

answered Aug 25 '17 at 10:47

acushner


Nice! Didn't know about this. (+1) – cs95 Aug 25 '17 at 10:49
