
Here is my code:

import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=['A', 'B'])

print(a)

a['C'] = 1 # or np.nan or is there a way to avoid this?

b = lambda i : i['A'] + i['B'] + i['C'] # what I actually need is to access the previous element, e.g. i['C'].shift()

a['C'] = a.apply(b, axis=1)

print(a)

This works fine, but inside the lambda I want to access i['C'].shift(1). When I use it that way, I get the following exception:

Traceback (most recent call last):
  File "C:\Users\Development\workspace\TestPython\TestPython.py", line 31, in <module>
    a['C'] = a.apply(b, axis=1)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "C:\Users\Development\workspace\TestPython\TestPython.py", line 29, in <lambda>
    b = lambda i : i['A'] + i['B'] + i['C'].shift() # actually what is needed if to access a previous element, like i['C'].shift()
AttributeError: ("'numpy.int64' object has no attribute 'shift'", 'occurred at index 0')

I would also like to avoid initialising a['C'] = 1 if possible, so that a['C'] is a new column added by this operation.

Any suggestions or alternative ways of achieving this?

arkochhar
  • Please provide your actual expected output. – cs95 Aug 26 '17 at 16:37
  • Take a look at the answer here: https://stackoverflow.com/questions/44455481/how-can-i-vectorize-a-function-that-uses-lagged-values-of-its-own-output – vestland Aug 26 '17 at 17:23

2 Answers


I guess you need this:

a['C'] = a['A'] + a['B']
a['D'] = a['C'].cumsum()

because adding each element to the sum of the previous ones is a cumulative sum.
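As a minimal sketch of why cumsum() fits here, the snippet below builds the same frame and compares cumsum() with the recurrence D[i] = D[i-1] + C[i] written as an explicit loop. The names `running` and `total` are only illustrative.

```python
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=['A', 'B'])

# Row-wise sum of A and B
a['C'] = a['A'] + a['B']

# Running total of C: each row adds its own C to the previous row's D
a['D'] = a['C'].cumsum()

# The same recurrence written as a plain loop, for comparison only
running = []
total = 0
for value in a['C']:
    total += value
    running.append(total)

print(a)
print(running)
```

Both give the same values, so no row-wise apply and no pre-initialised column is needed for this particular recurrence.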

Aleksandr Borisov

From your code:

# Variable a BEFORE apply
   A   B
0  1   2
1  3   4
2  5   6
3  7   8
4  9  10

# Variable a AFTER apply
   A   B   C
0  1   2   4
1  3   4   8
2  5   6  12
3  7   8  16
4  9  10  20

Assuming this output is really what you want, then:

a = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=['A', 'B'])

a['C'] = a['A'] + a['B'] + 1

I'm a little confused as to why you would want to access a['C'].shift(1) since all the values are the same anyway, and you are trying not to initialize it.

If you want a working example of using df.shift(n), try:

a['Shift'] = a['A'] + a['B'].shift(1)

Which would give you:

   A   B   C  Shift
0  1   2   4    NaN
1  3   4   8    5.0
2  5   6  12    9.0
3  7   8  16   13.0
4  9  10  20   17.0

This would give you A(i) + B(i-1), where i is the row number: shift(1) moves column B down by one row, so each row sees the previous row's B. The first row has no previous value, so its sum is NaN.
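A quick way to check which row a shifted value comes from is to compare shift(1) with shift(-1) side by side. This is only a sketch; the column name `Lead` and the variables `prev_b` and `next_b` are made up for the illustration.

```python
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=['A', 'B'])

# shift(1) moves values down one row: row i receives B from row i-1
prev_b = a['B'].shift(1)

# shift(-1) moves values up one row: row i receives B from row i+1
next_b = a['B'].shift(-1)

a['Shift'] = a['A'] + prev_b   # A(i) + B(i-1), NaN in the first row
a['Lead'] = a['A'] + next_b    # A(i) + B(i+1), NaN in the last row

print(a)
```

Note that the shifted columns become float because NaN cannot be stored in an integer column.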

Yeile
  • Thanks for your inputs. My actual algorithm is more complicated than this. I need `lambda` to be `b = lambda i : np.where(np.logical_or(i['A'] < i['C'].shift(), i['B'].shift() > i['C'].shift()), i['A'], i['C'].shift())`. You can refer to my original post at [link](https://stackoverflow.com/questions/44935269/supertrend-code-using-pandas-python) – arkochhar Aug 27 '17 at 16:52
  • Your code is too difficult to decipher. I know of no way to compare successive values [i vs. i+1] directly in pandas without looping. The most common technique is to create a shifted column, `a['C-Shift'] = a['C'].shift(1)`, then compare the entire columns, e.g. `a['A'] < a['C-Shift']`. – Yeile Aug 28 '17 at 06:33
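The two situations discussed in these comments can be sketched as follows. This is not the asker's actual algorithm, only an illustration under stated assumptions: in the first case C is assumed to exist already (here A + B + 1, as in the question), so whole shifted columns can be compared; in the second, C depends on its own previous value, so a plain loop is used with a hypothetical starting value `seed`. The names `E`, `recursive_column` and `seed` are invented for the example.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=['A', 'B'])

# Case 1 (assumption: C is already known), compare whole shifted columns.
a['C'] = a['A'] + a['B'] + 1
c_prev = a['C'].shift(1)
b_prev = a['B'].shift(1)
cond = (a['A'] < c_prev) | (b_prev > c_prev)
# Comparisons against NaN are False, so the first row falls through to c_prev (NaN).
a['E'] = np.where(cond, a['A'], c_prev)


# Case 2 (assumption: each value depends on the previous *result*), use a loop.
def recursive_column(df, seed):
    """Return an array where out[0] = seed and, for i >= 1, out[i] is A[i]
    if A[i] < out[i-1] or B[i-1] > out[i-1], otherwise out[i-1]."""
    col_a = df['A'].to_numpy()
    col_b = df['B'].to_numpy()
    out = np.empty(len(df), dtype=float)
    if len(df) == 0:
        return out
    out[0] = seed  # hypothetical starting value, must be chosen by the caller
    for i in range(1, len(df)):
        prev = out[i - 1]
        if col_a[i] < prev or col_b[i - 1] > prev:
            out[i] = col_a[i]
        else:
            out[i] = prev
    return out


result = recursive_column(a, seed=0)
print(a)
print(result)
```

In the loop version no placeholder column has to be created first; the result can be assigned afterwards with something like a['C'] = recursive_column(a, seed=0).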