3

In a dataframe, I would like to count how many of the prices from the previous 10 days are greater than today's price. Result would look like this:

price   ct>prev10
50.00   
51.00   
52.00   
50.50   
51.00   
50.00   
50.50   
53.00   
52.00   
49.00   
51.00   3

I have seen this post answered by DSM, but the requirement was different in that the base for comparison was a static number as opposed to the current row:

Achieving "countif" with pd.rolling_sum()

Of course I would like to do this without looping through 1x1. Pretty much stumped - thanks in advance for any advise.

Community
  • 1
  • 1
MJS
  • 1,573
  • 3
  • 17
  • 26

2 Answers2

4

You can use a rolling_apply function on the series. I used a window length of 5 given the small size of the sample data, but you can easily change it.

The lambda function counts the number of items in the rolling group (excluding the last item) is greater than the last item.

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

window = 5  # Given that sample data only contains 11 values.
df['price_count'] = pd.rolling_apply(df.price, window, 
                                     lambda group: sum(group[:-1] > group[-1]))
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0            1
5    50.0            4
6    50.5            2
7    53.0            0
8    52.0            1
9    49.0            4
10   51.0            2

In the example above, the first group is the prices with index values 0-4. You can see what is happening with:

group = df.price[:window].values
>>> group
array([ 50. ,  51. ,  52. ,  50.5,  51. ])

Now, do your comparison of the previous four prices to the current price:

>>> group[:-1] > group[-1]
array([False, False,  True, False], dtype=bool)

Then, you are just summing the boolean values:

>>> sum(group[:-1] > group[-1])
1

This is the value that gets put into the first closing window at index 4.

Alexander
  • 105,104
  • 32
  • 201
  • 196
  • hi Alexander, ty for your fast response. I believe this is exactly what I need, but I don't completely understand it yet. If I would like to have the lookback period be 5 days NOT including today, how would I adjust what you wrote? I know I can make window=6, but just to better understand your solution, would it be: pd.rolling_apply(df.price, window, lambda group: sum(group[:-1] > group[-2])) ? – MJS Oct 22 '15 at 20:39
  • The last day of the window is excluded from the comparison. See edit above. – Alexander Oct 22 '15 at 20:49
1

Here's a vectoized approach with NumPy module that supports broadcasting for implementing vectorized methods -

import numpy as np
import pandas as pd

# Sample input dataframe
df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

# Convert to numpy array for counting purposes
A = np.array(df['price'])

W = 5 # Window size

# Initialize another column for storing counts
df['price_count'] = np.nan

# Get counts and store as a new column in dataframe
C = (A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
df['price_count'][W-1:] = C

Sample run -

>>> df
    price
0    50.0
1    51.0
2    52.0
3    50.5
4    51.0
5    50.0
6    50.5
7    53.0
8    52.0
9    49.0
10   51.0
>>> A = np.array(df['price'])
>>> W = 5 # Window size
>>> df['price_count'] = np.nan
>>> 
>>> C=(A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
>>> df['price_count'][W-1:] = C
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0            1
5    50.0            4
6    50.5            2
7    53.0            0
8    52.0            1
9    49.0            4
10   51.0            2
Divakar
  • 218,885
  • 19
  • 262
  • 358