0

I am looking to add a column that counts consecutive positive numbers and resets the counter on finding a negative on a pandas dataframe. I might be able to loop through it with 'for' statement but I just know there is a better solution. I have looked at various similar posts that almost ask the same but I just cannot get those solutions to work on my problem.

I have:

Slope
-25
-15
 17
 6
 0.1
 5
-3
 5
 1
 3
-0.1
-0.2
 1
-9

What I want:

Slope  Count
-25      0
-15      0
 17      1
 6       2
 0.1     3
 5       4
 -3      0
 5       1
 1       2
 3       3
-0.1     0
-0.2     0
 1       1
-9       0

Please keep in mind that this a low-skill level question. If there are multiple steps on your proposed solution, please explain each. I would like an answer, but would prefer for me to understand the 'how'.

chepox
  • 45
  • 6

4 Answers4

1

we can solve the problem by looping through all the rows and using the loc feature in pandas. Assuming that you already have a dataframe named df with a column called slope. The idea is that we are going to sequentially add one to the previous row, but if we ever hit a count where slope_i < 0 the row is multiplied by 0.

df['new_col'] = 0  # just preset everything to be zero

for i in range(1, len(df)):
    df.loc[i, 'new_col'] = (df.loc[i-1, 'new_col'] + 1) * (df.loc[i, 'slope'] >= 0)
bdempe
  • 308
  • 2
  • 9
1

You first want to mark the positions where new segments (i.e., groups) start:

>>> df['Count'] = df.Slope.lt(0)
>>> df.head(7)
    Slope  Count
0   -25.0   True
1   -15.0   True
2    17.0  False
3     6.0  False
4     0.1  False
5     5.0  False
6    -3.0   True

Now you need to label each group using the cumulative sum: as True is evaluated as 1 in mathematical equations, the cumulative sum will label each segment with an incrementing integer. (This is a very powerful concept in pandas!)

>>> df['Count'] = df.Count.cumsum()
>>> df.head(7)
    Slope  Count
0   -25.0      1
1   -15.0      2
2    17.0      2
3     6.0      2
4     0.1      2
5     5.0      2
6    -3.0      3

Now you can use groupby to access each segment, then all you need to do is generate an incrementing sequence starting at zero for each group. There are many ways to do that, I'd just use the (reset'ed) index of each group, i.e., reset the index, get the fresh RangeIndex starting at 0, and turn it into a series:

>>> df.groupby('Count').apply(lambda x: x.reset_index().index.to_series())
Count   
1      0    0
2      0    0
       1    1
       2    2
       3    3
       4    4
3      0    0
       1    1
       2    2
       3    3
4      0    0
5      0    0
       1    1
6      0    0

This results in the expected counts, but note that the final index doesn't match the original dataframe, so we need another reset_index() with drop=True to discard the grouped index to put this into our original dataframe:

>>> df['Count'] = df.groupby('Count').apply(lambda x:x.reset_index().index.to_series()).reset_index(drop=True)

Et voilá:

>>> df
    Slope  Count
0   -25.0      0
1   -15.0      0
2    17.0      1
3     6.0      2
4     0.1      3
5     5.0      4
6    -3.0      0
7     5.0      1
8     1.0      2
9     3.0      3
10   -0.1      0
11   -0.2      0
12    1.0      1
13   -9.0      0
fsimonjetz
  • 5,644
  • 3
  • 5
  • 21
  • Thanks! This works. It was a lot more convoluted than I originally thought. Thank you for the detailed explanation. – chepox Mar 31 '22 at 17:56
1

you can do this by using the groupby-command. It requires some steps, which probably could be shortened, but it works this way.

First, you create a reset column by finding negative numbers

# create reset condition
df['reset'] = df.slope.lt(0)

Then you create groups with a cumsum() to this resets --> at this point every group of positives gets an unique group value. the last line here gives all negative numbers the group 0

# create groups of positive values
df['group'] = df.reset.cumsum()
df.loc[df['reset'], 'group'] = 0

Now you take the groups of positives and cumsum some ones (there MUST be a better solution than that) to get your result. The last line again cleans up results for negative values

# sum ones :-D
df['count'] = 1
df['count'] = df.groupby('group')['count'].cumsum()
df.loc[df['reset'], 'count'] = 0

It is not that fine one-line, but especially for larger datasets it should be faster than iterating through the whole dataframe

for easier copy&paste the whole thing (including some commented lines which replace the lines before. makes it shorter but harder to understand)

import pandas as pd

## create data
slope = [-25, -15, 17, 6, 0.1, 5, -3, 5, 1, 3, -0.1, -0.2, 1, -9]
df = pd.DataFrame(data=slope, columns=['slope'])

## create reset condition
df['reset'] = df.slope.lt(0)

## create groups of positive values
df['group'] = df.reset.cumsum()
df.loc[df['reset'], 'group'] = 0
# df['group'] = df.reset.cumsum().mask(df.reset, 0)


## sum ones :-D
df['count'] = 1
df['count'] = df.groupby('group')['count'].cumsum()
df.loc[df['reset'], 'count'] = 0
# df['count'] = df.groupby('group')['count'].cumsum().mask(df.reset, 0)
flipSTAR
  • 579
  • 1
  • 4
  • 18
0

IMO, solving this problem iteratively is the only way because there is a condition that has to meet. you can use any iterative way like for or while. solving this problem with map will be troublesome since this problem still need the previous element to be modified and assign to current element

  • That is what I was guessing. I just read that we should avoid iterating over dataframes. Thank you for your comment. – chepox Mar 29 '22 at 20:58
  • 1
    [python pandas- adding values of a column above the row you're on](https://stackoverflow.com/questions/31388637/python-pandas-adding-values-of-a-column-above-the-row-youre-on) . I think your problem is similar to this question – dimas krisrianto Mar 29 '22 at 21:11
  • You can access previous elements without loops, using `shift`. But that isn't necessary here (see my answer). – fsimonjetz Mar 30 '22 at 08:54