Working with Python, I need to create two new variables.

One (see JourneyID in the example) that cumulatively increases by one each time the previous row of another column takes the value '1', and

One (see JourneyN in the example) that cumulatively increases by one each time the previous row of another column takes the value '1', but starts over from 1 every time the Respondent ID increases by 1.

m = df['Purpose'] == 1
df.loc[m, 'JourneyID'] = m.cumsum()

This returns df['JourneyID'] = [1,1,1,2,1,1,3,1,4] when it should return [1,1,2,2,3,3,3,4,4] for JourneyID.

Any help is greatly appreciated.

Example of what I need to do

Cleptus
nielsen
  • This might be something you would know how to answer, @yatu. I corrected the confusing example. – nielsen Apr 15 '20 at 12:26

1 Answer

It's not super clean, but it should get you what you need:

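# Count the rows where Purpose == 1, then shift down one row so the
# counter only increases on the row after a 1.
helper = ((df['Purpose'] == 1).cumsum() + 1).shift(1)
# The shift leaves NaN in the first row; the first journey is 1.
helper.iloc[0] = 1
df['JourneyID'] = helper.astype(int)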

JourneyN I did not fully understand :)
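
One possible reading of JourneyN, offered as a sketch rather than a confirmed solution: apply the same shift-and-count separately for each respondent, so the counter restarts at 1 whenever the respondent changes. The column name RespondentID and the sample values below are assumptions, since the question's table is not reproduced here.

```python
import pandas as pd

# Hypothetical sample data; 'RespondentID' is an assumed column name.
df = pd.DataFrame({
    "RespondentID": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Purpose":      [0, 1, 0, 1, 0, 0, 1, 0, 1],
})

# 1 where the row closes a journey (Purpose == 1), else 0.
ends = df["Purpose"].eq(1).astype(int)

# JourneyID: count the 1s in all earlier rows, then start numbering at 1.
df["JourneyID"] = ends.shift(fill_value=0).cumsum() + 1

# JourneyN: same idea, but shift and count within each respondent.
prev_ends = ends.groupby(df["RespondentID"]).shift(fill_value=0)
df["JourneyN"] = prev_ends.groupby(df["RespondentID"]).cumsum() + 1

print(df)
# JourneyID -> [1, 1, 2, 2, 3, 3, 3, 4, 4]
# JourneyN  -> [1, 1, 2, 2, 1, 1, 1, 1, 1]
```

Grouping before the shift keeps a 1 in the last row of one respondent from carrying over into the first row of the next respondent.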

sltzgs
  • Unfortunately, this solution has the same problem that I had. I hoped that taking a look at the table I made could give an idea of what it should look like. – nielsen Apr 15 '20 at 12:56