I have some data that looks something like this:

date_time, user, page
12345, A, index
13456, A, index
14566, B, home
...

I'd like to store the index of each row (i.e., its order when sorted by date_time), both overall, and per page.

Overall is simple. Just something like:

df['overall_count'] = range(len(df))

But I can't figure out how to do it for the pages. The following code gets me what I want, but it's connected to the groupby object, and I can't figure out how to move it to the main dataframe.

grouped = df.groupby('page')
for name, group in grouped:
    group = group.sort_values('date_time')
    group['page_count'] = range(len(group))
Jeremy

1 Answer

If you want to assign group-wise indices, you can use cumcount:

df.groupby('page').cumcount()
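
As a minimal sketch of how this could be stored on the main dataframe: `cumcount` returns a Series aligned to the frame's index, so it can be assigned directly as a column. Because it numbers rows in their current order, the frame is sorted by `date_time` first. The sample rows below are modeled on the question's data, and the fourth row is made up for illustration.

```python
import pandas as pd

# Sample rows modeled on the question; the last row is a made-up example.
df = pd.DataFrame({
    "date_time": [14566, 12345, 13456, 15000],
    "user": ["B", "A", "A", "C"],
    "page": ["home", "index", "index", "home"],
})

# Sort once by time so that row order reflects chronological order.
df = df.sort_values("date_time").reset_index(drop=True)

# Position of each row in the whole frame.
df["overall_count"] = range(len(df))

# Position of each row within its page group. The result shares df's index,
# so plain column assignment puts it on the main dataframe.
df["page_count"] = df.groupby("page").cumcount()

print(df)
```

Sorting before grouping avoids the per-group loop entirely, since each group keeps the row order of the sorted frame.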
ayhan
  • That's not quite what I'm asking for - I want the index of where each row is within the group - not the total count of items in the group. – Jeremy Nov 30 '16 at 19:17
  • Your answer totally helped me to figure it out. What I want is: `d['page_index'] = d.groupby('page')['page'].transform(lambda x: range(len(x)))` – Jeremy Nov 30 '16 at 19:22
  • Sorry I misunderstood your question. Can you try `df.groupby('page').cumcount()` maybe? – ayhan Nov 30 '16 at 19:23