I have a dataframe of visits that includes a person ID column, and a given person may have more than one visit. I want to number the visits for a given person.
I can sort the dataframe by visit date, then iterate and count the number of times a person ID has occurred, or its occurrences between the first index and current index. Is there a more efficient (ideally vectorized) way? (Note that I don't want the total counts as from value_counts()
, I want a new series that is a running count of visits per person.)
Here is a dummy dataframe where person 'a' has two visits:
In[1]: df = pd.DataFrame({'Visit': [0, 1, 2, 3, 4, 5], 'Person_ID': ['a', 'b', 'c', 'a', 'd', 'e']})
In[2]: df['Person_ID'].value_counts()
Out[2]:
a 2
b 1
c 1
d 1
e 1
Name: Person_ID, dtype: int64
Example 1: incrementing dictionary counter
In[3]: person_visit_count = {}
for index, row in df.iterrows():
if row['Person_ID'] not in person_visit_count:
person_visit_count[row['Person_ID']] = 1
else:
person_visit_count[row['Person_ID']] += 1
df.loc[index, 'Person_Visit'] = person_visit_count[row['Person_ID']]
In[4]: df
Out[4]:
Visit Person_ID Person_Visit
0 0 a 1.0
1 1 b 1.0
2 2 c 1.0
3 3 a 2.0
4 4 d 1.0
5 5 e 1.0
Example 2: index-based counter
In[5]: for index, row in df.iterrows():
df.loc[index, 'Person_Visit2'] = (df.loc[0:index, 'Person_ID'] == row['Person_ID']).sum()
In[6]: df
Out[6]:
Visit Person_ID Person_Visit Person_Visit2
0 0 a 1.0 1.0
1 1 b 1.0 1.0
2 2 c 1.0 1.0
3 3 a 2.0 2.0
4 4 d 1.0 1.0
5 5 e 1.0 1.0