0

I have the following Pandas DF:

visit_date|house_id
----------+---------
2017-12-27|892815605
2018-01-03|892807836
2018-01-03|892815815
2018-01-03|892812970
2018-01-03|892803143
2018-01-03|892815463
2018-01-03|892816168
2018-01-03|892814475
2018-01-03|892813594
2018-01-03|892813557
2018-01-03|892809834
2018-01-03|892809834
2018-01-03|892803143
2018-01-03|892803143
2018-01-03|892800500
2018-01-03|892806236
2018-01-03|892810789
2018-01-03|892797487
2018-01-03|892815182
2018-01-03|892814514
2018-01-03|892778046
2018-01-03|892809386
2018-01-03|892816048
2018-01-03|892816048
2018-01-03|892816078
2018-01-03|892810643

I need to know the most visited house (house_id) in each month(month).

How do I do that? I did a groupby:

df_1.groupby(by=['house_id', 'month']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True)

But it say me anything. So I try to do that for each month:

df_1[df_1['month']==1].groupby(by=['house_id']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True).tail(1)

df_1[df_1['month']==2].groupby(by=['house_id']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True).tail(1)

and so on...

But I think there is a clever way to do that. But I don't know. Is it possible to iterate? How do I iterate to know the most visited house in each month ({1:'January', ... 12:'December'}) Thanks a lot

Bruhlickd
  • 69
  • 7
  • What exactly was wrong w/the first attempt? If you would post your sample data as text instead of a picture, it would be much easier to help you. – Chris Jul 17 '22 at 14:43
  • Sorry, I'v just post the text instead image above. The df is huge. So there are a lot of dates yyyy-mm-dd. I am just focused on month. – Bruhlickd Jul 17 '22 at 14:59

1 Answers1

1

Adopted from a similar answer

df = pd.DataFrame({'visit_date': ['2017-12-27',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03'],
 'house_id': [892815605,  892807836,  892815815,  892812970,  892803143,  892815463,  892816168,
  892814475,  892813594,  892813557,  892809834,  892809834,  892803143,  892803143,  892800500,
  892806236,  892810789,  892797487,  892815182,  892814514,  892778046,  892809386,  892816048,
  892816048,  892816078,  892810643]})

df['month'] = pd.to_datetime(df['visit_date']).dt.month
df = df.groupby(['month','house_id']).size().groupby(level=0).nlargest(1).droplevel(1).reset_index(name='count')
print(df)

Output

   month   house_id  count
0      1  892803143      3
1     12  892815605      1
Chris
  • 15,819
  • 3
  • 24
  • 37