I have data from 2019 and 2020, covering March through the end of May of each year.
I've processed the datetime column like this:
import pandas as pd

# Working with Date
df['Date'] = pd.to_datetime(df['Date'])
# Extract the parts with the vectorized .dt accessor instead of per-row lambdas
df['Time_Hour'] = df['Date'].dt.hour
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
df['Day'] = df['Date'].dt.day
df.set_index('Date', inplace=True)
And separated the data by year:
df19 = df[df['Year'] == 2019]
df20 = df[df['Year'] == 2020]
If I run the resample for each year separately:
df19.resample('D').size().plot(title= '2019', legend=False)
#df20.resample('D').size().plot(title= '2020', legend=False)
I get the desired chart for each year:
#df19.resample('D').size().plot(title= '2019', legend=False)
df20.resample('D').size().plot(title= '2020', legend=False)
However, I'd like to compare both charts together. When I run both lines at the same time, this is the chart that is plotted:
df19.resample('D').size().plot(title= '2019', legend=False)
df20.resample('D').size().plot(title= '2019', legend=False)
I realize that resample('D') keeps the year in the index, so the combined chart is technically correct, but I'd like to overlay the two years and compare them by day (month and day, ignoring the year).
Any suggestions on how to get this done?
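To make the goal concrete, here is a minimal sketch of the kind of result I'm after. It is only an illustration under assumptions: the sample rows are made up, `daily_counts_by_year` is a hypothetical helper name, and it expects 'Date' to still be a regular column (that is, before the set_index call above). The idea is to key each row by month-day text so the year drops out, count rows per day, and put each year in its own column.

```python
import pandas as pd

# Hypothetical sample standing in for the real data (only the 'Date' column matters here).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-03-01 23:43", "2019-03-03 03:17", "2019-03-04 00:42",
        "2019-03-04 23:13", "2020-03-01 09:20", "2020-03-04 21:50",
        "2020-03-04 22:10",
    ])
})

def daily_counts_by_year(frame, date_col="Date"):
    """Count rows per calendar day, with one column per year."""
    dates = frame[date_col]
    # Month-day text such as '03-04' ignores the year, so the same
    # calendar day lines up across 2019 and 2020.
    month_day = dates.dt.strftime("%m-%d").rename("MonthDay")
    year = dates.dt.year.rename("Year")
    counts = frame.groupby([month_day, year]).size().unstack("Year", fill_value=0)
    # Unlike resample('D'), groupby skips days with no rows, so reindex
    # over March 1 through May 31 to show those days as zero.
    full_range = pd.date_range("2020-03-01", "2020-05-31").strftime("%m-%d")
    return counts.reindex(full_range, fill_value=0)

overlay = daily_counts_by_year(df)
print(overlay.head())
```

Calling `overlay.plot(title='2019 vs 2020')` on a frame shaped like this should then draw one line per year on a shared month-day axis.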
Edit: Here is the .head(20) of the raw data (it doesn't show the 2020 data because the dataset is so large):
               Date        ID  Case Number
 0   3/1/2019 23:43  11610968     JC170728
 1    3/3/2019 3:17  11611832     JC171944
 2    3/4/2019 0:42  11612609     JC172806
 3   3/4/2019 23:13  11613442     JC173895
 4   3/10/2019 6:29  11618474     JC179991
 5  3/12/2019 21:38  11621181     JC183336
 6  3/14/2019 10:14  11623047     JC184966
 7  3/14/2019 23:00  11623349     JC185895
 8  3/14/2019 23:35  11623295     JC185924
 9   3/15/2019 2:25  11623400     JC185990
10  3/15/2019 19:25  11624307     JC187019
11  3/15/2019 21:12  11624280     JC187114
12   3/16/2019 0:30  11624491     JC187272
13  3/16/2019 18:25  11625210     JC188160
14  3/17/2019 21:35  11626248     JC189475
15  3/18/2019 21:49  11627419     JC190873
16  3/20/2019 16:15  11629464     JC193053
17  3/21/2019 17:50  11630638     JC194480
18   3/22/2019 0:22  11630719     JC194815
19   3/22/2019 4:43  11630853     JC194892
Here is the .head(10) for each of the dataframes separated by year, as mentioned above:
                  Date        ID  Case Number
0  2019-03-01 23:43:00  11610968     JC170728
1  2019-03-03 03:17:00  11611832     JC171944
2  2019-03-04 00:42:00  11612609     JC172806
3  2019-03-04 23:13:00  11613442     JC173895
4  2019-03-10 06:29:00  11618474     JC179991
5  2019-03-12 21:38:00  11621181     JC183336
6  2019-03-14 10:14:00  11623047     JC184966
7  2019-03-14 23:00:00  11623349     JC185895
8  2019-03-14 23:35:00  11623295     JC185924
9  2019-03-15 02:25:00  11623400     JC185990
                   Date        ID  Case Number
84  2020-03-01 09:20:00  11996077     JD170035
85  2020-03-04 21:50:00  11999611     JD174374
86  2020-03-06 23:24:00  12001746     JD176808
87  2020-03-07 20:53:00  12002531     JD177851
88  2020-03-07 21:03:00  12002529     JD177805
89  2020-03-11 05:45:00  12005695     JD181579
90  2020-03-13 04:43:00  12007615     JD183927
91  2020-03-13 05:10:00  12007594     JD183934
92  2020-03-13 14:50:00  12008297     JD184421
93  2020-03-16 19:30:00  12011057     JD187968