I have some data from 2019 and 2020 starting in March until the end of May for each year.

I parsed the Date column, extracted its components, and set it as the index:

# Working with the Date column
df['Date'] = pd.to_datetime(df['Date'])
# The .dt accessor is vectorized, so apply/lambda is not needed
df['Time_Hour'] = df['Date'].dt.hour
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
df['Day'] = df['Date'].dt.day
df.set_index('Date', inplace=True)

Then I separated the data by year:

df19 = df[df['Year'] == 2019]
df20 = df[df['Year'] == 2020]

If I run the resample for each year separately,

df19.resample('D').size().plot(title= '2019', legend=False)
#df20.resample('D').size().plot(title= '2020', legend=False)

I get the desired charts:

[Image: daily record counts for 2019]

#df19.resample('D').size().plot(title= '2019', legend=False)
df20.resample('D').size().plot(title= '2020', legend=False)

[Image: daily record counts for 2020]

However, I'd like to compare both charts together. When I run both lines at the same time, this is the chart that is plotted:

df19.resample('D').size().plot(title= '2019', legend=False)
df20.resample('D').size().plot(title= '2019', legend=False)

[Image: both years on one chart, with 2019 and 2020 plotted one after the other along the date axis]

I realize that resample('D') keeps the year in the index, so the combined chart is technically correct, but I'd like to overlay the two years and compare them day by day.

Any suggestions on how to get this done?

Edit: Here is the .head(20) of the raw data (it doesn't show any 2020 rows because the dataset is so large):

               Date        ID Case Number
0    3/1/2019 23:43  11610968    JC170728
1     3/3/2019 3:17  11611832    JC171944
2     3/4/2019 0:42  11612609    JC172806
3    3/4/2019 23:13  11613442    JC173895
4    3/10/2019 6:29  11618474    JC179991
5   3/12/2019 21:38  11621181    JC183336
6   3/14/2019 10:14  11623047    JC184966
7   3/14/2019 23:00  11623349    JC185895
8   3/14/2019 23:35  11623295    JC185924
9    3/15/2019 2:25  11623400    JC185990
10  3/15/2019 19:25  11624307    JC187019
11  3/15/2019 21:12  11624280    JC187114
12   3/16/2019 0:30  11624491    JC187272
13  3/16/2019 18:25  11625210    JC188160
14  3/17/2019 21:35  11626248    JC189475
15  3/18/2019 21:49  11627419    JC190873
16  3/20/2019 16:15  11629464    JC193053
17  3/21/2019 17:50  11630638    JC194480
18   3/22/2019 0:22  11630719    JC194815
19   3/22/2019 4:43  11630853    JC194892

Here is the .head(10) of each of the per-year dataframes mentioned above:

                 Date        ID Case Number
0 2019-03-01 23:43:00  11610968    JC170728
1 2019-03-03 03:17:00  11611832    JC171944
2 2019-03-04 00:42:00  11612609    JC172806
3 2019-03-04 23:13:00  11613442    JC173895
4 2019-03-10 06:29:00  11618474    JC179991
5 2019-03-12 21:38:00  11621181    JC183336
6 2019-03-14 10:14:00  11623047    JC184966
7 2019-03-14 23:00:00  11623349    JC185895
8 2019-03-14 23:35:00  11623295    JC185924
9 2019-03-15 02:25:00  11623400    JC185990
                  Date        ID Case Number
84 2020-03-01 09:20:00  11996077    JD170035
85 2020-03-04 21:50:00  11999611    JD174374
86 2020-03-06 23:24:00  12001746    JD176808
87 2020-03-07 20:53:00  12002531    JD177851
88 2020-03-07 21:03:00  12002529    JD177805
89 2020-03-11 05:45:00  12005695    JD181579
90 2020-03-13 04:43:00  12007615    JD183927
91 2020-03-13 05:10:00  12007594    JD183934
92 2020-03-13 14:50:00  12008297    JD184421
93 2020-03-16 19:30:00  12011057    JD187968

1 Answer

This is a hacked-together 'solution' that I figured out, but I'm still open to something better. I ended up offsetting the 2020 data by one year (365 days) so that it lands on the same dates as the 2019 data.

df19.resample('D').size().plot()
df20.resample('D', loffset = '-365d').size().plot()

[Image: 2019 and 2020 daily counts overlaid on the same chart]
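Two caveats about the 365-day shift may be worth keeping in mind. 2020 is a leap year, so subtracting 365 days from a March 2020 date lands one day late in 2019, and as far as I know the `loffset` argument was deprecated in pandas 1.1 and removed in 2.0. Below is a minimal sketch of an alternative that shifts the resampled index back by one calendar year instead. The small `df` here is made-up stand-in data, and `counts19`, `counts20`, and `combined` are names chosen for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: a few timestamps per year.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-03-01 23:43", "2019-03-03 03:17", "2019-03-04 00:42",
        "2019-03-04 23:13", "2020-03-01 09:20", "2020-03-04 21:50",
        "2020-03-04 22:10",
    ])
}).set_index("Date")

# Daily counts per year, as in the question.
counts19 = df[df.index.year == 2019].resample("D").size()
counts20 = df[df.index.year == 2020].resample("D").size()

# A fixed 365-day shift is off by one day here because of Feb 29, 2020.
shifted_365 = pd.Timestamp("2020-03-01") - pd.Timedelta(days=365)

# A calendar-year shift maps March 1, 2020 onto March 1, 2019.
# (Feb 29 would collapse onto Feb 28; this data starts in March, so it is safe.)
counts20.index = counts20.index - pd.DateOffset(years=1)

# One column per year on a shared date index; days missing in a year count as 0.
combined = pd.concat({"2019": counts19, "2020": counts20}, axis=1)
combined = combined.fillna(0).astype(int)
```

Calling `combined.plot()` should then draw both years as separate lines on one axis. The x-axis will still be labelled with 2019 dates, so a month-day tick format may read better.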