0

I have a dataframe that looks something like this:

df

Name    date          season  binary
Apple   2016-04-24    MAM     1
Banana  2016-05-01    MAM     1
Berry   2016-05-01    MAM     1
Berry   2017-11-05    OND     1
Orange  2016-11-27    OND     1
Kiwi    2017-03-27    MAM     1

And I have a long date time object of a bunch of dates like this:

df1

date        season  Week
2016-03-28  MAM     13
2016-04-04  MAM     14
2016-04-11  MAM     15
2016-04-18  MAM     16
2016-04-25  MAM     17
2016-05-02  MAM     18
2016-05-09  MAM     19
2016-05-16  MAM     20
2016-05-23  MAM     21
2016-05-30  MAM     22
2016-06-06  MAM     23
2016-06-13  MAM     24
2016-10-03  OND     40
2016-10-10  OND     41
2016-10-17  OND     42
2016-10-24  OND     43
2016-10-31  OND     44
2016-11-07  OND     45
2016-11-14  OND     46
2016-11-21  OND     47
2017-03-13  MAM     11
2017-03-20  MAM     12
2017-03-27  MAM     13
2017-04-03  MAM     14
2017-04-10  MAM     15
2017-04-17  MAM     16
2017-04-24  MAM     17
2017-05-01  MAM     18
2017-05-08  MAM     19
2017-05-15  MAM     20
2017-05-22  MAM     21
2017-05-29  MAM     22
2017-06-19  MAM     25
2017-06-26  MAM     26
2017-07-03  MAM     27
2017-07-10  MAM     28
2017-07-17  MAM     29
2017-07-24  MAM     30
2017-07-31  MAM     31
2017-08-07  MAM     32
2017-08-14  MAM     33
2017-08-21  MAM     34
2017-08-28  MAM     35
2017-09-04  MAM     36
2017-09-11  MAM     37
2017-09-18  MAM     38
2017-09-25  MAM     39
2017-10-02  OND     40
2017-10-09  OND     41
2017-10-16  OND     42
2017-10-23  OND     43
2017-10-30  OND     44
2017-11-06  OND     45
2017-11-13  OND     46
2017-11-20  OND     47

where Week is dtype: object, date is dtype: datetime64[ns], and season is dtype: object.

What I need is a bit complicated. I need a function or algorithm that recognizes df.date as the "end date" and df1.date as the "start date" for certain seasons. In other words, an event for "Apple" in df started on 2016-04-24 which is indicated by the binary=1. In a new dataframe, I need to include the weeks before 2016-04-24 but after 2016-03-28 and add a column that says binary=0 because the event had not occurred yet.

The messy thing is that there a four seasons: MAM 2016, OND 2016, MAM 2017, and OND 2017. And the "start date" for each df.date should only be considered for that season.

As an example, this is what I'm looking to get:

df_result

Name    date          season  binary
Apple   2016-03-28    MAM     0
Apple   2016-04-04    MAM     0
Apple   2016-04-11    MAM     0
Apple   2016-04-18    MAM     0
Apple   2016-04-24    MAM     1
Banana  2016-03-28    MAM     0
Banana  2016-04-04    MAM     0
Banana  2016-04-11    MAM     0
Banana  2016-04-18    MAM     0
Banana  2016-04-25    MAM     0
Banana  2016-05-01    MAM     1
Berry   2016-03-28    MAM     0
Berry   2016-04-04    MAM     0
Berry   2016-04-11    MAM     0
Berry   2016-04-18    MAM     0
Berry   2016-04-25    MAM     0
Berry   2016-05-01    MAM     1
Berry   2017-10-02    OND     0
Berry   2017-10-09    OND     0
Berry   2017-10-16    OND     0
Berry   2017-10-23    OND     0
Berry   2017-10-30    OND     0
Berry   2017-11-05    OND     1
Orange  2016-10-03    OND     0
Orange  2016-10-10    OND     0
Orange  2016-10-17    OND     0
Orange  2016-10-24    OND     0
Orange  2016-10-31    OND     0
Orange  2016-11-07    OND     0
Orange  2016-11-14    OND     0
Orange  2016-11-21    OND     0
Orange  2016-11-27    OND     1
Kiwi    2017-03-13    MAM     0
Kiwi    2017-03-20    MAM     0
Kiwi    2017-03-27    MAM     1

To summarize the seasons in start_end_dates and make it a little clearer, these are the dates:

  • MAM 2016: 2016-03-28 to 2016-06-13
  • OND 2016: 2016-10-03 to 2016-11-21
  • MAM 2017: 2017-03-13 to 2017-09-25
  • OND 2017: 2017-10-02 to 2017-11-20

With some help, I know that this I should reindex df with something like df2 = df.reindex(start_end_dates) but this doesn't work out quite right.

Then I should groupby the names and blank fill with "0" the weeks before df.date. So maybe: df3 = df3.loc[df3.groupby(level=0).binary.bfill().dropna().index].fillna(0).reset_index()

I have seen this stack overflow question but it doesn't exactly answer my question: Reindex pandas DataFrame to fill missing dates

Help greatly appreciated.

JAG2024
  • 3,987
  • 7
  • 29
  • 58

1 Answers1

1

We using merge_asof

df['Year']=df['date'].dt.year
df1['Year']=df1['date'].dt.year
A=[]
for x in range(len(df)):
    Temp=pd.merge_asof(df1.sort_values('date'),df.sort_values('date').iloc[[x],:],on='date',by=['Year','season'],direction='forward',allow_exact_matches =False).dropna().drop('Week',1)
    Temp.binary=np.nan
    A.append(Temp)
A.append(df)
target=pd.concat(A).sort_values(['Name','date'])


target
Out[262]: 
      Name  Year  binary       date season
0    Apple  2016     NaN 2016-03-28    MAM
1    Apple  2016     NaN 2016-04-04    MAM
2    Apple  2016     NaN 2016-04-11    MAM
3    Apple  2016     NaN 2016-04-18    MAM
0    Apple  2016     1.0 2016-04-24    MAM
0   Banana  2016     NaN 2016-03-28    MAM
1   Banana  2016     NaN 2016-04-04    MAM
2   Banana  2016     NaN 2016-04-11    MAM
3   Banana  2016     NaN 2016-04-18    MAM
4   Banana  2016     NaN 2016-04-25    MAM
1   Banana  2016     1.0 2016-05-01    MAM
0    Berry  2016     NaN 2016-03-28    MAM
1    Berry  2016     NaN 2016-04-04    MAM
2    Berry  2016     NaN 2016-04-11    MAM
3    Berry  2016     NaN 2016-04-18    MAM
4    Berry  2016     NaN 2016-04-25    MAM
2    Berry  2016     1.0 2016-05-01    MAM
47   Berry  2017     NaN 2017-10-02    OND
48   Berry  2017     NaN 2017-10-09    OND
49   Berry  2017     NaN 2017-10-16    OND
50   Berry  2017     NaN 2017-10-23    OND
51   Berry  2017     NaN 2017-10-30    OND
3    Berry  2017     1.0 2017-11-05    OND
20    Kiwi  2017     NaN 2017-03-13    MAM
21    Kiwi  2017     NaN 2017-03-20    MAM
5     Kiwi  2017     1.0 2017-03-27    MAM
12  Orange  2016     NaN 2016-10-03    OND
13  Orange  2016     NaN 2016-10-10    OND
14  Orange  2016     NaN 2016-10-17    OND
15  Orange  2016     NaN 2016-10-24    OND
16  Orange  2016     NaN 2016-10-31    OND
17  Orange  2016     NaN 2016-11-07    OND
18  Orange  2016     NaN 2016-11-14    OND
19  Orange  2016     NaN 2016-11-21    OND
4   Orange  2016     1.0 2016-11-27    OND
BENY
  • 317,841
  • 20
  • 164
  • 234