I am attempting to expand the records in a data frame between two dates. Given the input file of single entry for each record, I want to expand it based on a given date.
Here is an example of the input:
Here is an example of the desired expanded output:
Based on some other examples and documentation online, what I attempted to do was expand out the data frame on a 6 month time frame to get two records for each year, then I corrected the dates based on the birthday of the records using a counter to determine the split for before and after birthday.
df_expand['DATE'] = [pd.date_range(s, e, freq='6M') for s, e in
zip(pd.to_datetime(df_expand['Exposure Start']),
pd.to_datetime(df_expand['Exposure Stop']))]
df_expand = df_expand.explode('DATE').drop(['Exposure Start', 'Exposure Stop'], axis=1)
df_merged['counter'] = range(len(df_merged))
df_merged['start end'] = np.where(df_merged['counter'] % 2 != 0, 1, 0)
df_merged['DoB Year'] = df_merged['DoB'].dt.year
df_merged['DoB Month'] = df_merged['DoB'].dt.month
df_merged['DoB Day'] = df_merged['DoB'].dt.day
df_merged.loc[df_merged['start end'] == 0, 'Exposure Start'] = '1/1/'+ df_merged['Calendar Year'].astype(str)
df_merged.loc[df_merged['start end'] == 1, 'Exposure Start'] = df_merged['DoB Month'].astype(str) + '/' + (df_merged['DoB Day'].astype(int)+1).astype(str) + '/' + df_merged['Calendar Year'].astype(str)
df_merged.loc[df_merged['start end'] == 0, 'Exposure Stop'] = df_merged['DoB Month'].astype(str) + '/' + df_merged['DoB Day'].astype(str) + '/' + df_merged['Calendar Year'].astype(str)
df_merged.loc[df_merged['start end'] == 1, 'Exposure Stop'] = '12/31/'+ df_merged['Calendar Year'].astype(str)
This solution is clearly not elegant, and while it worked originally for my proof of concept, it is now running into issues with edge cases involving rules for the Exposure Start.
Study years are split into 2 separate periods, around the record's birthday.
The initial exposure begins 1/1 of the study year (or, the date that the record enters the study, whichever comes later) and goes through the day before the birthday (or non-death exit date, if that comes sooner).
The 2nd period goes from the birthday to the end of the calendar year (or non-death exit date, if that comes sooner). Where a death is observed, exposure is continued through the next birthday.
An iterative solution is probably better suited, but this was the documentation and guidance I received.