I'm still fairly new to pandas and the script i wrote to accomplish a seemily easy task seems needlessly complicated. If you guys know of an easier way to accomplish this I would be extremely grateful.
task: I hate two spreadsheets (df1&df2), each with an identifier (mrn) and a date. my task is to retrieve an value from df2 for each row in df1 if the following conditions are met:
the identifier for a given row in df1 exists in df2
if above is true, then retrieve the value in df2 if the associated date is within a +/-5 day range from the date in df1.
I have written the following code which accomplishes this:
#%%housekeeping
import numpy as np
import pandas as pd
import csv
import datetime
from datetime import datetime, timedelta
import sys
from io import StringIO
#%%dataframe import
df1=',mrn,date,foo\n0,1,2015-03-06,n/a\n1,11,2009-08-14,n/a\n2,14,2009-05-18,n/a\n3,20,2010-06-19,n/a\n'
df2=',mrn,collection Date,Report\n0,1,2015-03-06,report to import1\n1,11,2009-08-12,report to import11\n2,14,2009-05-21,report to import14\n3,20,2010-06-25,report to import20\n'
df1 = pd.read_csv(StringIO(df1))
df2 = pd.read_csv(StringIO(df2))
#converting to date-time format
df1['date']=pd.to_datetime(df1['date'])
df2['collection Date']=pd.to_datetime(df2['collection Date'])
#%%mask()
def mask(df2, rangeTime):
mask= (df2> rangeTime -timedelta(days=5)) & (df2 <= rangeTime + timedelta(days=5))
return mask
#%% detailLoop()
i=0
for element in df1["mrn"]:
df1DateIter = df1.ix[i, 'date']
df2MRNmatch= df2.loc[df2['mrn']==element, ['collection Date', 'Report']]
df2Date= df2MRNmatch['collection Date']
df2Report= df2MRNmatch['Report']
maskOut= mask(df2Date, df1DateIter)
dateBoolean= maskOut.iloc[0]
if dateBoolean==True:
df1.ix[i, 'foo'] = df2Report.iloc[0]
i+=1
#: once the script has been run the df1 looks like:
Out[824]:
mrn date foo
0 1 2015-03-06 report to import1
1 11 2009-08-14 report to import11
2 14 2009-05-18 report to import14
3 20 2010-06-19 NaN