0

I'm still fairly new to pandas and the script i wrote to accomplish a seemily easy task seems needlessly complicated. If you guys know of an easier way to accomplish this I would be extremely grateful.

task: I hate two spreadsheets (df1&df2), each with an identifier (mrn) and a date. my task is to retrieve an value from df2 for each row in df1 if the following conditions are met:

the identifier for a given row in df1 exists in df2

if above is true, then retrieve the value in df2 if the associated date is within a +/-5 day range from the date in df1.

I have written the following code which accomplishes this:

#%%housekeeping
import numpy as np
import pandas as pd
import csv
import datetime
from datetime import datetime, timedelta 
import sys
from io import StringIO

#%%dataframe import
df1=',mrn,date,foo\n0,1,2015-03-06,n/a\n1,11,2009-08-14,n/a\n2,14,2009-05-18,n/a\n3,20,2010-06-19,n/a\n'
df2=',mrn,collection Date,Report\n0,1,2015-03-06,report to import1\n1,11,2009-08-12,report to import11\n2,14,2009-05-21,report to import14\n3,20,2010-06-25,report to import20\n'

df1 = pd.read_csv(StringIO(df1))
df2 = pd.read_csv(StringIO(df2))


#converting to date-time format
df1['date']=pd.to_datetime(df1['date'])
df2['collection Date']=pd.to_datetime(df2['collection Date'])

#%%mask()   
def mask(df2, rangeTime):
    mask= (df2> rangeTime -timedelta(days=5)) & (df2 <= rangeTime + timedelta(days=5))
    return mask

#%% detailLoop()
i=0
for element in df1["mrn"]:
    df1DateIter = df1.ix[i, 'date']
    df2MRNmatch= df2.loc[df2['mrn']==element, ['collection Date', 'Report']]
    df2Date= df2MRNmatch['collection Date']
    df2Report= df2MRNmatch['Report']
    maskOut= mask(df2Date, df1DateIter)
    dateBoolean= maskOut.iloc[0]
    if dateBoolean==True: 
        df1.ix[i, 'foo'] = df2Report.iloc[0]      
    i+=1

#: once the script has been run the df1 looks like:

Out[824]: 

   mrn       date                 foo
0    1 2015-03-06   report to import1
1   11 2009-08-14  report to import11
2   14 2009-05-18  report to import14
3   20 2010-06-19                 NaN
  • 1
    Can you post sample data (input, expected result)? – Andrew L Jun 07 '17 at 23:23
  • updated with sample data and completed test cases. – Garrett Eickelberg Jun 07 '17 at 23:32
  • 1
    Your code is attempting to read two csv files, but your sample data is posted as formatted data frames. So which is it? Suggest you read how to [ask good questions](http://stackoverflow.com/help/how-to-ask), that includes creating a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. – Stephen Rauch Jun 08 '17 at 05:34
  • I modified my code so that you should be able to copy and paste it into a python environment and it should run. note that the script does write over the original dataframe, so if you run all the code continuously, df1 will reflect the data retrieval, rather than the initialized dataframe. i'm still pretty new to stack overflow so I really appreciate the constructive criticism and am open to hearing how i can better ask questions! – Garrett Eickelberg Jun 08 '17 at 18:52

0 Answers0