I am trying to build a dynamic fuzzy-logic join for two tables. By dynamic I mean that the columns used to join the two tables are passed in as arguments. The code below is a modified version of the static code from this question: Python Pandas fuzzy merge/match with duplicates

I have compiled the dynamic code below:

import pandas as pd
import datetime
from fuzzywuzzy import fuzz
import difflib 

donors = pd.DataFrame({"name": pd.Series(["John Doe","John Doe","Tom Smith","Jane Doe","Jane Doe","Kat test"]), "Email": pd.Series(['a@a.ca','a@a.ca','b@b.ca','c@c.ca','something@a.ca','d@d.ca']),"Date": (["27/03/2013  10:00:00 AM","1/03/2013  10:39:00 AM","2/03/2013  10:39:00 AM","3/03/2013  10:39:00 AM","4/03/2013  10:39:00 AM","27/03/2013  10:39:00 AM"])})
fundraisers = pd.DataFrame({"name": pd.Series(["John Doe","John Doe","Kathy test","Tes Ester", "Jane Doe"]),"Email": pd.Series(['a@a.ca','a@a.ca','d@d.ca','asdf@asdf.ca','something@a.ca']),"Date": pd.Series(["2/03/2013  10:39:00 AM","27/03/2013  11:39:00 AM","3/03/2013  10:39:00 AM","4/03/2013  10:40:00 AM","27/03/2013  10:39:00 AM"])})
donors["Date"] = pd.to_datetime(donors["Date"], dayfirst=True)
fundraisers["Date"] = pd.to_datetime(donors["Date"], dayfirst=True)
donors["code"] = donors.apply(lambda row: str(row['name'])+' '+str(row['Email']), axis=1)
idx = donors.groupby('code')["Date"].transform(min) == donors['Date']
donors = donors[idx].reset_index().drop('index',1)

def get_donors_v1(fund_var,don_var, don_tab,row=None):
    d = don_tab.apply(lambda x: fuzz.ratio(x["%s" % don_var], 'row["%s" %fund_var]') * 2, axis=1)
    d = d[d >= 75]
    if len(d) == 0:
        v = ['']*3
    else:
        v = don_tab.ix[d.idxmax(), ["%s"% don_var ,'Email','Date']].values
    return pd.Series(v, index=['donor name', 'donor email', 'donor date'])

trial=pd.concat((fundraisers, fundraisers.apply(get_donors_v1(fund_var="name",don_var="name",don_tab=donors), axis=1)), axis=1)

I get the following error:

TypeError: get_donors_v1() takes exactly 4 arguments (3 given)

Should I change the function signature to:

get_donors_v1(row=None,fund_var,don_var, don_tab)

Then I get the following error:

TypeError: ("'NoneType' object has no attribute '__getitem__'", u'occurred at index 0')

Please help.

Seb_aj

1 Answer

In your code example, get_donors_v1() receives the value None for the argument 'row'. On the next line, you index row as if it were a mapping (row["%s" %fund_var]) without first testing that the object exists, that is, that it is not None.

Indexing an object, as in row["%s" %fund_var], calls its __getitem__ method, which None does not have.
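To make the point concrete, here is a minimal sketch in plain Python (no pandas needed). The helper name lookup and the sample dict are made up for illustration; they are not part of your code. Note that the wording of the error depends on the Python version: Python 2 reports the missing __getitem__ attribute, while Python 3 says the object is not subscriptable.

```python
def lookup(row, key):
    """Hypothetical helper: return row[key], or None when no row was supplied."""
    if row is None:  # guard before indexing
        return None
    return row[key]


# Indexing None fails in the same way as row["name"] does when row=None.
row = None
try:
    row["name"]
except TypeError as exc:
    error_message = str(exc)

result_without_row = lookup(None, "name")               # guarded, no exception
result_with_row = lookup({"name": "John Doe"}, "name")  # normal lookup
```

The guard only hides the symptom, though. The real question is why row is None at all when the function runs.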

user508402
  • Hello user508402, I amended the post. I tried your solution but get this TypeError: ("'Series' object is not callable", u'occurred at index 0'). I am in the process of learning Python - I apologise if my queries seem pedantic. – Seb_aj Dec 29 '16 at 16:36
  • Hi Seb. The 'xxxx is not callable' error means that an object is being called like a function, i.e. its identifier is followed by an opening parenthesis. For example: mySeries(). Find the line where your error is raised and, in that line, the Series object that is being called. Note that this does not apply to something like "pd.Series(...)", which is a legitimate constructor call. – user508402 Dec 29 '16 at 20:18
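A small sketch of how a "not callable" error can come about, again in plain Python. This is a guess at what may be happening in the question: writing fundraisers.apply(get_donors_v1(...), axis=1) calls the function first and hands its return value (a Series) to apply, which then tries to call it once per row. The function match_name and the rows list below are hypothetical stand-ins, and functools.partial is one way, not the only way, to bind the extra arguments while leaving the row open.

```python
from functools import partial


def match_name(row, fund_var, don_var):
    """Hypothetical row function: the row comes first, the options after it."""
    return "%s -> %s" % (row[fund_var], don_var)


rows = [{"name": "John Doe"}, {"name": "Jane Doe"}]

# Wrong: this keeps the RESULT of a call (a str here, a Series in the question)
# and then tries to call that result as if it were the function.
result = match_name(rows[0], fund_var="name", don_var="name")
try:
    result(rows[1])
except TypeError as exc:
    not_callable_error = str(exc)

# Better: bind the extra arguments and pass the callable itself,
# so that each row is supplied later, one call per row.
row_func = partial(match_name, fund_var="name", don_var="name")
results = [row_func(row) for row in rows]
```

With pandas, the same idea would mean putting row first in the signature and passing the function object to apply. DataFrame.apply also forwards extra keyword arguments to the function, so something like fundraisers.apply(get_donors_v1, axis=1, fund_var="name", don_var="name", don_tab=donors) may work once the signature is reordered.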