Python, Pandas - Issue applying function to a column in a dataframe to replace only certain items

Question

I have a dictionary of abbreviations of some city names that our system (for some reason) applies to data (i.e. 'Kansas City' is abbreviated 'Kansas CY', and Oklahoma City is spelled correctly).

I am having an issue getting my function to apply to the column of the dataframe but it works when I pass in strings of data. Code sample below:

def multiple_replace(text, dict):
  # Create a regular expression  from the dictionary keys
  regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

  # For each match, look-up corresponding value in dictionary
  return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

testDict = {"Kansas CY": "Kansas City"}

dfData['PREV_CITY'] = dfData['PREV_CITY'].apply(multiple_replace, dict=testDict)

When I add 'axis=1' into that last line it errors out saying I provided too many args. Otherwise, it runs without error it just doesn't make the changes when there is a match to the dictionary.

Thank you in advance! -Reece

if you you already have a dict then `dfData['PREV_CITY'] = dfData['PREV_CITY'].map(testDict, na_action='ignore')` should work — EdChum, Sep 20 '16 at 19:51
that means you don't have exact matches against your dict keys, you've not posted any sample data so this is just a guess — EdChum, Sep 20 '16 at 20:01
Well, Ill be darned... it was case sensitive... I feel like a fool. Thank You!! — ProfessionalAmateur, Sep 20 '16 at 20:04

score 2 · Accepted Answer · answered Sep 20 '16 at 20:07

You can use map and pass a dict to replace exact matches against the dict keys with the dict values, as you may have case-sensitive matches I'd lower all the strings first prior to the match:

dfData['PREV_CITY'] = dfData['PREV_CITY'].str.lower().map(testDict, na_action='ignore')

this assumes the keys in your dict are also lower case

Python, Pandas - Issue applying function to a column in a dataframe to replace only certain items

1 Answers1