1

I have a dictionary of abbreviations of some city names that our system (for some reason) applies to data (i.e. 'Kansas City' is abbreviated 'Kansas CY', and Oklahoma City is spelled correctly).

I am having an issue getting my function to apply to the column of the dataframe but it works when I pass in strings of data. Code sample below:

def multiple_replace(text, dict):
  # Create a regular expression  from the dictionary keys
  regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

  # For each match, look-up corresponding value in dictionary
  return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

testDict = {"Kansas CY": "Kansas City"}

dfData['PREV_CITY'] = dfData['PREV_CITY'].apply(multiple_replace, dict=testDict)

When I add 'axis=1' into that last line it errors out saying I provided too many args. Otherwise, it runs without error it just doesn't make the changes when there is a match to the dictionary.

Thank you in advance! -Reece

1 Answers1

2

You can use map and pass a dict to replace exact matches against the dict keys with the dict values, as you may have case-sensitive matches I'd lower all the strings first prior to the match:

dfData['PREV_CITY'] = dfData['PREV_CITY'].str.lower().map(testDict, na_action='ignore')

this assumes the keys in your dict are also lower case

EdChum
  • 376,765
  • 198
  • 813
  • 562