
I want to replace all occurrences of a set of strings in a line of text, ignoring case. I came up with this approach, but I am sure there is a better way to do it:

import re

# Map each replacement word to a compiled, case-insensitive pattern
myDict = {}
myDict['car'] = re.compile(re.escape('pig'), re.IGNORECASE)
myDict['airplane'] = re.compile(re.escape('horse'), re.IGNORECASE)
myDict['bus'] = re.compile(re.escape('cow'), re.IGNORECASE)

mystring = 'I have this Pig and that pig with a hOrse and coW'

# Apply each pattern in turn; the dict key is the replacement text
for key in myDict:
    regex_obj = myDict[key]
    mystring = regex_obj.sub(key, mystring)

print(mystring)

I have this car and that car with a airplane and bus

Based on @Paul Rooney's answer below, ideally I would do this:

import re

def init_regex():
    # Compile each search word once; keys of the result are the replacements
    rd = {'pig': 'car', 'horse': 'airplane', 'cow': 'bus'}
    myDict = {}
    for key, value in rd.items():  # items() works on Python 2 and 3
        pattern = re.compile(re.escape(key), re.IGNORECASE)
        myDict[value] = pattern

    return myDict

def strrep(mystring, patternDict):
    # Reuse the precompiled patterns on every call
    for key in patternDict:
        regex_obj = patternDict[key]
        mystring = regex_obj.sub(key, mystring)

    return mystring
Amro Younes
  • I am not sure there is anything significantly better than this. See http://stackoverflow.com/questions/919056/python-case-insensitive-replace – nullstellensatz Mar 25 '15 at 00:08

1 Answer


Try

import re

mystring = 'I have this Pig and that pig with a hOrse and coW'

rd = {'pig': 'car', 'horse': 'airplane', 'cow': 'bus'}

# Compiled patterns, keyed by search string, reused across calls
cachedict = {}

def strrep(orig, repdict):
    for k, v in repdict.items():
        if k in cachedict:
            pattern = cachedict[k]
        else:
            # re.escape so the key is matched literally, as in the question
            pattern = re.compile(re.escape(k), re.IGNORECASE)
            cachedict[k] = pattern
        orig = pattern.sub(v, orig)
    return orig

print(strrep(mystring, rd))

This answer was originally written for Python 2, where the loop used repdict.iteritems(). That method does not exist on Python 3, so the code above uses repdict.items(), which works on both.
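
For what it's worth, a single-pass variant is also possible. This is a different technique from the loop above, not a refinement of it: the escaped keys are joined into one alternation pattern, and a replacement function looks up each match in the mapping. The sketch assumes Python 3 and that the keys of rd are lowercase; the name replace_all is made up for illustration.

```python
import re

rd = {'pig': 'car', 'horse': 'airplane', 'cow': 'bus'}

# Longest keys first, so a longer key wins over a shorter key that is its prefix
keys = sorted(rd, key=len, reverse=True)

# One compiled pattern matching any of the (escaped) keys, ignoring case
pattern = re.compile('|'.join(re.escape(k) for k in keys), re.IGNORECASE)

def replace_all(text, mapping=rd, compiled=pattern):
    # Keys are assumed lowercase, so lowercase the matched text before lookup
    return compiled.sub(lambda m: mapping[m.group(0).lower()], text)

result = replace_all('I have this Pig and that pig with a hOrse and coW')
print(result)
```

One behavioural difference to keep in mind: the text is scanned once, so a replacement that happens to contain another key is not replaced again, whereas the loop version may do so depending on iteration order.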

Paul Rooney
  • Is it possible to cache the compiled patterns to save computation, since my strrep will be called multiple times? – Amro Younes Mar 25 '15 at 02:42
  • Based on your answer, I improved the solution to cache the compiled patterns in an array so I don't have to compile them every time, and then just call .sub on the pattern. – Amro Younes Mar 25 '15 at 02:55
  • Yes, that works. I was looking at a solution that caches the compiled regexes in a dict. – Paul Rooney Mar 25 '15 at 02:57
  • The `dict` caching solution actually seems to be a bit slower than without the caching. – Paul Rooney Mar 25 '15 at 03:04
  • The reason your dictionary approach degrades performance is probably that you specify a default for when the key is not found. I would guess that Python performs the compile before calling get, which incurs the extra computation on every call. – Amro Younes Mar 25 '15 at 13:56
  • AttributeError: 'dict' object has no attribute 'iteritems' – johnrao07 May 01 '20 at 08:33
  • Yep, it's a Python 2 only method; use 'items' instead of 'iteritems'. As the question is not tagged python 2, I will update. – Paul Rooney May 01 '20 at 08:36
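
The guess in the comments about the default argument can be checked with a small experiment. The revision of the answer that used dict.get is not shown on this page, so the first variant below is only an assumption about what it looked like; compile_logged is a hypothetical stand-in for re.compile that records each call. Python 3 is assumed.

```python
import re

calls = []

def compile_logged(key):
    # Hypothetical wrapper around re.compile that records every call
    calls.append(key)
    return re.compile(re.escape(key), re.IGNORECASE)

cache = {}

# Variant 1: dict.get with a default. The default expression is evaluated
# before get() runs, so the compile happens even when the key is cached.
for _ in range(3):
    pattern = cache.get('pig', compile_logged('pig'))
    cache['pig'] = pattern
eager_calls = len(calls)

# Variant 2: explicit membership test. The compile runs only on a cache miss.
calls.clear()
cache.clear()
for _ in range(3):
    if 'pig' not in cache:
        cache['pig'] = compile_logged('pig')
    pattern = cache['pig']
lazy_calls = len(calls)

print(eager_calls, lazy_calls)
```

The re module also keeps its own internal cache of recently compiled patterns, which may be part of why an explicit dict cache buys little here; measuring with timeit on real input is the only way to be sure.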