
I have an address formatted like this:

street address, town zip

I need to add the state abbreviation before the zip, which is always 5 digits.

I think I should use regex to do something like below, but I don't know how to finish it:

instr = "123 street st, anytown 12345"
state = 'CA'
outstr = re.sub(r'(???)(/\b\d{5}\b/g)', r'\1state\2', instr)

My question is what to put in the ??? and whether I used the state variable correctly in outstr. Also, did I get the zip regex correct?

Forge
pekasus

2 Answers


You can also use rsplit to do that:

instr = "123 street st, anytown 12345"
state = 'CA'
address, zip_code = instr.rsplit(' ', 1)  # ['123 street st, anytown', '12345']
print('%s %s %s' % (address, state, zip_code))
# prints: 123 street st, anytown CA 12345


From the str.rsplit documentation:

str.rsplit([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones.
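As a quick illustration of the maxsplit behaviour quoted above, here is a small sketch comparing split and rsplit on the sample address. The names left_first and right_first are only illustrative and do not come from the answer.

```python
instr = "123 street st, anytown 12345"

# split with maxsplit=1 cuts at the leftmost space
left_first = instr.split(' ', 1)

# rsplit with maxsplit=1 cuts at the rightmost space, isolating the zip
right_first = instr.rsplit(' ', 1)

print(left_first)   # ['123', 'street st, anytown 12345']
print(right_first)  # ['123 street st, anytown', '12345']
```

Because only the last space is used as the cut point, spaces inside the street address do not matter.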

Forge
  • That's why I love Python. I had to add some more string handling, but it works great. Thanks! – pekasus Mar 04 '16 at 12:40
  • I would also add a str.rstrip() call to remove trailing spaces, in case the data contains them; e.g. with "123 street st, anytown 12345 ", the code above will fail. – mootmoot Mar 08 '16 at 09:18
  1. You can't put the variable "state" straight into the replacement string. Use Python string formatting to reference the variable.
  2. Keep the regex simple when the data is simple. If the ZIP always appears at the end of the string, just match from the end using $.

Let me try:

import re

instr = "123 street st, anytown 12345"
# Always strip trailing spaces to avoid surprises
instr = instr.rstrip()
state = 'CA'
# Assume the ZIP is in the last position with no trailing space
search_pattern = r"(\d{5})$"
# Build the replacement; the pattern has a single group, so \g<1> is the ZIP
replace_str = r"{mystate} \g<1>".format(mystate=state)
outstr = re.sub(search_pattern, replace_str, instr)
print(outstr)  # 123 street st, anytown CA 12345
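To make point 1 concrete, here is a minimal sketch of what happens when the word state is written directly in the replacement string. It also shows one alternative to string formatting, which is to pass a function as the replacement argument of re.sub. The names literal and with_func are illustrative only.

```python
import re

instr = "123 street st, anytown 12345"
state = 'CA'

# The word "state" inside the replacement string is literal text, not the variable
literal = re.sub(r"(\d{5})$", r"state \1", instr)

# A function replacement builds the text from the variable when a match is found
with_func = re.sub(r"(\d{5})$", lambda m: state + " " + m.group(1), instr)

print(literal)    # 123 street st, anytown state 12345
print(with_func)  # 123 street st, anytown CA 12345
```

The function form also avoids any surprises if the inserted value ever contains a backslash, since its return value is not scanned for group references.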

@Forge's example is lean and clean. However, you need to be careful about data quality when using str.rsplit(). For example:

# If town and zip code stick together
instr = "123 street st, anytown12345"
# or trailing spaces
instr = "123 street st, anytown 12345  "

The universal fix is to use a strip and a regex, as shown in my code. Always think ahead about input data quality; some code will still fail even after going through unit tests.
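As a rough sketch of the strip-plus-regex idea applied to the two problem inputs above: the helper name add_state is hypothetical, and the optional \s* in the pattern is an assumption added here so that the stuck-together case gets a space. With (\d{5})$ alone, that input would come out as "anytownCA 12345". The sketch assumes the zip is exactly the last five digits of the string.

```python
import re

def add_state(address, state):
    """Hypothetical helper: insert `state` before a trailing 5-digit zip."""
    # Drop trailing whitespace so that $ lines up with the end of the zip
    address = address.rstrip()
    # \s* absorbs any space before the zip so the output spacing is uniform
    return re.sub(r"\s*(\d{5})$",
                  lambda m: " %s %s" % (state, m.group(1)),
                  address)

print(add_state("123 street st, anytown 12345", "CA"))
print(add_state("123 street st, anytown12345", "CA"))
print(add_state("123 street st, anytown 12345  ", "CA"))
# each call prints: 123 street st, anytown CA 12345
```

An address that does not end in five digits is returned unchanged apart from the stripped trailing whitespace.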

mootmoot