I'm working on a Python script to convert fully uppercase addresses to title case. The issue I'm facing is that when I apply .title() to a string like SOUTH 16TH STREET, I get South 16Th Street. The desired result is South 16th Street, where the ordinal suffix is lowercase.

What is a simple way to accomplish this in Python? I was thinking about using some kind of regex.

Elliott

4 Answers

To solve your stated problem narrowly, I think you may find string.capwords() useful. It encapsulates the split -> capitalize -> join sequence into a single command.

>>> from string import capwords
>>> address = "SOUTH 16TH STREET"
>>> capwords(address)
'South 16th Street'

See more info on that command in Python 3.4 at...

https://docs.python.org/3.4/library/string.html#string-functions

It also exists in earlier versions of Python.

However, broadening your question to address formatting generally, you may run into trouble with this simplistic approach. More complex (e.g. regex-based) approaches may be required. Using an example from my locale:

>>> address = "Highway 99N"  # Want 'Highway 99N'
>>> capwords(address)
'Highway 99n'

Address parsing (and formatting) is a wicked problem due to the amount of variation in legitimate addresses as well as the different ways people will write them (abbreviations, etc.).

The pyparsing module might also be a way to go if you don't like the regex approach.
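
As a rough sketch of what a regex-based approach could look like: the helper name `title_case_address` is hypothetical, and the rule that any other token containing a digit stays in uppercase is my own assumption that happens to suit the two examples above. It is not a general address formatter.

```python
import re

# Matches whole tokens such as 1ST, 2ND, 3RD, 16TH (case-insensitive).
ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$", re.IGNORECASE)


def title_case_address(address):
    """Hypothetical helper: capitalize plain words, lowercase ordinals,
    and keep other digit-containing tokens (e.g. 99N) in uppercase."""
    words = []
    for token in address.split():
        if ORDINAL.match(token):
            words.append(token.lower())       # 16TH -> 16th
        elif any(ch.isdigit() for ch in token):
            words.append(token.upper())       # 99N stays 99N (assumed rule)
        else:
            words.append(token.capitalize())  # SOUTH -> South
    return " ".join(words)


print(title_case_address("SOUTH 16TH STREET"))  # South 16th Street
print(title_case_address("HIGHWAY 99N"))        # Highway 99N
```

Real addresses will need more rules than this (unit numbers, directionals, abbreviations), which is why the problem gets hard quickly.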

Nick Seigal

It might be easiest to split the string into a list of separate words, capitalize each word and then join them back together:

>>> address = "SOUTH 16TH STREET"
>>> " ".join([word.capitalize() for word in address.split()])
'South 16th Street'

The capitalize() method converts the first character of a string to uppercase and the remaining characters to lowercase. Since digits don't have upper/lowercase forms, "16TH" and similar tokens come out as required.
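
To make the difference from .title() concrete, here is a small comparison using the address from the question. Note that split() followed by " ".join() collapses any run of whitespace into a single space, which may or may not matter for your data.

```python
address = "SOUTH 16TH STREET"

# title() uppercases every letter that follows a non-letter,
# so the "T" after the digits "16" is capitalized.
print(address.title())  # South 16Th Street

# capitalize() only ever uppercases the first character of each token;
# for "16TH" that character is a digit, so the rest is simply lowercased.
tokens = [word.capitalize() for word in address.split()]
print(tokens)  # ['South', '16th', 'Street']
```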

Alex Riley

Use this Regex-based solution:

import re
convert = lambda s: " ".join([x.lower() if re.match(r"^\d+(ST|ND|RD|TH)$", x) is not None else x.title() for x in s.split()])

I split the string and check whether each word is an ordinal: ordinals are lowercased, and every other word is title-cased.
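
A variant of the same idea, sketched here as an alternative rather than as part of the solution above: apply .title() to the whole string first, then use re.sub to lowercase only an ordinal suffix that directly follows a digit. The names `SUFFIX` and `fix_ordinals` are hypothetical. This keeps the original spacing intact because nothing is split and rejoined.

```python
import re

# A title-cased ordinal suffix directly after a digit, e.g. the "Th" in "16Th".
SUFFIX = re.compile(r"(?<=\d)(St|Nd|Rd|Th)\b")


def fix_ordinals(address):
    """Hypothetical helper: title-case the string, then repair ordinals."""
    titled = address.title()
    return SUFFIX.sub(lambda m: m.group(1).lower(), titled)


print(fix_ordinals("SOUTH 16TH STREET"))  # South 16th Street
print(fix_ordinals("HIGHWAY 99N"))        # Highway 99N
```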

pascalhein
  • The []'s inside the argument to `" ".join` are not necessary. They build a temporary list, which is then passed to `join`. `join` is perfectly capable of handling a generator expression, which is what you get if you leave out the []'s. Not much of an actual perf gain in this case, but a good habit to get into - []'s in this kind of expression haven't been required since Python 2.3 or 2.4 - well, it's been a while. – PaulMcG Jan 28 '15 at 07:01
>>> str_='SOUTH 16TH STREET'
>>> ' '.join([i.title() if i.isalpha() else i.lower() for i in str_.split()])
'South 16th Street'
Irshad Bhat