I have a dataframe containing user-submitted postcodes, many of which aren't in the format I need in order to look them up with the Google Maps Geocoder API and get the associated co-ordinates.

I have therefore tried to reformat them into the form 'IG1 2BF', 'E6 2QA', 'RH10 4DG'.

This works but is slow and I imagine there is a more 'Pythonic' way to write this. Any suggestions?

df['postcode'] = df['postcode'].str.replace(" ", "").str.upper()
for i in range(0, df['postcode'].size):
    # elif, not if: inserting the space makes the string one character longer,
    # so a second plain `if` would match the value that was just reformatted.
    if len(df['postcode'].iloc[i]) == 5:
        df['postcode'].iloc[i] = df['postcode'].iloc[i][:2] + " " + df['postcode'].iloc[i][2:]
    elif len(df['postcode'].iloc[i]) == 6:
        df['postcode'].iloc[i] = df['postcode'].iloc[i][:3] + " " + df['postcode'].iloc[i][3:]
    elif len(df['postcode'].iloc[i]) == 7:
        df['postcode'].iloc[i] = df['postcode'].iloc[i][:4] + " " + df['postcode'].iloc[i][4:]

Here is a sample of the data that is fed into the for loop:

1    E176PA
2    S8 0ZW
3    DT29BU
4    S44 5TE
5    HP17 9TN
6    N12 0QF
7    S25 1YT
8    OX13 6AP

Only rows 1 and 3 are in an undesired format.

user3058703

1 Answer

Not sure about this being "pythonic", but seeing as the second block of UK postcodes is always made up of 3 characters, you can just slice the string using that fact:

def format_postcode(postcode):
    postcode = postcode.replace(" ", "").upper()
    return "{} {}".format(postcode[:-3], postcode[-3:])

Here, postcode[:-3] is everything up to, but not including, the last three characters, and postcode[-3:] is the last three characters.
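To see the slicing in action, here is a small sketch that runs the function on a few values in the style of the question's sample data. The `samples` list is made up for illustration (including the lower-case entry), not taken from the real dataframe.

```python
def format_postcode(postcode):
    # Strip any existing spaces and normalise the case first.
    postcode = postcode.replace(" ", "").upper()
    # Everything before the last three characters, a space, then the last three.
    return "{} {}".format(postcode[:-3], postcode[-3:])

# Hypothetical inputs: with a space, without a space, and in lower case.
samples = ["E176PA", "S8 0ZW", "dt29bu", "HP17 9TN"]
formatted = [format_postcode(p) for p in samples]
print(formatted)  # ['E17 6PA', 'S8 0ZW', 'DT2 9BU', 'HP17 9TN']
```

Values that are already well formed pass through unchanged, so there is no need to check the length first.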

You can then apply the function to the column of the DataFrame and assign the result back:

df['postcode'] = df['postcode'].apply(format_postcode)
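If speed matters, the same slicing can probably be done with pandas' vectorised string methods instead of calling a Python function per row. This is only a sketch on a small made-up dataframe; I have not timed it against your data, and it assumes every value has at least the three trailing characters.

```python
import pandas as pd

# Hypothetical sample frame, in the style of the data in the question.
df = pd.DataFrame({"postcode": ["E176PA", "S8 0ZW", "DT29BU", "hp17 9tn"]})

# Remove spaces and upper-case the whole column at once.
cleaned = df["postcode"].str.replace(" ", "", regex=False).str.upper()

# .str[...] slices every element: outward part + space + last three characters.
df["postcode"] = cleaned.str[:-3] + " " + cleaned.str[-3:]

print(df["postcode"].tolist())  # ['E17 6PA', 'S8 0ZW', 'DT2 9BU', 'HP17 9TN']
```

Missing values stay missing with this approach, whereas the `apply` version would raise on a NaN because floats have no `replace` method.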
miterhen