-2

I have a column of Canadian postal codes in the form A1A 1A1 (where A is a letter a-z and 1 is a digit 0-9), but the values are formatted inconsistently.

Some have no space (A1A1A1) and some have a space (A1A 1A1).

The capitalization of the letters is also inconsistent (for example, M1a 5w7 or m1a 5W7). What code can I use to loop through this column and standardize all values to A1A 1A1 (all capitals, with a space)?

Thank you!

  • Please provide some more details – Sabil Aug 30 '21 at 21:30
  • What have you tried? Why didn't it work? – Macattack Aug 30 '21 at 21:30
  • I am a new coder and I researched this platform for a while but haven't found a solution. @Sabil what further details can I provide you with? – Carolina Medina Aug 30 '21 at 21:38
  • Welcome to Stack Overflow - and to programming. When you write code, the goal is to *solve problems* by *breaking them down into smaller pieces*. For example: are you able to write code that converts the string to uppercase? (Hint: what happens if you try putting `python uppercase string` into a search engine?) Are you able to write code that removes all the whitespace from a string? Are you able to write code that takes a string without whitespace, and puts a space after the third character? If you have all of those pieces, do you see how you could use them to solve the problem? – Karl Knechtel Aug 30 '21 at 21:47
  • I see, you want to apply those changes to a column in a Pandas DataFrame. Okay, so, basically the same questions apply, except now you're searching for e.g. `pandas uppercase column` instead. – Karl Knechtel Aug 30 '21 at 21:49
  • Does https://stackoverflow.com/questions/31269216/applying-uppercase-to-a-column-in-pandas-dataframe help? How about https://stackoverflow.com/questions/41476150/removing-space-from-columns-in-pandas ? How about https://stackoverflow.com/questions/36235497/pandas-add-space-between-characters-if-needed ? I got all of these by doing the kind of internet searches I described above. – Karl Knechtel Aug 30 '21 at 21:50
  • Please also read https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users to set your expectations about research, and also get some good hints on how to do it. – Karl Knechtel Aug 30 '21 at 21:51
  • Thank you for that comment @KarlKnechtel , that is a great insight into the thought process for research. – Carolina Medina Aug 31 '21 at 00:13
  • I have taken your comment into consideration @KarlKnechtel and I'm breaking down the code step by step. Right now, I am trying to change all the letters to upper case, but interestingly enough, it only reads the last 3 characters of the postal code. I used the following code: df['postal_code']=df['postal_code'].str.upper(). Any thoughts? – Carolina Medina Aug 31 '21 at 14:13
  • Please provide enough code so others can better understand or reproduce the problem. – Community Sep 02 '21 at 06:15

3 Answers

0
import pandas as pd

# First let's create a df which we want to correct:
df = pd.DataFrame(
    {
        "codes": [
            "A1A1A1",
            "A1A 1A1",
            "A1A1a1",
            "a1A 1A1",

        ]
    }
)

# Now let's correct the postal code:

for current_index in df.index:  # Loop over each row label
    entry = df.loc[current_index, "codes"]  # .loc selects the value at the given row label and column
    entry = entry.upper()  # Convert to upper case
    entry = entry.replace(" ", "")  # Remove all spaces
    corrected_entry = entry[:3] + " " + entry[3:]  # Reassemble with a space after the third character (string slicing)
    df.loc[current_index, "codes"] = corrected_entry  # Assign the corrected postal code back to the DataFrame

result:

print(df)
     codes
0  A1A 1A1
1  A1A 1A1
2  A1A 1A1
3  A1A 1A1
lorenz-ry
0

If you prefer pure Python:

codes = [
    "A1A1A1",
    "A1A 1A1",
    "A1A1a1",
    "a1A 1A1",
]

new_codes = []

for code in codes:
    if code[3] != ' ':
        code = f"{code[:3]} {code[3:]}"
    new_codes.append(code.upper())

print(new_codes)
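The loop above assumes every entry is at least four characters long and has its space, if any, at index 3. A slightly more defensive sketch is shown here: it strips all whitespace first and checks the shape before reformatting. The function name `standardize_postal_code` and the choice to return `None` for malformed input are assumptions made for illustration, and the pattern follows the question's "A is a letter, 1 is a digit" description rather than the full Canada Post rules.

```python
import re

# Letter-digit-letter-digit-letter-digit, after spaces are removed and
# letters are upper-cased (pattern taken from the question's description).
POSTAL_CODE = re.compile(r"[A-Z][0-9][A-Z][0-9][A-Z][0-9]")


def standardize_postal_code(raw):
    """Return the code as 'A1A 1A1', or None if it does not fit the pattern."""
    compact = "".join(raw.split()).upper()  # drop all whitespace, upper-case
    if not POSTAL_CODE.fullmatch(compact):
        return None  # hypothetical choice: signal bad input instead of guessing
    return f"{compact[:3]} {compact[3:]}"


codes = ["A1A1A1", "A1A 1A1", "A1A1a1", " a1A  1A1 ", "12345"]
new_codes = [standardize_postal_code(code) for code in codes]
print(new_codes)
```

Returning `None` keeps malformed values visible so they can be reviewed separately rather than silently reshaped.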
cmyui
0

Hello and thank you to everyone that took the time to answer. I have come to a solution.

df['postal_code']=df['postal_code'].astype("string")
df['postal_code']=df['postal_code'].str.upper()
df['postal_code']=df['postal_code'].str.replace(" ","")
df.head()
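
Note that the steps above upper-case the values and remove the spaces, which leaves them as A1A1A1 rather than the A1A 1A1 format the question asks for. A minimal sketch of one way to put the space back without a loop, using the vectorized `.str` accessor, might look like this. The sample data is made up for illustration, and the sketch assumes every value has exactly six characters once spaces are removed.

```python
import pandas as pd

# Made-up sample data using the column name from the question.
df = pd.DataFrame({"postal_code": ["M1a 5w7", "m1a5W7", "A1A 1A1", "a1a1a1"]})

# Same steps as above: string dtype, upper case, spaces removed.
compact = (
    df["postal_code"]
    .astype("string")
    .str.upper()
    .str.replace(" ", "", regex=False)
)

# Re-insert a single space after the third character.
df["postal_code"] = compact.str[:3] + " " + compact.str[3:]
print(df)
```

Slicing through `.str` applies to the whole column at once, so no explicit loop over rows is needed.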