Text to columns in pandas dataframe

Question

I have a pandas dataset like below:

import pandas as pd

data = {'id':  ['001', '002', '003'],
        'address': ["William J. Clare\n290 Valley Dr.\nCasper, WY 82604\nUSA, United States",
                    "1180 Shelard Tower\nMinneapolis, MN 55426\nUSA, United States",
                    "William N. Barnard\n145 S. Durbin\nCasper, WY 82601\nUSA, United States"]
        }

df = pd.DataFrame(data)

print(df)

I need to convert address column to text delimited by \n and create new columns like name, address line 1, City, State, Zipcode, Country like below:

id  Name   addressline1 City    State   Zipcode Country
1   William J. Clare    290 Valley Dr.  Casper  WY  82604   United States
2   null    1180 Shelard Tower  Minneapolis MN  55426   United States
3   William N. Barnard  145 S. Durbin   Casper  WY  82601   United States

I am learning python and from morning I am solving this. Any help will be greatly appreciated.

Thanks,

```df = df.join(df['address'].str.split('\n', expand=True).add_prefix('new_column')).drop(['address'], axis=1)``` — Sushmitha, Oct 18 '22 at 11:07
Tried the above command but still not able to read null if the name is not available. Schema is all over the place — Sushmitha, Oct 18 '22 at 11:07
df = pd.DataFrame(data) Address = df['address'].str.split('\n',expand = True) print(Address) — Naina, Oct 18 '22 at 11:15
Hi Naina, thank you for your response. But I am not able to extrapolate the right values into right columns. Thanks — Sushmitha, Oct 18 '22 at 11:19
this is because the newline character \n is not marked at the right place in your original dataframe — Naina, Oct 18 '22 at 11:20
No, its beacause id no 2 has no name. if there is no name it should read the first column as Null or None — Sushmitha, Oct 18 '22 at 11:23
There is pattern firstt name, addressline, city state, zipcode and country. If the name is not available it should be None. — Sushmitha, Oct 18 '22 at 11:24

Michał Szajer · Accepted Answer · 2022-10-18T16:19:47.540

Right now, Pandas is returning you the table with 2 columns. If you look at the value in the second column, the essential information is separated with the comma. Therefore, if you saved your dataframe to df you can do the following:

df['address_and_city'] = df['address'].apply(lambda x: x.split(',')[0])
df['state_and_postal'] = df['address'].apply(lambda x: x.split(',')[1])
df['country'] = df['address'].apply(lambda x: x.split(',')[2])

Now, you have additional three columns in your dataframe, the last one contains the full information about the country already. Now from the first two columns that you have created you can extract the info you need in a similar way.

df['address_first_line'] = df['address_and_city'].apply(lambda x: ' '.join(x.split('\n')[:-1]))
df['city'] = df['address_and_city'].apply(lambda x: x.split('\n')[-1])
df['state'] = df['state_and_postal'].apply(lambda x: x.split(' ')[1])
df['postal'] = df['state_and_postal'].apply(lambda x: x.split(' ')[2].split('\n')[0])

Now you should have all the columns you need. You can remove the excess columns with:

df.drop(columns=['address','address_and_city','state_and_postal'], inplace=True)

Of course, it all can be done faster and with fewer lines of code, but I think it is the clearest way of doing it, which I hope you will find useful. If you don't understand what I did there, check the documentation for split and join methods, and also for apply method, native to pandas.

Hi Michal, will check and let you know. Thanks for your quick response — Sushmitha, Oct 18 '22 at 12:05
Hi @Michal I am getting this below error message. AttributeError: 'builtin_function_or_method' object has no attribute 'x' — Sushmitha, Oct 18 '22 at 12:31
of course, I did a typo there... sorry it should be df['address_first_line'] = df['address_and_city'].apply(lambda x: ' '.join(x.split('\n')[:-1])) I have corrected my response — Michał Szajer, Oct 18 '22 at 13:05
Hi @Sushmitha, You were right. I have edited the code, now it should execute properly. I hope it will help. — Michał Szajer, Oct 18 '22 at 16:20

Text to columns in pandas dataframe

1 Answers1