How can I use a regex function to extract european streetnames, housing numbers and adjectives with pandas?

Question

I have a dataframe with 6000 records and need to extract/split the column with streetname into: "Streetname", "Housingnumber" and "Adjectives". Unfortunately, the problem is not solved yet using regex functions because there is no structure in the notation of df["streetname"]:

**Input from df["Streetname"]**

St. edward's Lane 26

Vineyardlane3a

High Street 0-9

ParkRoad near #33

Queens Road ??

s-Georgelane9abc

Kings Road 9b

1st Park Avenue 67 near cyclelane

**Output that I would like:

df["Street"]                    df["housingnumber"]             df["adjective"]**

St. Edward's lane               26

Vineyardlane                    3                               a

High Street                     0-9

ParkRoad                        33

Queens Road                    

s-Georgelane                    9                               abc

Kings Road                      9                               b 

1st Park Avenue                 67

I tried this:

Filter = r'(?P<S>.*)(?P<H>\s[0-9].*)'

df["Streetname"] = df["Streetname"].str.extract(Filter)

I lose a lot of data and the result is only written into one column... Hope that someone can help!

Jan · Answer 1 · 2018-05-31T18:12:38.667

0

Not 100% perfect (I doubt that this will be possible without a database or machine learning algorithms) but a starting point:

^                         # start of line/string
(?P<street>\w+?\D+)       # [a-zA-Z0-9_]+? followed by not a number
(?P<nr>\d*[-\d]*)         # a digit, followed by - and other digits, eventually
(?P<adjective>[a-zA-Z]*)  # a-z
.*                        # consume the rest of the string

See a demo on regex101.com.

You might want to strip of #, whitespaces or ? from the end of street afterwards.

edited May 31 '18 at 18:12

answered May 31 '18 at 11:12

Jan

42,290
8
54
79

Almost perfect! I don't understand yet why it doesn't filter out the 3a at vineyard. I'm going to tinker with it some more. – Levi May 31 '18 at 11:24
@Levi: My bad, changed the expression. – Jan May 31 '18 at 18:12

How can I use a regex function to extract european streetnames, housing numbers and adjectives with pandas?

1 Answers1