I have several sets of strings in which numbers and words appear in varying order: numbers followed by words, words followed by numbers, and mixtures of both. For example,

"Street 50 No 40", "5, saint bakers holy street", "32 Syndicate street"

I am trying to separate the street names from the apartment numbers.

Here is my current code:

import re 

pattern_street = re.compile(r'[A-Za-z]+\s?\w+\s?[A-Za-z]+\s?[A-Za-z]+',re.X)
pattern_apartmentnumber = re.compile(r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)',re.X)

for i in ["Street 50 No 40", "5, saint bakers holy street", "32 Syndicate street"]:
    
    match_street = pattern_street.search(i) 
    match_apartmentnumber = pattern_apartmentnumber.search(i)

    fin_street = match_street[0]
    fin_apartmentnumber = match_apartmentnumber[0]

    print("street--",fin_street)
    print("apartmentnumber--",fin_apartmentnumber)

which prints:

street-- Street 50 No
apartmentnumber-- No 40
street-- saint bakers holy street
apartmentnumber-- 5
street-- Syndicate street
apartmentnumber-- 32

I want to remove the "No" from the first street name, i.e. if a string ends with "No" followed by a number, that part should be taken as the apartment number and not as part of the street. How can I do this for the example strings above?

Srivatsan

2 Answers

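First try the case where the string ends in "No" followed by a number, such as "No 123". Use a positive lookahead for this, so that the suffix is required but is not included in the match.

If that alternative does not match, fall back to matching the street without the suffix.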

pattern_street = re.compile(r'[A-Za-z]+[\s\w]+(?=\s[Nn]o\s\d+$)|[A-Za-z]+[\s\w]+',re.X)
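
As a quick check, the pattern above can be run against the three strings from the question. This is only a minimal sketch: the helper name street_name is made up for illustration, and it returns None when nothing matches instead of raising.

```python
import re

# Pattern copied from the answer: prefer a street that is followed by
# "No <digits>" at the end of the string, otherwise fall back to the plain street.
pattern_street = re.compile(r'[A-Za-z]+[\s\w]+(?=\s[Nn]o\s\d+$)|[A-Za-z]+[\s\w]+', re.X)

def street_name(address):
    """Hypothetical helper: return the street part of an address, or None."""
    match = pattern_street.search(address)
    return match[0] if match else None

for address in ["Street 50 No 40", "5, saint bakers holy street", "32 Syndicate street"]:
    print("street--", street_name(address))

# street-- Street 50
# street-- saint bakers holy street
# street-- Syndicate street
```

The apartment-number pattern from the question can stay as it is; it still returns "No 40" for the first string.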
rioV8

You can find the street name with the following regex pattern. It uses a negative lookahead to stop matching as soon as "No" appears, which keeps "No" and the number after it out of the street name.

pattern_street = re.compile(r'[A-Za-z]+((?!No).)+',re.X)
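
A short run on the question's first string shows what this pattern returns. This is a sketch, not a complete solution: the match keeps the space in front of "No", so the result is stripped here, and the lookahead is case-sensitive and stops at any "No", not only at a trailing "No" plus number. The second address below is a made-up example to illustrate that limitation.

```python
import re

# Pattern copied from the answer: letters, then any characters up to the
# first position where "No" starts.
pattern_street = re.compile(r'[A-Za-z]+((?!No).)+', re.X)

match = pattern_street.search("Street 50 No 40")
street = match[0].strip()  # the raw match is "Street 50 " with a trailing space
print("street--", street)  # street-- Street 50

# Hypothetical address: the match is cut at the "No" in "Nottingham".
truncated = pattern_street.search("12 Old Nottingham road")[0].strip()
print("street--", truncated)  # street-- Old
```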
Pooria_T