I am working on an address-parsing project where I need to detect the various components of an address, such as city, state, postal_code, street_no, etc.
I wrote a regular expression to extract postal codes that tolerates the ways users type them, for example with stray spaces or hyphens between the digits.
import re

sample_add = "16th main road btm layout 560029 5-6-00-76 56 00 78 560-029 25 -000-1"
# Six digits, each optionally separated by spaces or hyphens
regexp = re.compile(r"([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])")
print(re.findall(regexp, sample_add))
Output (with the six captured digits of each match joined together): [560029, 560076, 560078, 560029, 250001]
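For reference, `re.findall` with six capture groups returns one tuple of single-digit strings per match, so the list above is what you get after joining each tuple. A minimal sketch of that step (the name `postal_codes` is introduced here only for illustration):

```python
import re

sample_add = "16th main road btm layout 560029 5-6-00-76 56 00 78 560-029 25 -000-1"
regexp = re.compile(r"([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])")

# findall yields tuples such as ('5', '6', '0', '0', '2', '9');
# join each tuple to get the postal code as a single string.
postal_codes = ["".join(digits) for digits in regexp.findall(sample_add)]
print(postal_codes)
# ['560029', '560076', '560078', '560029', '250001']
```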
This correctly identifies the postal codes in such addresses. However, for an address like the following, it merges the street numbers and interprets them as the postal code:
Ex. `sample_add_2 = "House no 323/46 16th main road, btm layout, bengaluru 560029"`
In this case, the postal code is identified as 323461, while the correct one should have been 560029.
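One possible workaround, sketched under the assumption that the postal code is the last six-digit candidate in the address (which may not hold for every input): collect all candidates and keep only the final one. The helper `last_postal_code` and the hyphenated variant address are hypothetical, introduced only for this illustration.

```python
import re

# Same idea as the pattern above: six digits, optionally separated
# by spaces or hyphens.
POSTAL_RE = re.compile(r"(\d)[ -]*?(\d)[ -]*?(\d)[ -]*?(\d)[ -]*?(\d)[ -]*?(\d)")

def last_postal_code(address):
    """Hypothetical helper: return the last six-digit candidate, or None."""
    candidates = ["".join(groups) for groups in POSTAL_RE.findall(address)]
    # Assumption: the postal code appears after the street/house numbers.
    return candidates[-1] if candidates else None

sample_add_2 = "House no 323/46 16th main road, btm layout, bengaluru 560029"
print(last_postal_code(sample_add_2))

# Hypothetical variant where the house number is written with a hyphen.
variant = "House no 323-46 16th main road, btm layout, bengaluru 560029"
print(last_postal_code(variant))
```

This only sidesteps the problem when the postal code really does come last; an address with trailing digits after the postal code would still need extra handling.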