I am looking for an algorithm or example code for parsing postal addresses from a text file and converting them into an Excel report.
The text file I receive will contain many postal addresses in different formats, as given below:
Unit 8/25 Bright ST, NSW, 2010
UNIT 5, 77 CHAPMAL STREET
UNIT 4, 75 GREAT STREET
95 OAKILANDS WAY
AVOCADO AVE
628 BRIDGEWATER ABCE ROAD
I have to read this file and assign the details to variables for further usage:
Example:
the street number from the text address should be assigned to 'streetName',
the unit number should be assigned to 'unitNumber', etc.
I have a pattern matcher that can recognise the values if there are specific details in the string:
Pattern p1 = Pattern.compile("([0-9]+\\s+[a-zA-Z])+.*");
Matcher m1 = p1.matcher(str);
while (m1.find()) {
String __tmp = m1.group();
printer = __tmp;
}
System.out.println("Street : " + printer);
For example, given "4, 75 GREAT STREET", the above code is able to identify "75 GREAT" as a street, but the number "4" before the street address should be identified as the unit/flat/etc. number, irrespective of whether "4" is joined with "unit", "flat", etc.
I have added one more pattern match to get the number before the street address:
Pattern p2 = Pattern.compile("([0-9])");
Matcher m2 = p2.matcher(str);
Pattern p3 = Pattern.compile("([,])");
Matcher m3 = p3.matcher(str);
while (m2.find() & m3.find()) {
String __tmp = m2.group();
printer = __tmp;
}
System.out.println("Unit : " + printer);
This gives me output as:
Street : 75 GREAT STREET
Unit : 4
However, this approach does not work when the address includes "unit" or "flat", etc. Can someone help with a solution for this?
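To make the desired behaviour concrete, here is a minimal sketch of one possible direction: a single pattern with an optional unit part (an optional keyword, a number, then "," or "/"), an optional street number, and the street name up to the next comma. The class name AddressParser, the keyword list (UNIT, FLAT, APT) and the result layout are assumptions for illustration only, and the pattern covers only the sample formats listed above, not real-world addresses in general.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a complete address parser.
class AddressParser {

    // group(1): unit number, only when a number is followed by "," or "/",
    //           with or without a leading keyword (keyword list is an assumption)
    // group(2): street number (optional)
    // group(3): street name, up to the next comma or the end of the line
    private static final Pattern ADDRESS = Pattern.compile(
        "^\\s*(?:(?:(?:UNIT|FLAT|APT)\\s*)?(\\d+)\\s*[,/]\\s*)?"
            + "(?:(\\d+)\\s+)?"
            + "([A-Za-z][A-Za-z ]*?)\\s*(?:,.*)?$",
        Pattern.CASE_INSENSITIVE);

    // Returns {unitNumber, streetNumber, streetName}; missing parts are null.
    // Returns null when the line does not fit the assumed formats at all.
    static String[] parse(String line) {
        Matcher m = ADDRESS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Unit 8/25 Bright ST, NSW, 2010",
            "UNIT 4, 75 GREAT STREET",
            "4, 75 GREAT STREET",
            "95 OAKILANDS WAY",
            "AVOCADO AVE"
        };
        for (String s : samples) {
            String[] parts = parse(s);
            if (parts == null) {
                System.out.println("No match : " + s);
                continue;
            }
            System.out.println("Unit : " + parts[0]
                + " | Street number : " + parts[1]
                + " | Street : " + parts[2]);
        }
    }
}
```

Because the keyword is optional inside the unit part, "4, 75 GREAT STREET" and "UNIT 4, 75 GREAT STREET" should both yield unit 4. Anything after the first comma that follows the street name (such as state and postcode) is ignored in this sketch, and a line like "Unit 8 Bright ST" with no separator after the unit number is not handled.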