
I'm using JavaScript to parse some data and have run into a bit of a pickle.

I have a field that is 1-3 lines of data.
Usually it is only one line, representing a street address:

1234 Hollywood St.

But sometimes it is something like this:

Beverly Hills Shopping Center
1234 Hollywood St.

Other times it is this:

1234 Hollywood St
Ste 12

And other times it's stuff like this:

1234 Hollywood St
2nd Floor
(between Hollywood St and Tom Cruise Ave)

I'd really like to know which line is the street address. Currently, I'm trying to identify which line is "Address line 2", meaning the suite number, floor number, etc. I don't really need address line 2 itself, but finding it helps me get the street address by process of elimination.

Is there a nice tool available, like a regex function or something that will tell me if a string is likely a street address?

Or is there another way that I could be handling this?

Thanks!

Edit:

This algorithm does not need to be 100% accurate. I'm preparing the address to be sent to the Google Maps API to be verified. I could try each line of the address to see which one is valid, but this would increase the number of calls to Google and carry a small but nonzero chance of a false positive.

I'd like to be able to scrub the data a little before verifying it through Google, to reduce errors and the need for more calls.

Ahmet Emre Kilinc
Chris Dutrow
  • This doesn't look like a problem that a Regex can handle properly. What is your end goal with this, what do you need it for? (ie. do you have a pressing reason to separate these things?) – Pekka Dec 14 '11 at 20:38
  • This is too much for regular expressions. If you want a fully automated way to determine this, you need an AI algorithm and good seed data. – Esoteric Screen Name Dec 14 '11 at 20:46
  • The fundamental problem is "Address line 2". Address should be a block of text of n lines, with n - 1 carriage returns. Why do you need to separately store "Suite#"? Do you do a report ordered by suite #? – James Dec 14 '11 at 20:49

3 Answers


As stated in another answer, this is a job for an address verification service. Please note that the Google Maps API is not an address verification service; it would be best described as a very capable address approximation service (there's a notable difference).

Address verification implies that an address is real at the present time, meaning that it corresponds to an actual location. It often implies that an address is deliverable (depending on the business need).

I'm a software developer at SmartyStreets, an address verification company. We provide a batch processing tool that I think is a good fit for your use case. Since our system accepts up to two input lines for the street address, I suggest generating a few permutations for each address that has more than two street address lines. The tool is also very fast (1 million addresses are processed in less than an hour) and, because it's an online service, doesn't require any interaction from us.
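As a rough illustration of the permutation idea, here is a minimal JavaScript sketch. The function name `candidatePairs` and the field names `street` and `street2` are hypothetical and chosen only for this example; they are not taken from any real service's API. It assumes the raw field is a newline-separated string and that every ordered pair of distinct lines is worth trying.

```javascript
// Build candidate two-line inputs from a multi-line address field.
// `street` and `street2` are illustrative names, not a real API's fields.
function candidatePairs(field) {
  // Split into trimmed, non-empty lines.
  const lines = field
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);

  // One or two lines already fit a two-line input as they are.
  if (lines.length <= 2) {
    return [{ street: lines[0] || "", street2: lines[1] || "" }];
  }

  // Otherwise, emit every ordered pair of distinct lines.
  const pairs = [];
  for (let i = 0; i < lines.length; i++) {
    for (let j = 0; j < lines.length; j++) {
      if (i !== j) {
        pairs.push({ street: lines[i], street2: lines[j] });
      }
    }
  }
  return pairs;
}
```

A three-line field produces six candidates, so in practice it may be worth pruning obviously useless pairs before submitting them.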

The other bit of good news is that you may not even need to send the addresses to the Google Maps API, because they will already be Delivery-Point Validated. That will depend on your exact needs, though.

Update: SmartyStreets now provides international address verification.

Michael Whatcott
  • Can I use this as a way of getting a "canonical" form of an address? Meaning it will normalize such textual inconsistencies as "Street" and "St."? – Chris Dutrow Feb 08 '12 at 16:55
  • Yes, because we are a CASS-Certified address verification provider with the USPS we are able to normalize those exact variations. This also provides the added benefit of being able to identify duplicate addresses within a list that is submitted. Please note that there are limitations to how much we can do to verify poorly formatted address data--at some point we would be making something out of nothing, causing the possible misdirection of mail or storage of incorrect data. – Michael Whatcott Feb 09 '12 at 17:46

There are web services to which you can pass an address and which return a well-formed JSON/XML object of the parsed address. Perhaps something like that will help you? As some of the comments state, you won't be able to do this with JavaScript alone.

Here is one service I have personally looked into using. You'll need to get familiar with its API:

https://webgis.usc.edu/Services/AddressNormalization/WebService/DeterministicNormalizationWebService.aspx

Doug Chamberlain

First of all, have a look at the following official USPS abbreviations:
  • Street Suffix Abbreviations
  • Secondary Unit Designators

Then you will have an idea of what to expect as input, but you also have to take into account all the possible unofficial variations, punctuation, etc. There is a lot to do.

In general, a street address line should start with a number followed by a space (which separates it from "2nd Floor" and the like), then one or more words, and finally a street suffix abbreviation. For the city, state, zip tuple, you again have to handle a mix of full state names and their abbreviations (including short variations like N York, N.York or N. York), and remember both the zip5 and zip5+4 cases.
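The street-line rule described above can be sketched in JavaScript roughly as follows. This is only a heuristic under stated assumptions: `STREET_SUFFIXES` is a small illustrative sample rather than the full USPS suffix table, the function names are made up for this example, and real data will contain addresses the rule misses (for instance, streets with no suffix).

```javascript
// Illustrative sample only; the official USPS suffix list is much longer.
const STREET_SUFFIXES = [
  "st", "street", "ave", "avenue", "blvd", "boulevard", "rd", "road",
  "dr", "drive", "ln", "lane", "ct", "court", "pl", "place", "way",
  "hwy", "highway", "pkwy", "parkway",
];

// Lowercase the line, turn punctuation into spaces, and split into words.
function tokenize(line) {
  return line
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .trim()
    .split(/\s+/)
    .filter(Boolean);
}

// True if the line starts with a house number (digits plus an optional
// single letter, so "2nd" does not count), followed by at least one word
// and then a known street suffix.
function looksLikeStreetAddress(line) {
  const words = tokenize(line);
  if (words.length < 3) return false;
  if (!/^\d+[a-z]?$/.test(words[0])) return false;
  return words.slice(2).some((word) => STREET_SUFFIXES.includes(word));
}

// Return the first line of a multi-line field that passes the check,
// or null if none does.
function pickStreetLine(field) {
  const lines = field
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
  return lines.find(looksLikeStreetAddress) || null;
}
```

With the examples from the question, `pickStreetLine` skips lines such as "Beverly Hills Shopping Center", "Ste 12" and "2nd Floor", because none of them begins with a plain house number. When it returns null, falling back to trying each line against the verification service is still an option.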

bds