We've got a legacy custom address control. It provides a free text form into which users can enter any address, even a partial or invalid address; see green arrow in the screenshot:

[Screenshot: example address control with free text form]

The free text form is meant to provide a better user experience; however, the address has to be structured for further processing. Consequently, the address is analysed to determine street, town, post code, country, etc.

Determining the country seems fairly easy. Our current source code (simplified for readability) looks like this:

private static string DetermineCountryFromAddress(string fullAddress)
{
    // determine list of countries found in the full address
    string[] addressLines = fullAddress.Split(Environment.NewLine.ToCharArray());
    IList<string> countries = new List<string>();
    foreach (string addressLine in addressLines)
    {
        // check whether there's a country name hidden in this address line
        string countryName;
        if (ContainsCountry(addressLine, out countryName))
            countries.Add(countryName);
    }

    // if there has been a country found, return the country found last;
    // otherwise, return the default country (constant)
    return countries.Any() ? countries[countries.Count - 1] : DefaultCountryName;
}

For the curious, this is what our simplified ContainsCountry() method looks like:

private static bool ContainsCountry(string addressLine, out string foundCountryName)
{
    // check against all countries
    foreach (string countryName in
        AllCountryNames.Where(countryName => addressLine.Contains(countryName)))
    {
        foundCountryName = countryName;
        return true;
    }

    // nothing found
    foundCountryName = null;
    return false;
}

This solution, though, doesn't address these requirements:

  • The country can be on any line, not only the last one
  • If no country is provided, country names that form part of street names should be ignored

Is there anybody who has a smart enhancement (solution) that fully addresses one or both requirements? Using an external service provider for address validation is excluded from acceptable answers.

Quality Catalyst
  • Can you share code for `ContainsCountry` function? – Nikitesh Mar 02 '15 at 04:00
  • Principal source code added to the question. We simply check whether the country name can be found in an address line. – Quality Catalyst Mar 02 '15 at 04:09
  • How do you check it? As in, do you have a list of countries or are you using Culture? – Nikitesh Mar 02 '15 at 04:11
  • We have a list of countries in a data store. We check against this list of country names. The property `AllCountryNames` provides access to a cached copy of the data store's list. – Quality Catalyst Mar 02 '15 at 04:12
  • Whatever smart logic you apply, matching just strings can result in all sorts of errors. For example, the system might find an address with "New Mexico" to be in the country Mexico, whereas it is in the state of New Mexico in the USA. – TejSoft Mar 02 '15 at 04:22
  • @TejSoft Very good observation; this is one of the two challenges we are facing. If you have a solution in mind that doesn't use string matching or an external provider please share! – Quality Catalyst Mar 02 '15 at 04:26
  • What kind of application is this? Web or desktop? Can you give them a popup or a small window where they can enter the address in separate fields (street, state, postcode and country), and when they click on the "done" button, the address fields are concatenated into a single string? This way you have more control over what the users enter. – TejSoft Mar 02 '15 at 04:39
  • @TejSoft It's a desktop application and we already provide such a dialogue. Despite its benefits it lacks in user experience hence the free text form requirement. – Quality Catalyst Mar 02 '15 at 19:32
  • @QC, how about this: whenever the user clicks on the address textarea, superimpose it with separate text boxes for street name, state, post code etc. On lost focus the text boxes become a text area again. Similar to a date input which shows "__/__/____" when you click in it. You might have to write your own user control. – TejSoft Mar 03 '15 at 01:39
  • @TejSoft That's a very good idea as well. Thanks for sharing. However, how would copying an address from the clipboard into this custom control work? The challenge we are trying to solve is indeed identifying/finding a country in a free form address, no matter the address format. – Quality Catalyst Mar 03 '15 at 21:38

1 Answer

In my opinion, this is the best possible solution:

string[] addressLines = fullAddress.Split(Environment.NewLine.ToCharArray());
IList<string> countries = new List<string>();

// This will save you a bit of computation, as 90% 
// of country names will be towards the end.
foreach (string addressLine in addressLines.Reverse())
{
    // check whether there's a country name hidden in this address line
    string countryName;
    if (ContainsCountry(addressLine, out countryName))
        countries.Add(countryName); // breaking here once a country is found would avoid further unnecessary iterations
}

Another option would be to use LINQ:

List<string> addressLines = new List<string>(Regex.Split(fullAddress, 
    Environment.NewLine));

// Select the name so that the result is a string rather than a list entry
string countryname = CountryNameList
    .Where(y => addressLines.Any(z => z == y.countryName))
    .Select(y => y.countryName)
    .FirstOrDefault();

You can also get a list if you use ToList() instead of FirstOrDefault().
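
As a minimal sketch only, the two ideas above (scanning the lines in reverse and comparing whole lines against the country list) could be combined with an early exit and trimmed, case-insensitive comparison. The class name `CountryFinder`, the three sample country names and the default value are hypothetical stand-ins for the cached list and the constant mentioned in the question. The sketch also assumes that a country, when present, sits on a line of its own, which will not hold for every address format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountryFinder
{
    // Hypothetical stand-in for the cached country list from the question.
    private static readonly string[] AllCountryNames =
        { "Mexico", "Australia", "New Zealand" };

    // Hypothetical default; the question only says it is a constant.
    private const string DefaultCountryName = "Australia";

    public static string DetermineCountryFromAddress(string fullAddress)
    {
        // Split on both line-ending characters, drop empty entries,
        // and trim surrounding whitespace from every line.
        IEnumerable<string> lines = fullAddress
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim());

        // Walk the lines bottom-up and stop at the first line that is,
        // as a whole, a known country name (ignoring case).
        foreach (string line in lines.Reverse())
        {
            string match = AllCountryNames.FirstOrDefault(
                name => string.Equals(name, line, StringComparison.OrdinalIgnoreCase));
            if (match != null)
                return match;
        }

        // No line consisted solely of a country name.
        return DefaultCountryName;
    }
}
```

Because whole lines are compared, a street line such as "12 Mexico Street" is not treated as a country, while a line reading just "mexico" anywhere in the address is. An address that puts the country on the same line as the town or post code falls back to the default.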

Nikitesh
  • If you're sorting the address in reverse, wouldn't you just break as soon as you found a country name, rather than adding to a list and then returning the first item (I assume you would return the first item in your case rather than the last item as the OP was doing)? – Rufus L Mar 02 '15 at 04:56
  • If we had to break, then why use a list in the first place? I also thought of the same, but then didn't write it since it was that way in the question. Anyway, I'll add it as a comment in the code. – Nikitesh Mar 02 '15 at 05:00
  • Don't forget to `Trim()`. – abatishchev Mar 02 '15 at 05:12
  • And replace `Where(predicate).FirstOrDefault()` with `FirstOrDefault(predicate)`. – abatishchev Mar 02 '15 at 05:12
  • @NikiteshKolpe Thank you! Your example is indeed some steps smarter and optimizes the existing code (up-vote for that). What it doesn't do, though, is address the two challenges mentioned in the question: the country doesn't have to be in the last line, and if not provided, it would still find a country in a street line as shown in the screenshot. – Quality Catalyst Mar 02 '15 at 19:36
  • It's about probability now. Are you sure you don't want to use the Bing library? Plotting the location would ease your work to a great extent by confirming the place really exists. How are you currently validating the address? – Nikitesh Mar 03 '15 at 03:38
  • @NikiteshKolpe: the system itself has an integration point with address validation services. Depending on the country where the system is used this service is available or not. If not, the discussed features are needed. Validation is a separate concern and done on structured data, not the free text. – Quality Catalyst Mar 09 '15 at 00:11
  • I was wondering if you could get the list of states based on country, it would be easier to eliminate further incorrect options. – Nikitesh Mar 09 '15 at 03:48