I have the following regex that I use to validate North American postal codes:

(?:(\d{5})(?:-\d{4})?)|(?:([a-zA-Z]\d[a-zA-Z]) ?(\d[a-zA-Z]\d))

FYI, I understand that it could be stricter about which characters are allowed in which positions.

What I'd like to do is use this same validation constant to also extract the postal code in the format:

00000
  or
a0a0a0

The regex above comes close: if I concatenate all of the capturing groups (except group 0, the whole match), I get the result I'm after. For example, a US code is captured in group 1, and a Canadian code is captured in groups 2 and 3.
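As an illustration of the concatenation described above, here is a minimal sketch. The class name `PostalCodeExtractor` and the method `TryNormalize` are hypothetical names chosen for this example; the pattern is the one from the question, unchanged and unanchored. It relies on the fact that a .NET group that did not take part in the match has an empty `Value`, so joining groups 1 through 3 yields either the five-digit code or the six-character Canadian code.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PostalCodeExtractor
{
    // Pattern copied from the question; it is not anchored, so it finds
    // the first postal code anywhere in the input.
    private static readonly Regex Pattern = new Regex(
        @"(?:(\d{5})(?:-\d{4})?)|(?:([a-zA-Z]\d[a-zA-Z]) ?(\d[a-zA-Z]\d))");

    // Hypothetical helper: returns true and the normalized code on a match.
    public static bool TryNormalize(string input, out string code)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            code = "";
            return false;
        }

        // Skip group 0 (the whole match). Groups that did not participate
        // contribute an empty string, so only the relevant parts remain.
        code = string.Concat(m.Groups.Cast<Group>().Skip(1).Select(g => g.Value));
        return true;
    }
}
```

With this sketch, `TryNormalize("12345-6789", out var code)` would leave `code` as `12345`, and `TryNormalize("a0a 0a0", out var code)` would leave it as `a0a0a0`.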

Is there a better way to do this? Or is concatenating all subgroups something a regex library would be expected to provide? (Incidentally, this is C# on .NET.)

shannon

1 Answer

I'd make your 2nd group, ([a-zA-Z]\d[a-zA-Z]), and your 3rd group, (\d[a-zA-Z]\d), non-capturing, and make the enclosing Canadian code group, (?:([a-zA-Z]\d[a-zA-Z]) ?(\d[a-zA-Z]\d)), which is currently non-capturing, a capturing group instead. That leaves only two capturing groups: one for the US code and one for the Canadian code. I'd also add word boundaries around each alternative:

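using System.Linq;
using System.Text.RegularExpressions;

// Group 1 captures a US ZIP code; group 2 captures a whole Canadian code.
var regex = new Regex(@"\b(?:(\d{5})(?:-\d{4})?)\b|\b((?:[A-Z]\d[A-Z]) ?(?:\d[A-Z]\d))\b", RegexOptions.IgnoreCase);
var input = @"00000 or a0a 0a0 and not 11111a or b1b1b11";
// m.Value is the full matched text, including any ZIP+4 suffix or inner space.
var postalCodes = regex.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();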

This will match 00000 and a0a 0a0, but will skip incorrect 11111a and b1b1b11.
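If the normalized form (no ZIP+4 suffix, no inner space) is what is wanted, one possible variation is to read the two capturing groups instead of `m.Value`. The following is only a sketch under that assumption; `PostalCodeScanner` and `ExtractNormalized` are hypothetical names, and removing the space with `string.Replace` is one choice among several.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PostalCodeScanner
{
    // Same pattern as above: group 1 is the US code, group 2 the Canadian code.
    private static readonly Regex Pattern = new Regex(
        @"\b(?:(\d{5})(?:-\d{4})?)\b|\b((?:[A-Z]\d[A-Z]) ?(?:\d[A-Z]\d))\b",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every postal code found, normalized.
    public static string[] ExtractNormalized(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Success
                ? m.Groups[1].Value                     // US: five digits only
                : m.Groups[2].Value.Replace(" ", ""))   // Canadian: drop the space
            .ToArray();
    }
}
```

For the input `"00000 or a0a 0a0 or 55555-1111 and not 11111a"`, this sketch would return `00000`, `a0a0a0`, and `55555`.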

Oleks
  • But it will also match b1b1b1, so I will still need to normalize the space. The ignore case is nicer. I think the Matches().Value actually returns the non-capturing portion of the match as well? – shannon Nov 14 '12 at 16:19
  • For example, for: input = "00000 or a0a 0a0 or a0a0a0 or 55555-1111 and not 11111a or b1b1b11" postalCodes == ["00000","a0a 0a0","a0a0a0","55555-1111"] – shannon Nov 14 '12 at 16:45
  • @shannon: `Match.Value` (actually `Capture.Value`) returns the captured substring from the input string. It seems I didn't understand from the original question that you want to normalize the space. In that case, concatenating the groups looks like the best option. – Oleks Nov 15 '12 at 11:08