I'm making an application that asks the user to enter a postcode and outputs the postcode if it is valid.

I found the following pattern, which works correctly:

String pattern = "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})"; 

I don't know much about regex and it would be great if someone could talk me through this statement. I mainly don't understand the ? and use of ().

Duncan Jones
AngryDuck
  • By "postcode", do you mean "postal code"? IE what most Americans would call a "zip code"? If this is the case, what country's codes are you targeting? – Roddy of the Frozen Peas Feb 21 '13 at 09:21
  • Sorry for not being clear. Yes, I mean postal codes, and UK ones at that. I know this works and is correct because I've tested it; what I don't understand is why. I only really need the regex explained. – AngryDuck Feb 21 '13 at 09:23
  • `()` denote capturing groups, `?` in a regex means whatever preceded it is optional. Consider searching for a regex tutorial, the one provided by Oracle is a good start. – jlordo Feb 21 '13 at 09:23
  • You could just get a list of all postcodes and see if it is contained in it - probably simpler – Adam Feb 21 '13 at 09:33
  • ok so why was my question edited? – AngryDuck Feb 21 '13 at 09:39
  • @user1110338 the title was edited to make it more relevant to the question and easier to search for (i.e. first comments above). The edits to the body seem to be for readability. – cmbuckley Feb 21 '13 at 09:46
  • @Adam - you could but there are 1.7 million apparently. Anyway the list is available from http://www.ordnancesurvey.co.uk/oswebsite/products/code-point-open/ – Mark Chorley Apr 17 '13 at 16:32
  • lol, I hadn't seen that comment before. Why would you get a list and basically hard-code a check? – AngryDuck Apr 18 '13 at 08:04

2 Answers

The ? means "occurs 0 or 1 times", and the parentheses group things as you might expect; quantifiers such as ? apply to a whole group. A regex tutorial is probably the best thing here:

http://www.vogella.com/articles/JavaRegularExpressions/article.html

I had a brief look and it seems reasonable. For practice/play, see this applet:

http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html

A simple example: (ab)?

This means 'ab' once or not at all.
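To try the `(ab)?` example from Java, here is a minimal sketch. The class name `OptionalGroupDemo`, the helper `matchesWhole` and the surrounding `x...y` test strings are made up for illustration; only `java.util.regex.Pattern.matches` comes from the standard library.

```java
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Hypothetical helper: true only if the WHOLE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // (ab)? applies the ? to the whole group: "ab" once, or nothing.
        System.out.println(matchesWhole("x(ab)?y", "xy"));   // true  (group absent)
        System.out.println(matchesWhole("x(ab)?y", "xaby")); // true  (group present once)
        System.out.println(matchesWhole("x(ab)?y", "xay"));  // false (only half the group)

        // Without the parentheses, ? applies to the single character b.
        System.out.println(matchesWhole("xab?y", "xay"));    // true  (b is optional, a is not)
    }
}
```

The last line shows why the parentheses matter: without them, only the character immediately before the ? becomes optional.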

Adrian

Your regex has the following:

  • ^ and $ - Anchors marking the start and end of the matching input. (The pattern as posted contains only ^; if it is used with String.matches(), the whole input has to match regardless.)
  • [A-PR-UWYZ] - Any one character from A to P, R to U, or W, Y, Z. Characters enclosed in square brackets form a character class, which matches any single one of the enclosed characters; - indicates a range, so [A-D] allows A, B, C or D.
  • ([0-9]|[A-HJKSTUW])? - An optional character: either a digit 0-9 or one of the characters in [A-HJKSTUW]. ? makes the preceding part optional, | means OR, and () groups the two alternatives being ORed. You could write [0-9A-HJKSTUW]? instead.
  • [ABD-HJLNP-UW-Z]{2} - A sequence of exactly two characters, each allowed by the character class. {2} means "repeat twice", so [ABD-HJLNP-UW-Z]{2} is equivalent to [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z].
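
The points in the list above can be checked with a short Java sketch. The pattern is copied from the question unchanged; the class name `PostcodeRegexDemo`, the helpers `isValid` and `startsWithValid`, and the sample inputs are assumptions made for illustration. It is a quick way to poke at the regex, not a complete postcode validator.

```java
import java.util.regex.Pattern;

class PostcodeRegexDemo {
    // The pattern exactly as posted in the question (note: no trailing $).
    static final String POSTED =
        "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})";

    // String.matches() succeeds only if the whole input matches.
    static boolean isValid(String input) {
        return input.matches(POSTED);
    }

    // Matcher.find() only needs part of the input to match;
    // ^ pins the start, but nothing in the pattern pins the end.
    static boolean startsWithValid(String input) {
        return Pattern.compile(POSTED).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("M1 1AE"));   // true
        System.out.println(isValid("SW1A 1AA")); // true
        System.out.println(isValid("Q1 1AA"));   // false: Q is not in [A-PR-UWYZ]
        System.out.println(isValid("M1 1CA"));   // false: C is not in [ABD-HJLNP-UW-Z]

        // Effect of the missing $ anchor.
        System.out.println(isValid("M1 1AE extra"));         // false
        System.out.println(startsWithValid("M1 1AE extra")); // true

        // The alternation and the merged character class agree on these inputs.
        for (String s : new String[] {"", "5", "K", "Z", "55"}) {
            boolean alternation = s.matches("([0-9]|[A-HJKSTUW])?");
            boolean merged = s.matches("[0-9A-HJKSTUW]?");
            System.out.println("\"" + s + "\" -> " + alternation + " / " + merged);
        }

        // {2} means exactly two characters from the class.
        System.out.println("AA".matches("[ABD-HJLNP-UW-Z]{2}")); // true
        System.out.println("A".matches("[ABD-HJLNP-UW-Z]{2}"));  // false
    }
}
```

If the pattern is ever used with `find()` rather than `matches()`, appending `$` would be one way to reject trailing text.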
Naveed S
  • Thank you very much, that is exactly what I was looking for. Love it when people just plainly and simply give you the answer you want. – AngryDuck Feb 21 '13 at 10:00