1

following string:

23434 5465434

58495 / 46949345

58495 - 46949345

58495 / 55643

d 44444 ssdfsdf

64784

45643 dfgh

58495/55643

48593/48309596

675643235

34565435 34545

I only want to extract the bold ones. Each is a five-digit number (German). It should not match telephone numbers such as 43564 366334 or 45433 / 45663, etc., as in my example above.

I tried something like ^\b\d{5}, but that's not a good beginning.

Any hints on how to get this working?

Thanks for all hints.

choise
  • 2
    please add additional information how to tell why some 5-digit numbers are acceptable and others are not. Are they a specific range? Is there a specific format? The more information you provide, the better the answers people can provide. As is, people are having to guess what you want. – the Tin Man Feb 19 '11 at 01:18

4 Answers

2

You could add a negative look-ahead assertion to avoid the matches with phone numbers.

\b[0124678][0-9]{4}\b(?!\s?[ \/-]\s?[0-9]+)

If you're using Ruby 1.9, you can add a negative look-behind assertion as well.
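
To illustrate how the two kinds of assertion can work together, here is a minimal Ruby sketch (1.9 or later). It is not the pattern from this answer: it uses a plain `\d{5}` with no restriction on the first digit, and it only knows the separators that appear in the question's sample (a space, `/`, `-`, ` / ` and ` - `). The names `FIVE_DIGITS`, `SAMPLE` and `extract_codes` are made up for the example.

```ruby
# Hypothetical sketch: a run of exactly five digits with no other number
# directly next to it. Every look-behind is fixed-width, so several are
# chained instead of using one variable-width assertion.
#   (?<!\d)          not preceded by a digit
#   (?<!\d[ \/-])    nor by "digit + single separator"
#   (?<!\d [\/-] )   nor by "digit + ' / '" or "digit + ' - '"
# The three look-aheads mirror the same conditions on the right-hand side.
FIVE_DIGITS = /(?<!\d)(?<!\d[ \/-])(?<!\d [\/-] )\d{5}(?!\d)(?![ \/-]\d)(?! [\/-] \d)/

# Sample lines taken from the question.
SAMPLE = [
  "23434 5465434",
  "58495 / 46949345",
  "58495 - 46949345",
  "58495 / 55643",
  "d 44444 ssdfsdf",
  "64784",
  "45643 dfgh",
  "58495/55643",
  "48593/48309596",
  "675643235",
  "34565435 34545"
]

# Collect every match from every line.
def extract_codes(lines)
  lines.flat_map { |line| line.scan(FIVE_DIGITS) }
end

p extract_codes(SAMPLE)
```

Without the look-behinds, the second half of `58495 / 55643` would still be picked up, which is why looking backwards matters here.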

oylenshpeegul
  • Just what it sounds like. The (?!...) part above is looking ahead and refusing to match if the ... conditions are met. In Ruby 1.9, we can use (?<!...) to similarly look backwards. – oylenshpeegul Feb 19 '11 at 12:50
  • 1
    I see this regexp floating around in a lot of places, but it's NOT correct. The problem is that any German zipcode which starts with a 3, 5 or 9 isn't seen as valid, although they are perfectly valid and also used. Also see http://en.wikipedia.org/wiki/List_of_postal_codes_in_Germany – Dirkjan Bussink Mar 18 '11 at 09:07
1

You haven't specified what distinguishes the number you're trying to search for.

Based on the example string you gave, it looks like you just want: ^(\d{5})\n

Which matches lines that start with 5 digits and contain nothing else.

You might want to permit some spaces after the first 5 digits (but nothing else): ^(\d{5})\s*\n
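
A small Ruby sketch of this idea, under a couple of assumptions: in Ruby, `^` and `$` already match at line boundaries, so the explicit `\n` is left out, and `[ \t]*` stands in for `\s*` so that trailing whitespace cannot run across a line break. The method name `bare_five_digit_lines` is invented for the example.

```ruby
# Hypothetical helper: return the five-digit numbers that sit alone on a line
# (optionally followed by spaces or tabs).
def bare_five_digit_lines(text)
  # scan returns one array per match because of the capture group,
  # so flatten it into a plain list of strings.
  text.scan(/^(\d{5})[ \t]*$/).flatten
end

text = "23434 5465434\n58495 / 55643\n64784\n45643 dfgh\n675643235\n"
p bare_five_digit_lines(text)
```

Note that this is stricter than what the question's later comments ask for: a line such as `45643 dfgh` is rejected because of the trailing word.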

Olhovsky
0

I'm not completely sure about the specified rules. But if you want lines that start with 5 digits and do not contain additional digits, this may work:

^(\d{5})[^\d]*$

If leading white space is okay, then:

^\s*(\d{5})[^\d]*$

Here is the Rubular link that shows the result.
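
For anyone who wants to try the second pattern outside Rubular, here is a minimal Ruby sketch. It checks one line at a time and swaps `^`/`$` for `\A`/`\z`, which is an assumption made so that `[^\d]*` cannot spill over a newline; `leading_five_digits` is a made-up name.

```ruby
# Hypothetical helper: return the five digits if the line starts with them
# (leading whitespace allowed) and contains no further digits, else nil.
def leading_five_digits(line)
  # String#[] with a regexp and a group index returns that capture or nil.
  line.chomp[/\A\s*(\d{5})[^\d]*\z/, 1]
end

["64784", "45643 dfgh", "58495 / 55643", "d 44444 ssdfsdf"].each do |line|
  puts "#{line.inspect} => #{leading_five_digits(line).inspect}"
end
```

The last sample line returns nil because it does not start with the digits.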

Mark Wilkins
  • Looks promising, but I don't want to match the charakters in your second match; also, `d 44444 ssdfsdf` should match. =/ Very complicated. – choise Feb 19 '11 at 01:02
0
^\D*(\d{5})(\s(\D)*$|()$)

This should (it's untested) match:

  • line starting with five digits (or some non-digits and then five digits), then a space, and ending with some non-numbers
  • line starting and ending with five digits (or some non-digits and then five digits)

\1 would be the five digits

\2 would be the whole second half, if any

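\3 would be the last non-digit character after the digits, if any (a repeated group such as (\D)* keeps only its final iteration; (\D*) would capture the whole run)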

edited to fit the asker's edited question

edit again: I came up with a much more elegant solution:

^\D*(\d{5})\D*$
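
A quick way to try this last pattern in Ruby is sketched below. One assumption to flag: `\D` also matches a newline, so the sketch tests each line separately and anchors with `\A`/`\z` instead of `^`/`$`. The names `ONLY_FIVE` and `only_five_digit_run` are invented for the example.

```ruby
# Hypothetical sketch: the line may contain any non-digits, but its only
# digits must be a single run of exactly five.
ONLY_FIVE = /\A\D*(\d{5})\D*\z/

def only_five_digit_run(line)
  line.chomp[ONLY_FIVE, 1]   # capture 1, or nil when the line does not match
end

lines = ["d 44444 ssdfsdf", "64784", "45643 dfgh",
         "58495 / 55643", "23434 5465434", "675643235"]

# Keep only the lines that match and show the captured digits.
p lines.map { |l| only_five_digit_run(l) }.compact
```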
tim
  • Also a good start, but it also matches charakters, and it should work if the line starts with something other than a digit (check edit). Your matches => http://www.rubular.com/r/9FUAYtkP4X – choise Feb 19 '11 at 01:06
  • Actually, I just came up with a better solution. Check the post. – tim Feb 19 '11 at 02:32
  • Hm, sorry. Not working for me, because it includes charakters. Whitespace is no problem, but charakters are. – choise Feb 19 '11 at 11:38
  • Honestly, you're going to have to be way more specific. No one knows what you mean by "charakters". They're _all_ characters. – tim Feb 20 '11 at 23:36