
I am new to regex and want to write one that extracts numbers of any kind (such as 23, 23a, 24-26) that are immediately followed by text surrounded by !.

More explanation: I need to match numbers (plain numbers, ranges of numbers separated by a dash, or a number followed by a letter such as a, b, c, d, ...) that are immediately followed, possibly after more such numbers separated by commas, by text surrounded by !.

For example, in the text below, I am looking for the parts that I italicized:

; 46-58 !some text! ; , 5 some text, 3-21 , 6-22 some text, 16 some text !some text! ; 46-58 some text, 5 !some text! ; 3-21 , 6-22 some text, 16 some text, some text !some text! ; 46-58 some text, 5 some text, 3-21 ,23a , 6-22 !some text! ;

To make it clearer, I marked the text that I am interested in in red.

So far, I have come up with the following regexes:

\![\w\s]*\! => finds the text surrounded by !

[a-z]?[\s|,]? [\-|,| | | | | |0-9|-|\d+[\-|a-z]*\d*]*\![\w\s]*\! => this one selects everything between two consecutive ;

\d+[-,]*[a-z]*\d+[a-z]*\s*[,]* => this one selects any kind of numbering

However, I have not been able to put them together to select what I want.

Mohsen Sichani
  • Show what values should be matched. – RomanPerekhrest Apr 23 '17 at 19:43
  • Should 23a be matched as well? – gaganshera Apr 23 '17 at 19:48
  • Yes, 23a should also match. I italicized the parts that should match. Thank you – Mohsen Sichani Apr 23 '17 at 20:01
  • Why should the 3-21 at the end of the text be matched when its first occurrence (which is exactly the same) should not? – Jorge Campos Apr 23 '17 at 20:15
  • Because the first one is not surrounded by ! – Mohsen Sichani Apr 23 '17 at 20:17
  • The parts with words bound by `!` are regular enough. But I don't quite get how the *number* part is regular. Can they contain all chars? Or only `a` and `-` and commas and white spaces? – jrook Apr 23 '17 at 20:22
  • Thanks, jrook. They are street numbers, so we may have 25,25a,25-55,57c XYZ street. Please let me know if you need further explanation. – Mohsen Sichani Apr 23 '17 at 20:26
  • Maybe [this](https://regex101.com/r/TSfjiS/1) can be a start. It matches the parts you have marked. But I am not sure if it encompasses all the patterns you want to match. – jrook Apr 23 '17 at 20:27
  • In general, if the street numbers are too irregular (to the degree that they are indistinguishable from the parts bounded by `!`), then maybe regex is not the tool you want to use. I think you will need to split the string by `!***!` sections and then try to somehow parse the address based on the rules you have. – jrook Apr 23 '17 at 20:32
  • None of the two (three, in fact) are surrounded by `!`. You need to explain what your concept of "surrounded" is, because the only things they are surrounded by are `,` or the other texts that are surrounded by `!`, like `!some text!`. This requirement makes no sense. – Jorge Campos Apr 23 '17 at 20:32
  • @JorgeCampos: See [this](https://regex101.com/r/TSfjiS/1). I think he wants to match some pattern immediately before parts bounded by `!`s. – jrook Apr 23 '17 at 20:34
  • Thanks, jrook. I tried it in C#. It picks the correct ones, and I can do the rest in code (checking whether each match is followed by !). This was really helpful and I really appreciate it. – Mohsen Sichani Apr 23 '17 at 20:36
  • Believe me, jrook, the text was so messy; at least I was able to find the street names correctly with C# and OpenStreetMap. However, you helped me detect numbers followed by !, and I appreciate that. I can do the rest. – Mohsen Sichani Apr 23 '17 at 20:40
  • @jrook Ok, now I understand. So his requirement should be written as: Need to match numbers (simple numbers, range of numbers separated by dash or a number followed by a letter) that are immediately followed by itself (ignoring commas) and a text surrounded by `!` – Jorge Campos Apr 23 '17 at 20:40
  • @jrook You should provide that as an answer. You will have my upvote. – Jorge Campos Apr 23 '17 at 20:41
  • Thanks, @Jorge. I will edit the question. I thought the title was clear enough. – Mohsen Sichani Apr 23 '17 at 20:42

1 Answer


If I understand correctly, you want to match a pattern immediately before the parts of the text bounded by ! symbols. I think the exact answer will depend on this pattern that is to be matched. It might not be a good fit for regular expressions after all.

I created this example (JavaScript) that matches the expressions in the question.

Note: I made some assumptions about the pattern that is to be matched. Namely, the pattern starts with a digit and can only contain digits, dashes, commas, and character a.
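As an illustration of that idea, here is a minimal JavaScript sketch. It is not the regex from the linked example; it is one possible pattern under slightly broader assumptions: a "number" is a run of digits optionally followed by a dash and more digits, or by a single lowercase letter, and several numbers may be chained with commas and optional whitespace. The names `NUMBER`, `pattern`, and `extractNumbers` are made up for this example. A lookahead requires the `!...!` text to follow, without including it in the match.

```javascript
// Assumption: one "number" is digits, optionally followed by "-digits"
// (a range such as 24-26) or by a single lowercase letter (such as 23a).
const NUMBER = "\\d+(?:-\\d+|[a-z])?";

// One or more numbers separated by commas (with optional whitespace),
// followed by optional whitespace and a !...! section.
// The lookahead (?=...) checks for the !...! section without consuming it.
const pattern = new RegExp(
  "\\b" + NUMBER + "(?:\\s*,\\s*" + NUMBER + ")*(?=\\s*![^!]*!)",
  "g"
);

// Hypothetical helper: returns every matching number group as a string.
function extractNumbers(text) {
  return text.match(pattern) || [];
}

// The sample text from the question.
const sample =
  "; 46-58 !some text! ; , 5 some text, 3-21 , 6-22 some text, " +
  "16 some text !some text! ; 46-58 some text, 5 !some text! ; " +
  "3-21 , 6-22 some text, 16 some text, some text !some text! ; " +
  "46-58 some text, 5 some text, 3-21 ,23a , 6-22 !some text! ;";

console.log(extractNumbers(sample));
// → [ '46-58', '5', '3-21 ,23a , 6-22' ]
```

Each result is a whole comma-separated group; splitting it on `,` and trimming gives the individual numbers. One known limitation of this sketch: a number sitting directly before the closing `!` of a section would also match, because the lookahead cannot tell an opening `!` from a closing one.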

jrook
  • It may also contain b, c, or a few more letters; I have added them based on what you wrote. Much appreciated. Thanks a lot for your answer, and also for making the assumptions clear. – Mohsen Sichani Apr 23 '17 at 20:50