
I am working on a regex problem where I need to identify the words that are actually figures (numbers written out in words) in natural language sentences such as:

  1. one hundred one plus one hundred one
  2. one hundred minus one
  3. hundred multiplied by one

I am trying to work out a regex which only matches the figures (such as "hundred", "one thousand and eleven" and so on) in the above statements, but am unable to do so. Here is what I have worked out so far:

If I use something like:

([o][n][e]|[h][u][n][d][r][e][d]).*?([o][n][e]|[h][u][n][d][r][e][d])

then it matches only "one hundred" in "one hundred one", and if I use:

([o][n][e]|[h][u][n][d][r][e][d]).*([o][n][e]|[h][u][n][d][r][e][d])

then it matches the whole "one hundred one plus one hundred one".
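
For reference, the two behaviours can be reproduced with a short script. This is only a sketch: it assumes Python's `re` module (the question does not name a regex engine) and writes the alternation as `(one|hundred)`, which matches the same text as the single-character classes above.

```python
import re

sentence = "one hundred one plus one hundred one"

# (one|hundred) is equivalent to ([o][n][e]|[h][u][n][d][r][e][d]).
lazy = re.compile(r"(one|hundred).*?(one|hundred)")
greedy = re.compile(r"(one|hundred).*(one|hundred)")

# .*? stops at the first later number word; .* runs on to the last one.
lazy_match = lazy.search(sentence).group(0)
greedy_match = greedy.search(sentence).group(0)

print(lazy_match)    # one hundred
print(greedy_match)  # one hundred one plus one hundred one
```

In both cases the `.` is free to consume non-number words such as "plus", so neither variant can isolate "one hundred one" on its own.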

Can someone guide me as to how I should proceed here?

Biffen
Amit Sonik
  • Please see [this thread](https://stackoverflow.com/questions/51422401/numeric-words-in-a-string-into-numbers/51424104#51424104), it might help. – Wiktor Stribiżew Feb 14 '19 at 10:44
  • What’s up with putting each character in its own class?! – Biffen Feb 14 '19 at 10:53
  • `a regex which only matches the figures` => don't use `.`; neither `.*` nor `.*?` are appropriate in your case. Have a group that matches "number tokens" such as your first capturing group (but pleaaase write it `(one|hundred)`, those useless character classes are aggravating) and repeat that with whitespace separators, e.g. `(one|hundred)(\s+(one|hundred))*` – Aaron Feb 14 '19 at 10:58
  • Thanks for the help everyone – Amit Sonik Feb 14 '19 at 11:56
  • So what works for you? – Wiktor Stribiżew Feb 14 '19 at 12:02
  • I have used a combination of the info I got from the thread and the regex provided by @Aaron. Now I am able to match any number at any location in a sentence. – Amit Sonik Feb 14 '19 at 12:07
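
The pattern suggested in Aaron's comment, `(one|hundred)(\s+(one|hundred))*`, can be tried out as below. This is a minimal sketch, not the asker's final solution (which is not shown in the thread). It assumes Python's `re` module; the `NUMBER_WORDS` list and the `find_figures` helper are hypothetical names used for illustration, and the `\b` word boundaries and non-capturing groups are additions to the commented pattern.

```python
import re

# Hypothetical, deliberately small vocabulary; extend it with more number
# words ("two", "thousand", "and", ...) as needed.
NUMBER_WORDS = ["one", "hundred"]

token = "(?:" + "|".join(NUMBER_WORDS) + ")"

# One number word, then any further number words separated by whitespace.
# The \b anchors (an assumption, not part of the original comment) stop
# "one" from matching inside a longer word such as "phone".
figure = re.compile(r"\b" + token + r"(?:\s+" + token + r")*\b")

def find_figures(sentence):
    """Return every run of consecutive number words in the sentence."""
    return [m.group(0) for m in figure.finditer(sentence)]

print(find_figures("one hundred one plus one hundred one"))
# ['one hundred one', 'one hundred one']
print(find_figures("one hundred minus one"))
# ['one hundred', 'one']
print(find_figures("hundred multiplied by one"))
# ['hundred', 'one']
```

Because the pattern never uses `.`, a match can only grow across whitespace into another number word, so operators such as "plus" or "minus" always end the current figure.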

0 Answers