I am working on a regex problem where I need to identify the words in natural language sentences that are actually figures (numbers written out in words). Example sentences:
- one hundred one plus one hundred one
- one hundred minus one
- hundred multiplied by one
I am trying to work out a regex that matches only the figures (such as "hundred", "one thousand and eleven" and so on) in the above statements, but I have not been able to. Here is what I have worked out so far.
If I use something like:
([o][n][e]|[h][u][n][d][r][e][d]).*?([o][n][e]|[h][u][n][d][r][e][d])
then it matches only "one hundred" in "one hundred one", and if I use:
([o][n][e]|[h][u][n][d][r][e][d]).*([o][n][e]|[h][u][n][d][r][e][d])
then it matches the whole of "one hundred one plus one hundred one".
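To make the two results easy to reproduce, here is a minimal sketch in Python. Python's `re` module is an assumption on my part, since the regex flavor is not fixed above; the variable names are only for illustration.

```python
import re

sentence = "one hundred one plus one hundred one"

# The two attempts, copied as written. A class like [o] holds a single
# letter, so [o][n][e] behaves the same as the literal "one".
lazy = re.compile(r"([o][n][e]|[h][u][n][d][r][e][d]).*?([o][n][e]|[h][u][n][d][r][e][d])")
greedy = re.compile(r"([o][n][e]|[h][u][n][d][r][e][d]).*([o][n][e]|[h][u][n][d][r][e][d])")

# .*? stops at the first number word it can reach;
# .* runs to the end of the line and backtracks to the last one.
lazy_first = lazy.search(sentence).group(0)
greedy_first = greedy.search(sentence).group(0)

print(lazy_first)    # one hundred
print(greedy_first)  # one hundred one plus one hundred one
```

In both cases the `.` between the two groups accepts any character, so nothing stops the match from swallowing "plus" or cutting off after two words.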
Can someone guide me as to how I should proceed here?
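One possible direction, sketched in Python (again an assumption about the regex flavor): instead of `.*` between two number words, match one number word followed by any further number words that are separated only by whitespace, optionally with "and" in between. The word list `NUMBER_WORDS` and the names `FIGURE` and `find_figures` are hypothetical, and the vocabulary is deliberately tiny; it would need extending to cover real input.

```python
import re

# Hypothetical vocabulary: extend with whatever number words are needed.
NUMBER_WORDS = ["one", "two", "three", "ten", "eleven", "twenty",
                "hundred", "thousand", "million"]

# Non-capturing alternation of all number words.
WORD = r"(?:%s)" % "|".join(NUMBER_WORDS)

# One number word, then zero or more further number words, each preceded
# by whitespace and an optional "and" (as in "one thousand and eleven").
# \b keeps the pattern from matching inside words such as "someone".
FIGURE = re.compile(r"\b%s(?:\s+(?:and\s+)?%s)*\b" % (WORD, WORD))

def find_figures(sentence):
    """Return every run of consecutive number words in the sentence."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return FIGURE.findall(sentence)

print(find_figures("one hundred one plus one hundred one"))
print(find_figures("one hundred minus one"))
print(find_figures("hundred multiplied by one"))
```

Because only whitespace (and "and") may sit between number words, a word such as "plus" or "minus" ends the run, so each figure comes back as its own match. This sketch is case-sensitive; passing `re.IGNORECASE` to `re.compile` would relax that.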