
I have a problem with boost::regex_match. I am working with BOOST_REGEX_MATCH_EXTRA turned on.


What I have:

(this is a simple example of my problem, not a real task)

string input1= "3 4 5";
string input2= "3 4 7";

What I want to get:

list output1= [3 4 5];
list output2= []; //not matched

regex:

(this is working ok)

((?<group>[0-6])[ ]?)*

output1: what["group"]=5 and what["group"].captures()= [3, 4, 5]

output2: not matched

The problem is:

I need to collect data from more than one part of the regex into one group.

I tried:

((?<group>[0-6])[ ])*(?<group>[0-6])

output1: what["group"]=4 and what["group"].captures()=[3, 4]

output2: not matched

OK, I understand. It doesn't see the second declaration of the group.

I tried:

((?<group>[0-6])[ ])*(?&group)

output1: what["group"]=4 and what["group"].captures()= [3, 4, 4]

output2: not matched

  • But what is this? Where does the second 4 come from? The "group" pattern is evidently checked, because the first input matches and the second doesn't. But it duplicates the last captured value instead of saving the new one. Why? Maybe I forgot to turn on some flag?
  • And is there another way to collect data from different parts of a regex into one group?

I have more than one group, so token_iterator can't help me.

Also, the expression has to be configurable in a config file, so static Xpressive can't be used.

Darida
    No, you cannot stuff two different parts of a target string into one capture group--at least not without also capturing everything between. – Kenneth K. Apr 08 '13 at 23:47

1 Answer


This is how I interpret your problem:

String: Total price: $1,234

and you want to capture the cost as 1234 (without the comma)

This isn't possible with regex alone, since there is no way to capture a group and exclude parts in the middle. That said, you can use two match groups and a lookahead, and then stitch the groups together in code. Using the above example, if you don't know whether there will be a comma (e.g. the price ranges from 1 to 5000), you can do something like:

Total price: \$(?P<price>\d{1,3})(?:(?=\,),(?P<price2>\d{3})|)

This matches 1-3 digits, then looks ahead for a comma; if one exists, it matches the second chunk into a differently named group.

Here's a really nice resource for regex testing: www.regex101.com

eric
  • Of course, I can do this with additional code. Moreover, with additional code I can do this without regex at all. But I want to keep the regex in a config file, and it is not easy to write additional code that will work with ANY UNKNOWN regex. I don't know how many groups (price, price2, price3 ...) there will be, or in which order. – Darida Dec 18 '13 at 09:54
  • @Darida I'm working on a similar case now. My current solution is to keep in the configuration file not only the regex pattern but also a "value description". Basically it is just a sequence of capturing group names and arbitrary strings. The code matches the string against the pattern and, if successful, builds the actual value from that "value description" by concatenating the mentioned capturing groups and whatever else is specified. This allows reasonable flexibility in the configuration file while keeping the code behind it fixed. – Adam Badura Sep 08 '16 at 07:50