I'm parsing chat output to get the user name. This is what the output may look like:

<Darker> MC_BOT sethome
(From Darker): MC_BOT exit

I need to match username and the command contents in the string. Taking these data from both strings is easy:

^(<([A-Za-z0-9_]+)>|\\(From ([A-Za-z0-9_]+)\\):) MC_BOT ([a-z]+)( [a-zA-Z0-9 ]+)?$
  |<Darker>         |(From Darker):                     |sethome

The problem is that for the <Darker> form the username ends up in field 2, but for the (From Darker): form it ends up in field 3.

<Darker> MC_BOT command parameters
   1: <Darker>
   2: Darker  - field 2!
   3: 
   4: command
   5:  parameters


(From Darker): MC_BOT command parameters
   1: (From Darker):
   2: 
   3: Darker  - field 3!
   4: command
   5:  parameters

So how should I write this regexp so that the username always lands in the same field? Also, can I make the regexp skip capturing the outer (...|...) group? I only need the username itself, not <username> or (From username):.

Marc
Tomáš Zato
  • Perl/PCRE has branch reset `(?|...)` that can be used for this, don't know about boost tho. – Qtax Mar 21 '13 at 23:26
  • 1
    For the last part of your question (ignoring the outermost group) you may want to use a non-capturing group `(?:` if boost supports it. Unfortunately for the core of your question (group 2/3 depending on the input string) I don't see any obvious solution. – syam Mar 21 '13 at 23:26
  • @syam Thank you, non-capturing option has worked and it helped me. – Tomáš Zato Mar 21 '13 at 23:30

1 Answer

Boost appears to support branch reset. So you could use something like:

^(?|<([A-Za-z0-9_]+)>|\(From ([A-Za-z0-9_]+)\):) MC_BOT ([a-z]+)( [a-zA-Z0-9 ]+)?$
 ^   ^                       ^                          ^       ^
 |    \ group 1               \ also group 1             \ g.2   \ group 3
 |
 \ branch reset
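
For comparison, where branch reset is not available (the ECMAScript grammar used by `std::regex` does not support `(?|...)`), one possible fallback is to make the outer group non-capturing and take whichever of the two username groups took part in the match. This is only a minimal sketch, assuming C++11 `std::regex` rather than Boost; `ChatCommand` and `parse_chat_line` are hypothetical names, and the parameter group is changed slightly so that it no longer captures the leading space:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type; the names are illustrative, not from the question.
struct ChatCommand {
    std::string user;
    std::string command;
    std::string params;  // empty when no parameters were given
};

// Parses one chat line into `out`; returns false if the line does not match.
// Without branch reset the username is captured by group 1 for the
// "<name>" form and by group 2 for the "(From name):" form.
bool parse_chat_line(const std::string& line, ChatCommand& out) {
    static const std::regex pattern(
        R"(^(?:<([A-Za-z0-9_]+)>|\(From ([A-Za-z0-9_]+)\):) MC_BOT ([a-z]+)(?: ([a-zA-Z0-9 ]+))?$)");

    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;
    }
    // Only one of the two alternatives can match, so pick the group that did.
    out.user = m[1].matched ? m[1].str() : m[2].str();
    out.command = m[3].str();
    out.params = m[4].str();  // str() of an unmatched group is empty
    return true;
}
```

Called on `<Darker> MC_BOT sethome` and on `(From Darker): MC_BOT exit`, this sketch reports the user as `Darker` in both cases, which hides the group 1/group 2 difference from the caller.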
Qtax
  • And I suddenly understood why the average man thinks programmers are aliens... (+1, nice solution) –  Mar 21 '13 at 23:31
  • 2
    +1 too, but @H2CO3 it's not *just* about programmers here. Remember the saying: *Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have **two** problems.* ;) (even though I'm a big regex fan myself) – syam Mar 21 '13 at 23:39
  • @syam Exactly, just what you say. –  Mar 21 '13 at 23:42
  • *"Now they have two problems"* - Are you implying that there was another solution for my problem? – Tomáš Zato Mar 21 '13 at 23:50
  • @TomášZato this is just a tongue-in-cheek saying that means that regexes, due to their "cryptic" syntax, often create as many problems as they are supposed to solve. It is not meant to be taken seriously; as I said, I'm a big regex fan too. ;) – syam Mar 21 '13 at 23:58