I need to extract some data from a string with a simple syntax. The syntax is this:

_IMPORT:[any text] - [HEX number] #[decimal number]

So I wrote the regex you can see in the code below:

 //SYNTAX:  _IMPORT:%1 - %2 #%3
 static const QRegExp matchImportLink("^_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+)$");
 QRegExp importLink(matchImportLink);
 QString qtWtf(importLink.pattern());
 const int index = importLink.indexIn(mappingName);

 qDebug()<< "Input string: "<<mappingName;
 qDebug()<< "Regular expression:"<<qtWtf;
 qDebug()<< "Result: "<< index;

For some reason this does not work. I get this output:

Input string:  "_IMPORT:ddd - 92806f0f96a6dea91c37244128f7d00f #0"
Regular expression: "^_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+)$"
Result:  -1

I even tried removing the anchors ^ and $, but that didn't help, and it is undesirable anyway. The annoying thing is that this regexp works perfectly if I paste the printed pattern into regex101.com, as you can see here: https://regex101.com/r/oT6cY3/1

Can anyone explain what is wrong here? Did I stumble upon a Qt bug? I use Qt 5.6. Is there a workaround for this?

Tomáš Zato
  • Not experienced in regexp, but the round brackets (after `_IMPORT:` and all the others) look strange to me. I would expect the regexp to match a literal `(` character, which is not in the input string. But if they have a special meaning in regexp, just forget my comment. – Bernhard Heinrich Aug 15 '16 at 13:05
  • @BernhardHeinrich They use (quoting the docs) "*A rich Perl-like pattern matching syntax*" which means capture groups exist and I have used them in past without a problem. – Tomáš Zato Aug 15 '16 at 13:06
  • I see that changing `(.*?)` to `(.*)` helps, but not sure why. Changing regexp engine doesn't help either... – mike.dld Aug 15 '16 at 13:15
  • @mike.dld Seems like they have it the wrong way around, because `(.*?)` is non-greedy precisely to avoid swallowing the ` - [hex]` part. – Tomáš Zato Aug 15 '16 at 13:21
  • Just use QRegularExpression already! :) QRegExp supports a very limited pattern syntax (in particular: it does not support non-greedy quantifiers). QRegularExpression supports PCREs instead. – peppe Aug 15 '16 at 16:01
  • @peppe thanks for mentioning that. I will look into it. I think they should mention this in `QRegExp` docs... – Tomáš Zato Aug 16 '16 at 08:27
  • https://doc.qt.io/qt-5/qregexp.html "Note: In Qt 5, the new QRegularExpression class provides a Perl compatible implementation of regular expressions and is recommended in place of QRegExp." – peppe Aug 16 '16 at 09:22
  • @peppe The problem is 4.8 QRegExp doc version is top on google. I think there should be some link to latest version for all docs... – Tomáš Zato Aug 16 '16 at 10:38
  • Yes, that's definitely bad. https://bugreports.qt.io/browse/QTWEBSITE-721 – peppe Aug 16 '16 at 10:57

1 Answer

It seems that QRegExp does not recognize the quantifier *? as valid. Check your pattern with QRegExp::isValid(). In my case it returned false for exactly this reason, and the documentation says that an invalid pattern never matches.

So the first thing I tried was dropping the ?, which matches your sample string and fills all three capture groups. Here is my code.

QString str("_IMPORT:ddd - 92806f0f96a6dea91c37244128f7d00f #0");
QRegExp exp("^_IMPORT:(.*) - ([A-Fa-f0-9]+) #([0-9]+)$");

qDebug() << "pattern:" << exp.pattern();
qDebug() << "valid:" << exp.isValid();
int pos = 0;
while ((pos = exp.indexIn(str, pos)) != -1) {
    for (int i = 1; i <= exp.captureCount(); ++i)
        qDebug() << "pos:" << pos << "len:" << exp.matchedLength() << "val:" << exp.cap(i);
    pos += exp.matchedLength();
}

And here is the resulting output.

pattern: "^_IMPORT:(.*) - ([A-Fa-f0-9]+) #([0-9]+)$"
valid: true
pos: 0 len: 49 val: "ddd"
pos: 0 len: 49 val: "92806f0f96a6dea91c37244128f7d00f"
pos: 0 len: 49 val: "0"

Tested using Qt 5.6.1.

Also note that you can switch between greedy and minimal (non-greedy) matching using QRegExp::setMinimal(bool).

maxik