I'm trying to use a QRegularExpression to get all attributes of an XML tag in separate capture groups. My regex matches the tags and captures the attribute value, but because the capture group sits under a quantifier, I only get the last one.

I use this regex:

<[a-z]+(?: [a-z]+=("[^"]*"))*>

And I would like to get "a" and "b" from this text:

<p a="a" b="b">

Here is the code:

const QString text { "<p a=\"a\" b=\"b\">" };
const QRegularExpression pattern { "<[a-z]+(?: [a-z]+=(\"[^\"]*\"))*>" };

QRegularExpressionMatchIterator it = pattern.globalMatch(text);
while (it.hasNext())
{
    const QRegularExpressionMatch match = it.next();

    qDebug() << "Match with" << match.lastCapturedIndex() + 1 << "captured groups";
    for (int i { 0 }; i <= match.lastCapturedIndex(); ++i)
        qDebug() << match.captured(i);
}

And the output:

Match with 2 captured groups
"<p a=\"a\" b=\"b\">"
"\"b\""

Is it possible to get multiple capture groups with the quantifier *, or do I have to iterate using QRegularExpressionMatchIterator with a specific regex on the string literals?

Emma
Maluna34
  • Why regex? See [a non-regex approach here](https://www.qtcentre.org/threads/31522-Qt-way-of-getting-parsing-style-attributes-from-html-tags-local-file-batch). An [SO thread](https://stackoverflow.com/questions/32082181/how-to-get-xml-attributes-with-this-syntax-using-qt-dom) here. A more generic: [How to parse an HTML file with QT?](https://stackoverflow.com/questions/49223317/how-to-parse-an-html-file-with-qt) – Wiktor Stribiżew May 08 '19 at 11:45
  • It's because I use this in a QSyntaxHighlighter ^^ – Maluna34 May 08 '19 at 12:12

1 Answer

This expression might help you capture those attributes. It is not anchored on the left or right:

([A-Za-z]+)(=\x22)([A-Za-z]+)(\x22)
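
Since the question is about C++, here is a minimal sketch of how such an expression could be applied once per attribute. It uses std::regex from the standard library as a stand-in so that it compiles without Qt; with Qt, the loop would presumably go through QRegularExpression::globalMatch instead. The helper name collectAttributes is hypothetical, and the quote is written literally rather than as \x22.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every name="value" pair found in `text`.
// std::regex is used here as a stand-in for QRegularExpression.
std::vector<std::pair<std::string, std::string>> collectAttributes(const std::string& text)
{
    // Four groups: 1 = attribute name, 2 = the =" separator,
    // 3 = attribute value, 4 = the closing quote.
    static const std::regex attribute(R"re(([A-Za-z]+)(=")([A-Za-z]+)("))re");

    std::vector<std::pair<std::string, std::string>> result;
    const std::sregex_iterator end;
    // Each iteration is one separate match, i.e. one attribute.
    for (std::sregex_iterator it(text.begin(), text.end(), attribute); it != end; ++it)
        result.emplace_back((*it)[1].str(), (*it)[3].str());
    return result;
}
```

Each attribute arrives as its own match, so groups 1 and 3 always refer to the attribute currently being visited. Note that the value group only accepts letters; a value such as "42" or an empty value would need a wider class like [^"]*.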

Graph

This graph shows how the expression works; you can visualize other expressions in this link, if you wish:

[Figure: graph visualizing the expression]


If you want to add boundaries to it, which you probably do, you could extend it to something like:

(?:^<p )?([A-Za-z]+)(=\x22)([A-Za-z]+)(\x22)

Test for RegEx

const regex = /(?:^<p )?([A-Za-z]+)(=\x22)([A-Za-z]+)(\x22)/gm;
const str = `<p attributeA="foo" attributeB="bar" attributeC="baz" attributeD="qux"></p>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
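
For the QSyntaxHighlighter use case, a two-step variant might fit better: match the whole tag first, then iterate over the string literals inside that match. A capture group under a quantifier keeps only its last iteration, so a single match cannot expose every attribute in its own group. The sketch below again uses std::regex as a stand-in for QRegularExpression so that it compiles without Qt; the names kTag, kLiteral, lastCapturedValue and attributeValues are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// The tag regex from the question and a second regex for the quoted values.
const std::regex kTag(R"re(<[a-z]+(?: [a-z]+=("[^"]*"))*>)re");
const std::regex kLiteral(R"re("[^"]*")re");

// Group 1 of the tag regex after a match: only the last repetition survives.
std::string lastCapturedValue(const std::string& text)
{
    std::smatch m;
    return std::regex_search(text, m, kTag) ? m[1].str() : std::string();
}

// Two-step approach: find each tag, then scan inside it for every literal.
std::vector<std::string> attributeValues(const std::string& text)
{
    std::vector<std::string> values;
    const std::sregex_iterator end;
    for (std::sregex_iterator t(text.begin(), text.end(), kTag); t != end; ++t) {
        const auto& whole = (*t)[0];  // the full tag, e.g. <p a="a" b="b">
        for (std::sregex_iterator v(whole.first, whole.second, kLiteral); v != end; ++v)
            values.push_back(v->str());
    }
    return values;
}
```

Restricting the inner scan to the range of the outer match keeps quoted text outside of tags from being picked up. In a highlighter, the position of each inner match would give the range to format.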
Emma
  • Thanks for your help. But here I still have to loop using the regex engine (QRegularExpressionMatchIterator), and we can't get all the attributes in separate capture groups of a single match. Is that correct? – Maluna34 May 08 '19 at 14:34