I'm using QRegularExpression in Qt 5.10.1 to extract sections of text from files that are delimited by a header and a footer. For example, consider the following text:
...
begin
some text
some more text
...
end
...
begin
etc.
I would then use the following regex to capture a section of text:
^begin\n([\s\S]+?)^end
Nothing out of the ordinary here. The problem is that if the section of text is very large (over 100k lines), the regex stops producing a match. I tried the same search in a text editor (TextPad) and it works fine, so I suspect this is due to some sort of MAX_SIZE constant in QRegularExpression or, more likely, in the PCRE2 library it uses. But I have no idea where to look, or whether this is something I can tweak. Or should this be considered a bug?
Below is some code that can be used to demonstrate my issue. For me it bombs out at 100,000 lines (10,000,000 bytes).
// Each repetition of s adds exactly 100 bytes to the test input.
QString s = "This line of text is exactly one hundred bytes long because it's a nice round number for this test.\n";
QRegularExpression re = QRegularExpression(R"(^begin\n([\s\S]+?)^end)", QRegularExpression::MultilineOption);
qDebug() << "start check:";
for (int i = 10000; i < 200000; i += 1000) {
    QString test = "begin\n" + s.repeated(i) + "end\n";
    QRegularExpressionMatch match = re.match(test);
    if (!match.hasMatch()) {
        // The lazy pattern stopped matching; see whether the greedy variant still works.
        qDebug() << "lazy match failed - trying greedy match";
        re.setPattern(R"(^begin\n([\s\S]+)^end)");
        QRegularExpressionMatch greedyMatch = re.match(test);
        qDebug() << greedyMatch.hasMatch();
        break;
    }
    qDebug() << i; // number of lines that still matched
}
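For comparison, here is a minimal sketch of the extraction I am after, written as a plain line scan in standard C++17 so that no regex engine is involved. The helper name extractSections is hypothetical (it is not part of Qt), and it only roughly mirrors the pattern above: a section opens on a line that is exactly "begin" and closes at the first later line that starts with "end", with a non-empty body in between.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of Qt: returns the text between each line
// that is exactly "begin" and the first later line that starts with "end".
// This is an assumption-laden approximation of ^begin\n([\s\S]+?)^end.
std::vector<std::string> extractSections(std::string_view text)
{
    std::vector<std::string> sections;
    bool inSection = false;
    std::size_t bodyStart = 0; // offset of the first byte after the "begin" line
    std::size_t pos = 0;       // offset of the start of the current line

    while (pos < text.size()) {
        std::size_t eol = text.find('\n', pos);
        const bool hasNewline = (eol != std::string_view::npos);
        if (!hasNewline)
            eol = text.size();
        const std::string_view line = text.substr(pos, eol - pos);
        const std::size_t next = hasNewline ? eol + 1 : eol;

        if (!inSection) {
            // "^begin\n" requires a newline directly after "begin".
            if (line == "begin" && hasNewline) {
                inSection = true;
                bodyStart = next;
            }
        } else if (line.substr(0, 3) == "end" && pos > bodyStart) {
            // "^end" only needs the line to start with "end"; the
            // pos > bodyStart check mirrors "+" (the body must be non-empty).
            sections.emplace_back(text.substr(bodyStart, pos - bodyStart));
            inSection = false;
        }
        pos = next;
    }
    return sections;
}
```

With a QString input, this could be fed from test.toStdString(). A scan like this does a single pass over the input with no backtracking, so it should not depend on any regex engine limit, but it is not a drop-in replacement for the regex and I would still like to understand why the regex itself fails.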