
I have to extract single-line comments from a qmake project file. The rules are simple: a comment begins with the # symbol and ends with a line break (\n). So I read some documentation about QRegExp and wrote this code to print all the comments in a qmake file:

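#include <QRegExp>
#include <QDebug>

QRegExp comment_expr ("#(.*)\n$");
comment_expr.setMinimal (true);
int comment_index = 0;
// _project_contents is the QString holding the whole .pro file
while ((comment_index = comment_expr.indexIn (_project_contents, comment_index)) != -1)
{
    QString comment_text = comment_expr.cap (0);
    qDebug() << "Comment:" << comment_text;
    // Continue after the current match; otherwise the loop never ends
    comment_index += comment_expr.matchedLength ();
}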

But it doesn't work correctly: the entire contents of the project file get printed. Where is my mistake? As I understand the docs, this should work, but it doesn't.

P.S. I'm new to regexes, so please go easy on me :)

eraxillan

1 Answer


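The problem is that in QRegExp, . "matches any character (including newline)", and $ matches only at the end of the whole string. So even with setMinimal(true), "\n$" can only match the final newline, and the match stretches from the first # to the end of the file.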

You could try matching "anything but a newline" with [^\n] instead, and changing $ to (\n|$) (a newline or the end of the string):

"#[^\n]*(\n|$)"

But this matches a # anywhere, not only at the start of a line, so let's try this:

"(^|\n)#[^\n]*(\n|$)"

^ matches the start of the string, so (^|\n) (the start of the string or a newline) matches just before the start of a line.

Can you see the problem? What if there are comments on two consecutive lines? Only the first one will match: matching it consumes the newline, and since the next search starts where the previous match ended, there is neither a newline nor the start of the string in front of the second comment.

A workaround is to use a look-ahead:

"(^|\n)#[^\n]*(?=\n|$)"

This checks for the newline at the end without including it in the match, so the match ends just before the newline and the next match can use it.

Can the # be preceded by whitespace? If so, allow zero or more whitespace characters (\s*). In a C++ string literal the backslash has to be doubled:

"(^|\n)\\s*#[^\n]*(?=\n|$)"
Bernhard Barker
  • Thanks! The last version of your regex is exactly what I need. By the way, can you suggest some good examples or books about `QRegExp`? I need to understand it in depth because I use it heavily. – eraxillan Jul 04 '13 at 12:43
  • I don't have any must-have recommendations. Regexes follow a largely standard syntax (with some minor variations between flavors), so any tutorial should be fine. [This (specific to Qt)](http://harmattan-dev.nokia.com/docs/library/html/qt4/qregexp.html#details) has an introduction and recommends "Mastering Regular Expressions (Third Edition)" by Jeffrey E. F. Friedl. [Here's also a website I sometimes link to](http://www.regular-expressions.info/). Or you can just check out a few of [these](https://www.google.com/search?q=introduction+to+regular+expressions). – Bernhard Barker Jul 04 '13 at 13:08
  • Thanks again, I'll look at these materials. P.S. I hope I'll never be forced to parse HTML with regexes :) – eraxillan Jul 04 '13 at 13:29
  • If you ever find yourself needing / wanting to parse HTML with regex, you should read [this](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html) first. – Bernhard Barker Jul 04 '13 at 13:32
  • I understand that it's an ugly approach. But for, say, replacing the contents of `` with `bla-bla`, a regex works fine. – eraxillan Jul 04 '13 at 15:56
  • @Dukeling: +1 for 'Mastering Regular Expressions' – Zaiborg Jul 08 '13 at 13:13