I am using Qt. I have a text string in which I am looking for the function call xyz.set_name(). I want to capture the last occurrence of this call, but skip it if the line that contains it starts with a #. So far my regex matches the function call, but I don't know how to exclude the lines that start with #, how to capture only the last occurrence, or why all the matches end up in one capture group.

[().\w\d]+.set_name\(\)\s*

This is what I want it to do

abc.set_name() // match
# abc.set_name() // don't match
xyz.set_name() // match and capture this one

Update for more clarification:

My text reads like this when printed out with qDebug:

Hello\nx=y*2\nabc.set_name()   \n#xyz.set_name()

It is one long string, with \n as the newline.

Update: here is a longer test string. I have tried all the suggested regexes on it, but none of them worked. I don't know what is missing. https://regex101.com/r/vXpXIA/1

Update 2: Scratch my first update. The \n is a qDebug() thing; it doesn't need to be considered when using the regex.

reddy
  • Your example shows the first line being matched but not captured and the last line being both matched and captured. I don't understand what you mean. – Cary Swoveland Aug 08 '20 at 03:41
  • Use `(?s).*\n(?!\h*#)\h*([\w().]+\.set_name\(\))`, see [demo](https://regex101.com/r/Z9zviz/2/) – Wiktor Stribiżew Aug 08 '20 at 08:45
  • Further to my first comment, you can only match a substring. You cannot match two substrings separated by a substring that isn't matched. In your three-line example you could match all three lines (the entire string) and capture the last line or match just the last line, in which case there would be no point in capturing it as well. – Cary Swoveland Aug 09 '20 at 15:05

3 Answers

1

If you merely want to match the last line that matches the pattern

^[a-z]+\.set_name\(\)

you can use the regular expression

(?smi)^[a-z]+\.set_name\(\)(?!.*^[a-z]+\.set_name\(\))

For simplicity I've used the character class [a-z]. That can be changed to suit requirements. In the question it is [().\w\d], which can be simplified to [().\w].

Note that since the substring of interest is being matched there is no point to capturing it as well. The fact that one of the lines prior to the last one begins with '#' is not relevant. All that matters is whether the lines match a specified pattern.

Start your engine!

The PCRE regex engine performs the following operations.

(?smi)                  : set single-line, multi-line and case-insensitive
                          modes
^                       : match the beginning of a line
[a-z]+\.set_name\(\)    : match 1+ chars in the char class, followed
                          by '.set_name()'
(?!                     : begin negative lookahead
.*^[a-z]+\.set_name\(\) : match 0+ chars (including newlines), the
                          beginning of a line, 1+ letters, '.set_name()'
)                       : end negative lookahead

Recall that single-line mode causes . to match newlines and multi-line mode causes ^ and $ to match the beginning and ends of lines (rather than the beginning and end of the string).

Cary Swoveland
  • Hi, my text is one long string containing the whole text, with `\n` as the newline. Your regex couldn't find any match. – reddy Aug 08 '20 at 04:36
  • TIL Negative lookahead. Nice! – Roy2511 Aug 08 '20 at 05:15
  • reddy, the text I used at the link in my answer was `abc.set_name()\n# abc.set_name()\nxyz.set_name()\n`, and as you see, `xyz.set_name()` was matched by the regex. I don't understand why you don't get the same result if you are using the same test string. – Cary Swoveland Aug 08 '20 at 07:44
  • @CarySwoveland Hi I have updated the first post with a longer string for testing – reddy Aug 09 '20 at 07:14
  • The problem stems from the way test cases are entered at regex101.com. To enter the string `"Hello\nWorld"` you need to put `Hello` on one line and `World` on the next line. There will then be an unseen newline after `Hello`. regex101 will read that as `"Hello\nWorld"`. When you enter `Hello\nWorld` on one line regex101 reads `\n` as two characters, not as an escaped `"n"`. [ref](https://regex101.com/r/1R6ugG/2/) – Cary Swoveland Aug 09 '20 at 14:52
  • I got it to work, thanks. I also got confused by qDebug(), which outputs the string with `\n`, so I thought I had to take that into account. `(?smi)^[().\w]+\.set_name\(\)(?!.*^[().\w]+\.set_name\(\))`. I'll update the first post to not confuse future people. – reddy Aug 09 '20 at 19:18
  • It would be better to not confuse both existing and future people. :-) – Cary Swoveland Aug 09 '20 at 21:19
0

You need the regex lookahead operators (if your regex engine supports them). This will work.

(?(?=^[^#])(^\s*[a-zA-Z]+\.set_name\(\))|z^)

Explanation:

  • (?(?=patt)then|else) - Regex if-else construct, if regex matches given pattern patt, then is matched, otherwise else is matched

  • patt = ^[^#] -- at the start of the line, no #

  • then part - if patt is true -- ^\s*[a-zA-Z]+\.set_name\(\) matches any amount of whitespace followed by <something>.set_name(), where something is a variable name.

  • else part -- If patt is false -- match z^ which is z coming before start of line, which isn't possible.


Edit: just realised you can have digits in variable names (but it cannot start with one). In that case, improved regex (not tested)

(?(?=^[^#])(^\s*[a-zA-Z]+[a-zA-Z\d]*\.set_name\(\))|z^)

Edit: Since you also have newline characters in your string, it doesn't match the problem description in your question. Nevertheless, simple enough to deal with by just tokenising the string.

Just split up the strings based on new line.

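#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::istringstream isr("I am John\n today is  \n#abc.set_name()\n");
    std::string tok;
    std::vector<std::string> vs;
    // Split the text into lines.
    while (std::getline(isr, tok))
    {
        vs.push_back(tok);
    }

    // Optional whitespace, then <identifier>.set_name(). A line that
    // starts with '#' cannot match because '#' is not in the pattern.
    const std::regex call(R"(^\s*[a-zA-Z][a-zA-Z\d]*\.set_name\(\))");

    // Walk the lines backwards so the first hit is the last occurrence.
    for (auto r_it = vs.rbegin(); r_it != vs.rend(); ++r_it)
    {
        if (std::regex_search(*r_it, call))
        {
            std::cout << *r_it << std::endl;
            break;
        }
    }
    // With the sample text above nothing is printed: its only call is commented out.
}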


Roy2511
0

You may use

(?s).*\n(?!\h*#)\h*([\w().]+\.set_name\(\))

See the regex demo, your match is in Group 1. Details:

  • (?s) - DOTALL mode on, . now matches any chars
  • .* - any zero or more chars as many as possible
  • \n(?!\h*#) - a newline that is not immediately followed with 0 or more horizontal whitespaces and then a # char
  • \h* - 0+ horizontal whitespaces
  • ([\w().]+\.set_name\(\)) - Capturing group 1:
    • [\w().]+ - 1 or more word chars, ), ( or .
    • \.set_name\(\) - a .set_name() string.
Wiktor Stribiżew