I have the string 'CCCC' and I want to find every occurrence of 'CCC' in it, including overlapping ones.

My code:

...
std::string input_seq = "CCCC";
std::regex re("CCC");
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
    std::smatch match = *next;
    std::cout << match.str() << "\t" << "\t" << match.position() << "\t" << "\n";
    next++;
}
...

However, this only returns

CCC 0 

and skips the overlapping match CCC 1, which I also need.

I read about non-greedy '?' matching, but I could not make it work.

Patryk
Gábor Erdős
  • 3,599
  • 4
  • 24
  • 56

1 Answer

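Put your pattern inside a capturing group and wrap that group in a positive lookahead: (?=(CCC)). A lookahead checks for a match without consuming any characters, so the iterator can find 'CCC' again starting at the next position, which is what allows overlapping matches.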

To make it work on Mac, too (Apple clang otherwise runs into an infinite loop on the resulting empty matches), make sure the regex consumes a single character at each match by placing a . after the lookahead. Use [\s\S] instead of . if line break characters must also be matched.

Then amend the code to read the first capturing group with match.str(1):

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string input_seq = "CCCC";
    // The lookahead captures "CCC" in group 1 without consuming it;
    // the trailing . consumes one character so that no match is empty.
    std::regex re("(?=(CCC))."); // <-- PATTERN MODIFICATION
    std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
    std::sregex_iterator end;
    while (next != end) {
        std::smatch match = *next;
        // Group 1 holds the captured text; position() is where the match starts.
        std::cout << match.str(1) << "\t" << "\t" << match.position() << "\t" << "\n"; // <-- SEE HERE
        ++next;
    }
    return 0;
}

See the C++ demo

Output:

CCC     0   
CCC     1   
Wiktor Stribiżew
  • Thanks, it solved it. I'll mark this as solved as soon as I can. – Gábor Erdős Dec 12 '16 at 11:14
  • This results in an infinite loop on Apple clang. – Richard Hodges Dec 12 '16 at 11:29
  • @RichardHodges: It must be related to [this](http://stackoverflow.com/questions/33795759/c-mac-os-x-regex-causes-infinite-loop-with-regex-replace/33799633#33799633): the Mac implementation does not handle empty matches efficiently. A `.` added after the lookahead might solve the problem: [`std::regex re("(?=(CCC)).");`](https://ideone.com/pEziQp). If line break characters must be matched, the `.` should be replaced with `[\s\S]`. – Wiktor Stribiżew Dec 12 '16 at 11:35
  • confirming - this worked on the mac: `"(?=(CCC))."` You may want to edit the answer. – Richard Hodges Dec 12 '16 at 11:37