
I want to create a scraper for learning purposes. I am trying to get every possible number that is exactly 10 digits long from a file.

#include <fstream>
#include <iostream>
#include <regex>

int main()
{
    std::string subject("098765432123 1234567890");
    try {
        std::regex re("[0-9]{10}");
        std::sregex_iterator next(subject.begin(), subject.end(), re);
        std::sregex_iterator end;
        while (next != end) {
            std::smatch match = *next;
            std::cout << match.str() << "\n";
            next++;
        }
    } catch (std::regex_error& e) {
        // Syntax error in the regular expression
    }
}

My output is:

0987654321
1234567890

But with the string "098765432123 1234567890" I want to obtain every 10-digit number, including overlapping ones:

0987654321
9876543212
8765432123
1234567890

I don't know if the problem comes from my regex or from next++.

Thanks for your advice.

Benjamin Sx
  • Regexes alone cannot do this because you are _manipulating_ the string (by removing intermediate characters such as the space in your example) rather than simply locating matching substrings. – TypeIA Apr 24 '18 at 14:58
  • @BenjaminSx I edited your question to hopefully clarify your goal better. Please revert if I changed what you wanted to ask for. – Daniel Jour Apr 24 '18 at 15:06
  • Exactly a dupe of that question, and I can find more. The regex, `[0-9]{10}`, should be put into a lookahead within a capturing group: `"(?=([0-9]{10}))."` – Wiktor Stribiżew Apr 24 '18 at 16:02


You can stay with std::sregex_iterator and use the solution linked by Drew Dormann in the comments, or you can instead use std::regex_search with iterators and, after each match, move first to the position just after the start of that match:

#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main()
{
    std::string subject("098765432123 1234567890");
    std::regex re("[0-9]{10}");
    auto first = subject.cbegin();
    auto last  = subject.cend();
    std::smatch match;
    while ( std::regex_search(first, last, match, re) ) {
        std::cout << match.str() << "\n";
        // prefix().second is where the match starts; resume one char later
        first = std::next(match.prefix().second);
    }
}


O'Neil