Finding all matches with a regular expression - greedy and non greedy!

Question

Take the following string: "Marketing and Cricket on the Internet".

I would like to find all possible matches of "Ma", followed by any text, followed by "et", using a regex. That is:

Market
Marketing and Cricket
Marketing and Cricket on the Internet

The regex Ma.*et returns "Marketing and Cricket on the Internet". The regex Ma.*?et returns "Market". But I'd like a regex that returns all three. Is that possible?

Thanks.

LEPL, a parsing library for Python, has regexes that `yield` all possible matches. — , Nov 03 '10 at 21:07

score 2 · Accepted Answer · answered Nov 03 '10 at 21:08

As far as I know: No.

But you could match non-greedily first and then generate a new regexp whose quantifier requires more characters than the previous match consumed, to get the second match. Like this:

Ma.*?et
Ma.{3,}?et

...and so on...
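
A minimal Python sketch of that loop, under some assumptions: the function name `all_matches_by_min_length` and its `prefix`/`suffix` parameters are made up for illustration, and the pattern is rebuilt on each pass with a higher lower bound on the lazy quantifier. Because `re.search` returns only the leftmost match for each bound, matches that begin at a later "Ma" can be missed.

```python
import re

def all_matches_by_min_length(text, prefix="Ma", suffix="et"):
    """Collect matches of prefix + any text + suffix, shortest first.

    Hypothetical helper: after each hit, the lazy quantifier's lower
    bound is raised past the gap that was just matched.
    """
    results = []
    min_len = 0
    while True:
        # e.g. Ma.{0,}?et, then Ma.{3,}?et, then Ma.{18,}?et, ...
        pattern = re.escape(prefix) + ".{%d,}?" % min_len + re.escape(suffix)
        m = re.search(pattern, text, re.DOTALL)
        if m is None:
            break
        results.append(m.group(0))
        # The next match must have a strictly longer gap than this one.
        gap = len(m.group(0)) - len(prefix) - len(suffix)
        min_len = gap + 1
    return results

found = all_matches_by_min_length("Marketing and Cricket on the Internet")
```

For the string in the question, `found` should hold the three matches listed there, from shortest to longest.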

thejh

Rastaboy · Answer 2 · 2010-11-05T16:05:46.707

Thanks guys, that really helped. Here's what I came up with for PHP:

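function preg_match_ubergreedy($regex, $text) {
    // Start with an empty array so the function never returns null.
    $matched = array();

    // Try each exact gap length in turn: "*" becomes "{0}", "{1}", "{2}", ...
    // Note that this replaces every "*" in the pattern.
    for ($i = 0; $i < strlen($text); $i++) {
        $exp = str_replace("*", "{" . $i . "}", $regex);
        // preg_match returns 1 on a match, 0 on no match, false on error.
        if (preg_match($exp, $text, $matches) === 1) {
            $matched[] = $matches[0];
        }
    }

    return $matched;
}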
$text = "Marketing and Cricket on the Internet";
$matches = preg_match_ubergreedy("@Ma.*?et@is",$text);

score 0 · Answer 3 · answered Nov 03 '10 at 21:04

Sadly, this is not possible with a standard POSIX regex, which returns a single match (the best candidate under the regex rules). To accomplish this you will need an extension feature, which may be available in the programming language you are using the regex from.

score 0 · Answer 4 · answered Nov 03 '10 at 22:24

For a more general regular expression, another option would be to recursively match the greedy regular expression against the previous match, discarding the first and last characters in turn to ensure that you're matching only a substring of the previous match. After matching "Marketing and Cricket on the Internet", we test both "arketing and Cricket on the Internet" and "Marketing and Cricket on the Interne" for submatches.

It goes something like this in C#...

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public static IEnumerable<Match> SubMatches(Regex r, string input)
{
    var result = new List<Match>();

    var matches = r.Matches(input);
    foreach (Match m in matches)
    {
        result.Add(m);

        if (m.Value.Length > 1)
        {
            string prefix = m.Value.Substring(0, m.Value.Length - 1);
            result.AddRange(SubMatches(r, prefix));

            string suffix = m.Value.Substring(1);
            result.AddRange(SubMatches(r, suffix));
        }

    }

    return result;
}

This version can, however, end up returning the same submatch several times. For example, it would find "Marmoset" twice in "Marketing and Marmosets on the Internet": first as a submatch of "Marketing and Marmosets on the Internet", then as a submatch of "Marmosets on the Internet".
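
One way to avoid the duplicates, sketched in Python rather than C#, is to track each match by its (start, end) position in the original string and skip any span already seen. The name `sub_matches` is hypothetical, and the sketch assumes a simple pattern with no anchors or lookbehind, since searching a window with `pos`/`endpos` is not quite the same as searching a sliced copy of the string.

```python
import re

def sub_matches(pattern, text):
    """Return each distinct submatch once, ordered by position."""
    regex = re.compile(pattern)
    seen = set()  # (start, end) spans already reported

    def visit(start, end):
        # Search only the window text[start:end], keeping absolute offsets.
        for m in regex.finditer(text, start, end):
            span = m.span()
            if span in seen:
                continue  # this span and its substrings were already explored
            seen.add(span)
            s, e = span
            if e - s > 1:
                visit(s, e - 1)  # drop the last character
                visit(s + 1, e)  # drop the first character

    visit(0, len(text))
    return [text[s:e] for s, e in sorted(seen)]

results = sub_matches("Ma.*et", "Marketing and Marmosets on the Internet")
```

Here "Marmoset" is reached along two different paths but recorded only once, because both paths lead to the same span.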