
I am writing the loading procedure for my application and it involves reading data from a file and creating an appropriate object with appropriate properties.

The file consists of sequential entries (separated by a newline) in the following format:

=== OBJECT TYPE ===
<Property 1>: Value1
<Property 2>: Value2
=== END OBJECT TYPE ===

Where the values are often strings which may consist of arbitrary characters, new-lines, etc.

I want to create a std::regex which can match this format and allow me to use std::regex_iterator to read each of the objects from the file in turn.

However, I am having trouble creating a regex which matches this type of format; I have looked at the ECMAScript syntax and created my regex in the following way, but it does not match the string in my test application:

const std::regex regexTest( "=== ([^=]+) ===\\n([.\\n]*)\\n=== END \\1 ===" );

And when using this in the following test application, it fails to match the regex to the string:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string testString = "=== TEST ===\n<Random Example>:This is a =test=\n<Another Example>:Another Test||\n=== END TEST ===";

    std::cout << testString << std::endl;

    const std::regex regexTest( "=== ([^=]+) ===\\n([.\\n]*)\\n=== END \\1 ===" );
    std::smatch regexMatch;

    if( std::regex_match( testString, regexMatch, regexTest ) )
    {
        std::cout << "Prefix: \"" << regexMatch[1] << "\"" << std::endl;
        std::cout << "Main Body: \"" << regexMatch[2] << "\"" << std::endl;
    }

    return 0;
}
Thomas Russell
  • What compiler are you using? Are you aware that `std::regex` is not implemented (fully/at all) in some compilers? `g++` specifically (last I checked). – Qtax Jun 18 '13 at 20:58
  • @Qtax I am using Microsoft Visual Studio 2012 and the standard library implementation which ships with that, which as far as I'm aware, provides a full implementation of `std::regex` and associated functions. – Thomas Russell Jun 18 '13 at 20:59
  • @Shaktal can you try removing parts of the expression (to get parts of the match) to see which part breaks the pattern? – Martin Ender Jun 19 '13 at 01:25

2 Answers


Your problem is much simpler than it looks. This:

const std::regex regexTest( "=== ([^=]+) ===\\n((?:.|\\n)*)\\n=== END \\1 ===" );

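worked perfectly on clang++/libc++. Inside a bracket expression, `.` is a literal dot rather than a wildcard, so the `[.\\n]*` in your pattern matches only runs of dots and newlines; the alternation `(?:.|\\n)*` matches any character, including newlines. Remember to loop with `while` and `regex_search` instead of a single `if` with `regex_match` if you want to find more than one instance of the pattern inside the string, since `regex_match` succeeds only when the entire string matches.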
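As a rough sketch of that `while` / `regex_search` loop, something like the following could collect every entry in a string. The function name `findEntries` and the choice to return (type, body) pairs are illustrative assumptions, not part of the question's code. Note that the body group here is greedy, so two consecutive entries of the same type would be merged into one match.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (name and return type are assumptions for illustration):
// collects a (type, body) pair for every entry found in `text`.
std::vector<std::pair<std::string, std::string>> findEntries(const std::string& text)
{
    // Same pattern as above: (?:.|\n)* matches any character, newlines included.
    static const std::regex entryRegex(
        "=== ([^=]+) ===\\n((?:.|\\n)*)\\n=== END \\1 ===");

    std::vector<std::pair<std::string, std::string>> entries;
    std::smatch match;
    std::string::const_iterator searchStart = text.cbegin();

    // regex_search looks for the next entry anywhere in the remaining text,
    // whereas regex_match would require the whole string to be a single entry.
    while (std::regex_search(searchStart, text.cend(), match, entryRegex))
    {
        entries.emplace_back(match[1].str(), match[2].str());
        searchStart = match[0].second;  // resume just past the entry that was found
    }
    return entries;
}
```

Calling `findEntries` on the contents of the file and iterating over the result gives the object type in `first` and the raw property lines in `second`.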

Massa

Try to use:

  1. lazy quantifiers:

    === (.+?) ===\\n([\\s\\S]*?)\\n=== END \\1 ===

  2. negative classes and negative lookaheads:

    === ((?:[^ ]+| (?!===))+) ===\\n((?:[^\\n]+|\\n(?!=== END \\1 ===))*)

  3. POSIX:

    === (.+?) ===\n((.|\n)*?)\n=== END [^=]+? ===
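For illustration, the first (lazy) pattern could be combined with std::sregex_iterator roughly as follows. The `RawEntry` struct and the `readEntries` function are hypothetical names introduced for this sketch. Because the body group is lazy, each match stops at the first matching END marker, so consecutive entries of the same type stay separate.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical container for one parsed entry (an assumption for this sketch).
struct RawEntry
{
    std::string type;  // text between the "===" markers
    std::string body;  // raw property lines, possibly spanning several lines
};

// Hypothetical helper: walks over every entry in `text` with std::sregex_iterator.
std::vector<RawEntry> readEntries(const std::string& text)
{
    // Lazy pattern from option 1: [\s\S]*? matches any character, newlines
    // included, but as few of them as possible.
    static const std::regex entryRegex(
        "=== (.+?) ===\\n([\\s\\S]*?)\\n=== END \\1 ===");

    std::vector<RawEntry> entries;
    const std::sregex_iterator end;  // a default-constructed iterator marks the end

    for (std::sregex_iterator it(text.begin(), text.end(), entryRegex); it != end; ++it)
    {
        RawEntry entry;
        entry.type = (*it)[1].str();
        entry.body = (*it)[2].str();
        entries.push_back(entry);
    }
    return entries;
}
```

Each `RawEntry` can then be handed to whatever code builds the actual object from its property lines.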

Casimir et Hippolyte