
I want to extract multiple GLSL sources from a single file, each introduced by a header line. I wrote this small regex to do it for me:

(?:\n|^)-- (\w*)\.?(\d\d\d)?\.(\w\w\w?)\r?\n([\s\S\r\n]*?)(?=\n--|$)

It runs on a source like this:

-- passthrough.VS
in vec4 position;

void main(){
   gl_Position = position;
}

-- mvp.VS
layout (location=0) in vec3 position; 

uniform mat4 model;
#include "engine/shaders/vp_include.glsl"

void main () {
   gl_Position = proj * view * model * vec4 (position, 1.0); 
}

The capture group ([\s\S\r\n]*?) is supposed to match the body of the shader. I included \r\n because of the question "Regex Working on regexr but not Visual Studio".

The expected output (and the code to run) is here: http://coliru.stacked-crooked.com/a/a890795f0c438a0b, compiled with gcc (regex101.com's engine also gives the expected output).

My problem is with Visual Studio 2015, where this last capture simply matches an empty string (the other captures work).

Am I missing something? Is this a bug in the VS regex implementation?

martty
  • You probably need to escape the backslash on things C thinks are special. So a regex with `\r\n` put into a C string would be `x = "\\r\\n"` – Jerry Jeremiah Nov 27 '15 at 00:09
  • @JerryJeremiah The raw string literal `R"(...)"` allows you to write the string without having to escape anything. – melak47 Nov 27 '15 at 00:18
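To make the two comments above concrete, here is a minimal sketch that writes the header part of the question's pattern both ways. The function names are hypothetical and only serve the illustration; the pattern itself is taken from the question.

```cpp
#include <regex>
#include <string>

// Escaped form: every regex backslash is doubled so that the C++
// compiler passes a single backslash on to the regex engine.
inline std::string header_pattern_escaped() {
    return "-- (\\w*)\\.?(\\d\\d\\d)?\\.(\\w\\w\\w?)\\r?\\n";
}

// Raw string literal: everything between R"( and )" is taken verbatim.
inline std::string header_pattern_raw() {
    return R"(-- (\w*)\.?(\d\d\d)?\.(\w\w\w?)\r?\n)";
}

// Hypothetical helper: returns the shader name captured from a header
// line, or an empty string if the line is not a header.
inline std::string shader_name(const std::string& header_line) {
    static const std::regex re(header_pattern_raw());
    std::smatch m;
    return std::regex_search(header_line, m, re) ? m[1].str() : std::string();
}
```

Both functions should yield the same string, so which one to use is a matter of readability. Calling `shader_name("-- passthrough.VS\n")` is expected to give `passthrough`.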

1 Answer


For some reason, ^ and $ behave in the multi-line manner in VS' regex implementation, where they match the beginning/end of any line, not the entire string.

Your lazy capture group followed by (?=\n--|$) will then quit as early as possible, which happens to be the closest newline.

The regex in your code example is slightly different from the one in your question. You don't match a newline after your "-- header" line, so your last capture group matches an empty string between the consumed header and the newline.

If you use the regex from your question, the last capture group will instead match the first line after the "-- header".

I'm not sure why it does this, but it seems the TR1 implementation behaved the same way (though the Boost \A and \z anchors mentioned there are no longer available).
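One way to check how a given standard library treats the anchors is a small probe like the one below. This is only a sketch: `matches`, `dollar_is_multiline` and `caret_is_multiline` are hypothetical names, and the result of the two probe functions depends on the implementation. Going by the behaviour described above, VS 2015 would be expected to return true for both, while an implementation that anchors only at the ends of the whole string would return false.

```cpp
#include <regex>
#include <string>

// Hypothetical probe: does `pattern` match anywhere in `text`
// under the default ECMAScript grammar?
inline bool matches(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// True if `$` also matches just before an embedded '\n'
// (multi-line behaviour), false if it only matches at the very end.
inline bool dollar_is_multiline() {
    return matches("a\nb", "a$");
}

// True if `^` also matches just after an embedded '\n'.
inline bool caret_is_multiline() {
    return matches("a\nb", "^b");
}
```

If `dollar_is_multiline()` returns true, a lazy group followed by `(?=\n--|$)` can stop at the first line break, which is the symptom described in the question.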

melak47
  • Thanks for the quick answer! I guess this relates to: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2343 and http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2503. This really makes it unusable across compilers. Is there a way to modify my regex so that it works on both (and does what I want it to do)? – martty Nov 27 '15 at 08:37
  • The easiest way I see is to insert an end-of-file marker manually, like `-- end`, and use just `(?=\n--)` as the lookahead. Alternatively, use the regex only to find your markers, then split the string between the end and start of each marker to obtain the bodies: [example](http://coliru.stacked-crooked.com/a/0c9fb22ee9c74c1a) – melak47 Nov 27 '15 at 22:58
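The second suggestion in the last comment (match only the headers, then cut the bodies out of the text between them) could look roughly like the sketch below. It is not the code behind the linked example. `ShaderSource` and `split_shaders` are hypothetical names, and prepending a `'\n'` and ending the header with a `(?=\n)` lookahead are assumptions made here so that the pattern needs neither `^` nor `$`.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for one "-- name.version.stage" section.
struct ShaderSource {
    std::string name;
    std::string version;  // empty when the header carries no version
    std::string stage;
    std::string body;
};

inline std::vector<ShaderSource> split_shaders(const std::string& source) {
    // Prepend a newline so every header, including the first one, is
    // preceded by '\n' and the pattern does not need '^'.
    const std::string text = "\n" + source;

    // Header only. The line break after the header is checked with a
    // lookahead rather than consumed, so a header that follows an empty
    // body can still find its own leading '\n'.
    static const std::regex header(
        R"(\n-- (\w*)\.?(\d\d\d)?\.(\w\w\w?)\r?(?=\n))");

    std::vector<ShaderSource> shaders;
    std::size_t body_begin = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), header), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        if (!shaders.empty()) {
            // The previous body ends where this header begins. With an
            // empty body the header starts at body_begin - 1, so clamp.
            const std::size_t len = pos > body_begin ? pos - body_begin : 0;
            shaders.back().body = text.substr(body_begin, len);
        }
        shaders.push_back({m[1].str(), m[2].str(), m[3].str(), ""});
        // Skip the '\n' that the lookahead saw but did not consume.
        body_begin = pos + static_cast<std::size_t>(m.length(0)) + 1;
    }
    if (!shaders.empty()) {
        // The last body runs to the end of the input.
        shaders.back().body = text.substr(body_begin);
    }
    return shaders;
}
```

Because the body boundaries come from match positions rather than from a lazy group with `$` in its lookahead, the result should not depend on whether the implementation treats `$` as end-of-line or end-of-string. A header on the very last line with no trailing line break is not recognised by this sketch.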