Problem
The problem is simple to describe: I have a pile of text files from which I want to extract only the frontmatter (described below), if it's there at all, and then stop processing the file.
Here's an example of a valid file with frontmatter; my comments (assume they are not actually in the file) are written as C-style comments:
/*spaces & newlines are fine*/
--- /* i.e., /^---\s*$/ */
key: value
foo: bar, zip, grump
/*
Anything can go in here; once I have this section pulled out, the YAML schema
can do the rest. All that's important to note is that this section must be
terminated explicitly with a subsequent /^---\s*$/ in order to be deemed valid.
*/
---
Anything else can follow here, more accidental frontmatter blobs can exist,
but it should not matter since the other requirement is that the regex engine
will cease processing beyond the termination of the first match.
What I have so far, which doesn't address certain edge cases, uses ripgrep (rg):
rg -g '!**/{node_modules,.*}/*' -g '*.md' -U '(?s)\s*^---$((?!---).*)^---$' -r '$1'
The problem with the above is that it matches far past the first terminating ---
in certain cases, for example when two frontmatter blobs appear one after another.
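To pin down the behaviour I'm after, here is a rough Python sketch of the rules described above. It is only an illustration of the expected result, not the rg solution I'm asking for. The function name `extract_frontmatter` is made up. The sketch also assumes that the frontmatter must be the first non-whitespace content in the file and that the delimiter lines carry only trailing spaces or tabs.

```python
import re

# Optional leading whitespace/newlines, an opening "---" line, a lazy body,
# then the FIRST subsequent "---" line. The lazy (.*?) is what stops the
# match at the first terminator instead of running on to a later one.
FRONTMATTER = re.compile(
    r"\A\s*^---[ \t]*\n(.*?)^---[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_frontmatter(text):
    """Return the body of the first frontmatter block, or None if absent.

    Hypothetical helper: assumes the block must open the file and must be
    explicitly terminated to count as valid.
    """
    match = FRONTMATTER.match(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "\n---\nkey: value\nfoo: bar, zip, grump\n---\nrest of file\n---\nnot: frontmatter\n---\n"
    print(extract_frontmatter(sample))
```

Given a file with two blobs one after another, this returns only the first one; an unterminated block, or a file that starts with ordinary text, yields `None`.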
Bonus Problem
- I want to know how I can do this with the standard regex engine that rg defaults to, but also how to do this with PCRE2 (-P)
- I want to know how I can have all flags embedded in the regex itself, rather than have -U for multiline, using (?m) for example