re Matching { and {{

Question

For reasons beyond the scope here, I'm building a simple bibtex parser. Some bibtex fields are delimited by a single curly brace, while others are delimited by double curly braces. Curly braces are also valid content for the field.

I have a string that I know corresponds to a single field, in the formats:

fieldName1 = {{ content }},\n    -> content
fieldName2 = { content },\n      -> content
fieldName3 = { {[}content,] },\n -> {[}content,]

With this pattern I can recover the content:

re.compile(r"(?P<name>[\w-]+?)[\s]*=[\s]*({(?P<content>.*)})", flags=re.IGNORECASE | re.DOTALL)

But the captured content will still contain the inner { and } if that field uses double braces.

Is there an easier way to remove them than to test [0]=='{' and [-1]=='}'?

So `fieldName = {}},\n` would be valid too, with `fieldName` being `}`? — Eric Duminil, Mar 30 '19 at 22:35
`Curly braces are also valid content for the field.` then in what way one could distinguish a brace as a delimiter from a brace as a content? Are `,\n` chars always following? — revo, Mar 30 '19 at 22:36
You could put them into non-capturing groups like so: (?P<name>[\w-]+?)[\s]*=[\s]*((?:{{)(?P<content>.*)(?:}})). You can see it working here: https://regex101.com/r/k4adk2/12 — Sirsmorgasboard, Mar 30 '19 at 22:37
Rules sound weird to me. Curly braces should be escaped when used as content. `{{a}}` could mean `{a}` or `a` otherwise. — Eric Duminil, Mar 30 '19 at 22:40
@EricDuminil with content being } yes, it would. Maybe bibtex complains about that, but for me, for this, I don't see an issue. — Fábio Dias, Mar 30 '19 at 22:41
@Sirsmorgasboard Almost that, I want a way to match both { and {{. But thanks for the website link! It will help along the rest of the thing :) — Fábio Dias, Mar 30 '19 at 22:43
Ah, sorry, I misunderstood. Something more like this, then? (?P<name>[\w-]+?)[\s]*=[\s]*({ content .*}|{{ content.*}}) https://regex101.com/r/k4adk2/14 — Sirsmorgasboard, Mar 30 '19 at 22:52
@Sirsmorgasboard I tried that too, but then I can't use named groups. It can still work, but one would have to test for None in one group then use the other, which is not really advantageous versus testing { and }. Works, but I'm curious if there is a more elegant solution. — Fábio Dias, Mar 30 '19 at 23:11
Sorry I guess I am still not understanding, could you give your desired output for each of your 3 examples please? — Sirsmorgasboard, Apr 01 '19 at 22:12
@sirsmorgasboard Sorry, that was indeed unclear. I updated the question to add the desired results. In other words, remove the double braces only if we can find them on both sides. — Fábio Dias, Apr 01 '19 at 23:03
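
For reference, the manual check discussed in this thread could be sketched as below. This is only an illustration, not code from the original post: `parse_field` is a hypothetical helper name, and the pattern is the question's pattern with the braces escaped and the redundant outer group dropped. It assumes the input is a single field, as stated in the question.

```python
import re

# Pattern adapted from the question: capture everything between the
# outermost pair of braces (greedy, so inner braces are kept).
FIELD_RE = re.compile(
    r"(?P<name>[\w-]+?)\s*=\s*\{(?P<content>.*)\}",
    flags=re.IGNORECASE | re.DOTALL,
)


def parse_field(text):
    """Return (name, content) for one field, or None if nothing matches.

    Hypothetical helper: a second brace pair is dropped only when it is
    present on both sides of the captured content.
    """
    match = FIELD_RE.match(text)
    if match is None:
        return None
    content = match.group("content")
    # Double braces: the capture starts with "{" and ends with "}".
    if content.startswith("{") and content.endswith("}"):
        content = content[1:-1]
    return match.group("name"), content.strip()


if __name__ == "__main__":
    for line in (
        "fieldName1 = {{ content }},\n",
        "fieldName2 = { content },\n",
        "fieldName3 = { {[}content,] },\n",
    ):
        print(parse_field(line))
```

Note that the second brace is only treated as a delimiter when it directly follows the first one, so `{ {[}content,] }` keeps its inner braces.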

Valdi_Bo · Answer 1 · 2019-04-01T05:09:25.953

1

Try the following regex:

(?P<name>[\w-]+?)\s*=\s*{(?:{| {\[})?\s*(?P<content>.*?)(?:,])?\s*}{1,2}

In my test it matches all 3 of your samples.

For a working example (containing test of the regex above) see https://regex101.com/r/Gy8IWu/1

The above regex test site provides detailed explanations about particular parts of the regex under test and what has been matched.

Edit

The regex matching all 3 variants, according to your comment, is:

(?P<name>[\w-]+?)\s*=\s*{{1,2}\s*(?P<content>(?:{\[})?.*?)\s*}{1,2}

See the updated example: https://regex101.com/r/Gy8IWu/2
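
A quick way to try the updated regex locally, instead of on regex101, might look like the following. This is a minimal sketch: the `extract` helper is a hypothetical name, and it assumes each sample is matched from the start of the line with `re.match`. It only covers the three samples from the question.

```python
import re

# The updated regex from this answer, unchanged.
ANSWER_RE = re.compile(
    r"(?P<name>[\w-]+?)\s*=\s*{{1,2}\s*(?P<content>(?:{\[})?.*?)\s*}{1,2}"
)


def extract(text):
    """Return the 'content' group for one field line, or None (hypothetical helper)."""
    match = ANSWER_RE.match(text)
    return match.group("content") if match else None


if __name__ == "__main__":
    samples = [
        "fieldName1 = {{ content }},\n",
        "fieldName2 = { content },\n",
        "fieldName3 = { {[}content,] },\n",
    ]
    for sample in samples:
        print(repr(extract(sample)))
```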

edited Apr 01 '19 at 05:09

answered Mar 31 '19 at 05:24

Valdi_Bo


Not really, for the third example, the content should be "{[}content,]" – Fábio Dias Mar 31 '19 at 20:41
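
One possible way to get "remove the second brace only if it appears on both sides" in a single pattern is a conditional group, which Python's `re` module supports as `(?(name)...)`. The sketch below is not from the answer above. The group name `dbl` is made up for illustration, and the trailing `,?\s*$` assumes each field ends with an optional comma and whitespace, as in the question's samples.

```python
import re

FIELD_RE = re.compile(
    r"(?P<name>[\w-]+?)\s*=\s*"
    r"\{(?P<dbl>\{)?"           # one opening brace, optionally a second one
    r"\s*(?P<content>.*?)\s*"   # content, without surrounding whitespace
    r"(?(dbl)\})\}"             # second closing brace only if 'dbl' matched
    r",?\s*$",                  # assumption: optional comma, then end of field
    flags=re.DOTALL,
)


def parse_field(text):
    """Return (name, content) for one field, or None (hypothetical helper)."""
    match = FIELD_RE.match(text)
    if match is None:
        return None
    return match.group("name"), match.group("content")


if __name__ == "__main__":
    for line in (
        "fieldName1 = {{ content }},\n",
        "fieldName2 = { content },\n",
        "fieldName3 = { {[}content,] },\n",
    ):
        print(parse_field(line))
```

As raised in the comments on the question, input such as `{{a} and {b}}` stays ambiguous. This sketch treats the outer `{{` and `}}` as delimiters in that case.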
