Lucas Trzesniewski's comment can actually be used in Python with the PyPI regex module (I just replaced the named group with a numbered one to make it shorter):
>>> import regex
>>> r = regex.compile(r'({(?:[^{}]++|\g<1>)*})(*SKIP)(*FAIL)|\s*,\s*')
>>> s = """{J. Doe, R. Starr}, {Lorem
{i}psum dolor }, Dol. sit., am. et."""
>>> print(r.split(s))
['{J. Doe, R. Starr}', None, '{Lorem\n{i}psum dolor }', None, 'Dol. sit.', None, 'am. et.']
The first branch, ({(?:[^{}]++|\g<1>)*})(*SKIP)(*FAIL), matches nested {...{...{}...}...} structures: { matches a literal {, (?:[^{}]++|\g<1>)* matches 0+ occurrences of 2 alternatives, 1) any 1+ characters other than { and } (the [^{}]++ part) or 2) text matching the whole ({(?:[^{}]++|\g<1>)*}) subpattern, and } matches the closing brace. The (*SKIP)(*FAIL) verbs make the engine omit the whole matched value from the match buffer, moving the index to the end of the match and leaving nothing to return (we "skip" what we matched).
The second branch, \s*,\s*, matches a comma surrounded by 0+ whitespace characters.
The None values appear because split() also returns the contents of capture groups, and the capture group in the first branch does not participate in the match when the second branch matches. The capture group is still needed in the first alternative branch for recursion. To remove the empty elements, use a list comprehension:
>>> print([x for x in r.split(s) if x])
['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']
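If installing the third-party regex module is not an option, the same idea (split on commas only when they are outside any braces) can be sketched with a plain depth counter. This is not the regex approach above but a swapped-in technique; the function name split_outside_braces is hypothetical, and the sketch assumes braces are balanced and trims whitespace around each piece.

```python
def split_outside_braces(text):
    """Split text on commas that are not nested inside {...}.

    Illustrative stand-in for the regex above; assumes balanced braces.
    """
    parts = []
    current = []
    depth = 0  # how many unclosed '{' we are currently inside
    for ch in text:
        if ch == '{':
            depth += 1
        elif ch == '}' and depth > 0:
            depth -= 1
        if ch == ',' and depth == 0:
            # A top-level comma ends the current piece.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    # Drop empty pieces, like the comprehension above does.
    return [p for p in parts if p]


s = """{J. Doe, R. Starr}, {Lorem
{i}psum dolor }, Dol. sit., am. et."""
print(split_outside_braces(s))
```

For the sample string this prints the same four items as the filtered r.split(s) call. Unlike the regex, it also strips whitespace at the very start and end of the input.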