Ok, now the question is open again, I can do it as an answer, so...
Unless I'm missing something, and provided it's only ever `<br/>` (not any variants), then you can just replace `<(?!br/>)` with `&lt;` and `(?<!<br/)>` with `&gt;`, and that's it?
In Python, it looks like that means this:
import re
text = re.sub(r'<(?!br/>)', '&lt;', text)   # every "<" that does not start "<br/>"
text = re.sub(r'(?<!<br/)>', '&gt;', text)  # every ">" that does not close "<br/>"
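As a quick check of the two substitutions, here is a minimal sketch that wraps them in a function and runs it on a sample string. The function name `escape_except_br` and the sample input are my own for illustration, and it assumes the goal is to turn the stray brackets into the `&lt;` and `&gt;` entities while leaving a literal `<br/>` alone.

```python
import re

def escape_except_br(text):
    """Escape < and > as HTML entities, but leave literal <br/> tags intact."""
    # "<" is escaped unless it is immediately followed by "br/>"
    text = re.sub(r'<(?!br/>)', '&lt;', text)
    # ">" is escaped unless it is immediately preceded by "<br/"
    text = re.sub(r'(?<!<br/)>', '&gt;', text)
    return text

sample = 'a <b>bold</b> line<br/>next 1 < 2 > 0'
print(escape_except_br(sample))
# a &lt;b&gt;bold&lt;/b&gt; line<br/>next 1 &lt; 2 &gt; 0
```

Note this only handles the two bracket characters; anything else that needs escaping would have to be dealt with separately.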
To explain what's going on, `(?!...)` is a negative lookahead - it only successfully matches at a position if the following text does not match the sub-expression it contains.
(Note that lookaheads do not consume the text matched by their sub-expression; they only check whether it is there or not.)
Similarly, `(?<!...)` is a negative lookbehind, and does the same thing but using the preceding text.
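To see that the lookarounds only test the surrounding text without consuming it, here is a small sketch using `re.search` on a made-up input string. In both cases the match span is a single character: the bracket itself.

```python
import re

sample = '<br/><i>'  # illustrative input: one <br/> followed by one other tag

# Negative lookahead: the "<" at index 0 is skipped because "br/>" follows it,
# so the first match is the "<" of "<i>".
ahead = re.search(r'<(?!br/>)', sample)
print(ahead.span(), ahead.group())    # (5, 6) <

# Negative lookbehind: the ">" at index 4 is skipped because "<br/" precedes it,
# so the first match is the ">" of "<i>".
behind = re.search(r'(?<!<br/)>', sample)
print(behind.span(), behind.group())  # (7, 8) >
```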
However, lookbehinds do have a slight difference to lookaheads (in some regex implementations) - which is that the sub-expressions inside lookbehinds must represent fixed-width or limited-width matches.
Python is one of the ones that requires a fixed width - so whilst the above expression works (because `<br/` is always four characters), if it was `(?<!<br\s*/?)>` then it would not be a valid regex for Python, because it represents a variable-length match. (However, you can stack multiple lookbehinds, so you could potentially enumerate the assorted options manually, if that was necessary.)
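A rough sketch of both points, assuming the standard-library `re` module: the variable-width lookbehind is rejected when the pattern is compiled, whereas stacking one fixed-width lookbehind per variant is accepted. The three variants listed (`<br>`, `<br/>`, `<br />`) are just examples I picked, not a complete list.

```python
import re

# The variable-width lookbehind is refused at compile time.
try:
    re.compile(r'(?<!<br\s*/?)>')
    rejected = False
except re.error as exc:
    rejected = True
    print('rejected:', exc)

# Stacked alternative: one fixed-width lookbehind per variant to protect.
# All of them must pass for the ">" to be matched (and so escaped).
stacked = re.compile(r'(?<!<br)(?<!<br/)(?<!<br /)>')
result = stacked.sub('&gt;', '<br><br/><br /><b>')
print(result)
# <br><br/><br /><b&gt;
```

Each extra variant needs its own lookbehind, so this only stays manageable while the list of variants is short.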
Unless I'm missing something, and once it's just `<br/>` (not any variants), then can just replace `<(?!br/>)` with `&lt;` and `(?<!<br/)>` with `&gt;` and that's it? – Peter Boughton Sep 04 '11 at 20:04