There are at least four problems with your expression.
First, you're capturing everything from <xml> to </xml> in one big group. This means that if you manage to exclude the FType bits, you'll get nothing at all; if you don't, you'll get everything. If you create three separate groups, and make the middle one non-capturing, that will let you exclude the middle one.
Second, you're trying to exclude everything from <FType> to <FType>, which isn't going to work. The closing tag is </FType>.
Third, you're using greedy matches everywhere, so even if you get the first two right, you're going to match everything up to the last FType, including any earlier FTypes.
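To make the greedy-versus-lazy difference concrete, here is a minimal sketch. The input string is a made-up example with two FType elements, not the data from the question:

```python
import re

# Hypothetical input with two FType elements, purely for illustration.
sample = "a<FType>1</FType>b<FType>2</FType>c"

# Greedy: .* grabs as much as it can, so the prefix runs up to the LAST <FType>.
greedy_prefix = re.match(r'(.*)<FType>', sample, re.DOTALL).group(1)

# Lazy: .*? grabs as little as it can, so the prefix stops at the FIRST <FType>.
lazy_prefix = re.match(r'(.*?)<FType>', sample, re.DOTALL).group(1)

print(repr(greedy_prefix))  # 'a<FType>1</FType>b'
print(repr(lazy_prefix))    # 'a'
```

The greedy version swallows the first FType element into the part you meant to keep, which is exactly the problem described above.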
Putting it all together:
>>> re.match(r'(?P<xml>.*?)(?:<FType>.*</FType>)(.*)', s, re.DOTALL).groups()
('<xml>\n<EType>\n<E></E>\n<F></F>\n', '\n<G></G>\n</EType>\n</xml>\n')
If you ''.join that together, or sub it to r'\1\2', etc., you'll get the desired output.
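Here is a sketch of both options side by side. The string s is reconstructed from the output shown above; what sits inside the FType element is an assumption, since only the parts outside it appear in that output:

```python
import re

# Assumed input: everything outside <FType>...</FType> comes from the output
# above; the contents of FType itself are hypothetical.
s = (
    "<xml>\n<EType>\n<E></E>\n<F></F>\n"
    "<FType><E></E><F></F><G></G></FType>"
    "\n<G></G>\n</EType>\n</xml>\n"
)

pattern = re.compile(r'(?P<xml>.*?)(?:<FType>.*</FType>)(.*)', re.DOTALL)

# Option 1: join the two captured groups.
joined = ''.join(pattern.match(s).groups())

# Option 2: substitute the whole match with group 1 followed by group 2.
substituted = pattern.sub(r'\1\2', s)

print(joined == substituted)  # True
print(joined)
```

Note that the .* inside the non-capturing group is still greedy, so if s had more than one FType element, this would drop everything from the first <FType> to the last </FType>, including whatever sits between them.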
Fourth, this is, of course, horribly brittle. But parsing a non-regular language like XML with regexps is guaranteed to be horribly brittle (or very complex and sometimes exponentially slow), which is why you shouldn't do it. But that's what you asked for.
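For comparison, here is a minimal sketch of the same job done with an actual XML parser, using the standard library's xml.etree.ElementTree. The input string is the same assumed reconstruction (the FType contents are hypothetical), and the serialized result will not be byte-for-byte identical to the regexp output, because the parser normalizes things like empty elements:

```python
import xml.etree.ElementTree as ET

# Assumed input; the contents of FType are hypothetical.
s = (
    "<xml>\n<EType>\n<E></E>\n<F></F>\n"
    "<FType><E></E><F></F><G></G></FType>"
    "\n<G></G>\n</EType>\n</xml>\n"
)

root = ET.fromstring(s)

# ElementTree elements don't know their parent, so visit every element
# and remove any direct children named FType. Snapshot both sequences
# with list() because we're modifying the tree while walking it.
for parent in list(root.iter()):
    for child in list(parent.findall('FType')):
        parent.remove(child)

result = ET.tostring(root, encoding='unicode')
print(result)
```

This handles any number of FType elements at any depth, which the regexp above does not.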
And if you're trying to use this with a function that doesn't take regexp patterns, or one that takes a different regexp syntax than Python's, this probably isn't going to help you very much.