I want to replace part of the following HTML text (an excerpt from a huge file) to convert old forum formatting (left over from a very bad forum porting job done 2 years ago) to regular phpBB formatting:
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>written by User</i>
This should be converted into:
[quote=User]
I used the following regex in sed:
s/<blockquote.*written by \(.*\)<\/i>/[quote=\1]/g
This works on the given example, but in the actual file several quotes like this can appear on one line. In that case sed is too greedy: the .* spans from the first match to the last one, so everything in between gets swallowed into a single [quote=...] replacement. I cannot seem to make it replace every occurrence of this pattern in the line... (I don't think there are any nested quotes, but that would make it even more difficult.)
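
For reference, here is a minimal sketch of one common way around the greediness: replace each .* with a negated character class, so a match cannot run past the end of a tag. It assumes the tags always appear in the order shown in the excerpt, that no attribute value contains a ">", and that user names contain no "<". The sample line, the user names Alice and Bob, and the function name convert_quotes are made up for illustration.

```shell
# Hypothetical test line with two old-style quotes (Alice and Bob are made-up names).
old='<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>written by '
line="${old}Alice</i> first ${old}Bob</i> second"

# [^>]* stays inside a single tag and [^<]* stops at the next tag,
# so each quote header is matched on its own and /g can replace all of them.
convert_quotes() {
  sed 's/<blockquote[^>]*><font[^>]*>quote:<hr[^>]*><i>written by \([^<]*\)<\/i>/[quote=\1]/g'
}

result=$(printf '%s\n' "$line" | convert_quotes)
printf '%s\n' "$result"
# prints: [quote=Alice] first [quote=Bob] second
```

This uses only basic regular expressions, so it should not depend on any particular sed implementation. Nested quotes need no special handling under these assumptions, since each opening header is rewritten independently of what follows it.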