For the development of a custom reverse proxy (written in C++) I want to do a realtime translation of URIs in HTML content. For example if I want to access a ressource on http://myserver/
using http://my-reverse-proxy/myserver
, all absolute and toplevel links like http://myserver/somecontent1.ext
or /somecontent2.ext
need to be modified.
An HTML tag
<img src="/sample.png">
would therefore be translated to
<img src="/myserver/sample.png">
From my point of view there are to approaches:
1) Using regular expressions and string replacement to find all related HTML tags and their paths using capture groups and do some string replacement.
2) Parse entire HTML content, do some transformation on the parse tree and pretty-print the result back to a valid HTML ressource.
And this is what this question is all about: Do you have any experiences what solution might be faster and maybe even more reasonable? Do you know a framework I might use to not reinvent the wheel? As this process should be used later for CSS and XML-based ressources as well, it should not be a HTML-depend solution.
Thanks in advance!