
Is it possible to replace content on every page passed through a proxy, similar to how mod_rewrite is used for URLs? The documentation on Substitute is not clear.

I am reverse proxying some pages that contain absolute paths, and this breaks the site. The paths need replacing, but tools like mod_rewrite don't pick them up because they are strings inside the HTML, not URL requests.

<VirtualHost *:80>
    ServerName  servername1
    ServerAlias servername2

    ErrorLog "/var/log/proxy/jpuat_prox_error_log"
    CustomLog "/var/log/proxy/jpuat_prox_access_log" common

    RewriteEngine on
    LogLevel alert rewrite:trace2
    RewriteCond %{HTTP_HOST} /uat.site.co.jp$ [NC]
    RewriteRule ^(.*)$ http://jp.uat.site2uk.co.uk/$1 [P]

    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|i"


    ProxyRequests Off

    <Proxy *>
            Order deny,allow
            Allow from all
    </Proxy>

    ProxyPass / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/
</VirtualHost>

Neither of the above succeeds in replacing the HTML string

<link href="//uat.site.co.jp/css/css.css

with

<link href="//uat.site2uk.co.uk/css/css.css

Conf after changes:

<VirtualHost *:80>
    ServerName  jp.uat.site2uk.co.uk
    ServerAlias uat.site.co.jp
    ErrorLog "/var/log/proxy/jpuat_prox_error_log"
    CustomLog "/var/log/proxy/jpuat_prox_access_log" common
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyPass / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
</VirtualHost>
ZZ9
  • I'm confused. That looks like it's from an HTML `a` tag. Clicking on that link likely won't result in the web browser following the link, but rather in a file browser (Windows Explorer) trying to open the UNC path. Are you trying to replace that string in HTML text? – GregL Apr 21 '15 at 14:21
  • The site works correctly. However, once we put it behind a firewall we of course get 404s on a bunch of CSS and images. Normally everything returns 200. – ZZ9 Apr 21 '15 at 14:25
  • They are from link tags on an IIS server – ZZ9 Apr 21 '15 at 14:29
  • I don't think you can provide UNC paths in `link` tags. If you can, I can't say it would be a good idea. In any event, that's not your question. According to the Apache [docs](http://httpd.apache.org/docs/2.4/mod/mod_substitute.html), the `Substitute` directive is only valid inside `Directory` blocks or `.htaccess` files. Try creating a `<Directory>` block (even if it's for /) and put the directive in there. – GregL Apr 21 '15 at 14:34
  • Hasn't done the trick: `AddOutputFilterByType SUBSTITUTE text/html` `Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|i"` – ZZ9 Apr 21 '15 at 15:20
  • Try a `Location` block instead, or read about the differences between the two and use whichever fits better. – GregL Apr 21 '15 at 16:08
  • @GregL, this format of URL is a "protocol-relative" URL. It is a perfectly valid way to link to pages, although it is not widely known. "//domain.com/path" makes the browser request the document with the same protocol that was used to request the page containing the link. – Tero Kilkanen Apr 21 '15 at 16:21
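Combining GregL's suggestion to move the directives into a `<Location>` block with Tero Kilkanen's point about protocol-relative URLs, one option is to substitute on the `//host/` prefix that `http://`, `https://` and protocol-relative links all share. This is an untested sketch: the host names are the ones from the question, and it assumes mod_substitute is loaded and the backend sends uncompressed HTML.

```apache
<Location "/">
    # Enable the SUBSTITUTE filter for HTML responses only
    AddOutputFilterByType SUBSTITUTE text/html
    # n = treat the pattern as a fixed string (dots are literal)
    # i = case-insensitive
    # Matching "//host/" covers http://, https:// and "//" links alike
    Substitute "s|//uat.site.co.jp/|//jp.uat.site2uk.co.uk/|ni"
</Location>
```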

3 Answers


There's an Apache module called mod_substitute that can do this. Here's a short example:

<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s/uat.site.co.jp/jp.uat.site2uk.co.uk/ni"
</Location>

Or, when combined with mod_proxy (the SUBSTITUTE filter still has to be enabled with AddOutputFilterByType, as above):

ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/

Substitute "s|http://uat.site.co.jp/|http://jp.uat.site2uk.co.uk/|i"

There's more information in the Apache documentation for mod_substitute.
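Putting the two snippets above together, a complete virtual host might look like the sketch below. It assumes Apache 2.4 with mod_proxy, mod_proxy_http and mod_substitute loaded; the ServerName is taken from the question's updated config. ProxyPassReverse only fixes redirect headers, so the Substitute rule is what rewrites links inside the HTML body.

```apache
<VirtualHost *:80>
    ServerName jp.uat.site2uk.co.uk

    ProxyRequests Off
    # Forward all requests to the backend and fix Location/Content-Location
    # headers in its responses
    ProxyPass        / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/

    <Location "/">
        # Rewrite absolute links inside proxied HTML bodies
        # (adjust the pattern if the pages use protocol-relative //host/ links)
        AddOutputFilterByType SUBSTITUTE text/html
        Substitute "s|http://uat.site.co.jp/|http://jp.uat.site2uk.co.uk/|i"
    </Location>
</VirtualHost>
```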

Jenny D
  • Hi, thanks for the suggestion. Unfortunately I have not had much luck down this path. I have tested it successfully outside of the proxy, though; it appears mod_proxy ignores it. – ZZ9 Apr 21 '15 at 16:22
  • I added some more info which may be helpful. – Jenny D Apr 21 '15 at 16:29
  • Thanks a lot, this works. It turned out to be a glitch: Apache was picking up backups of my files in /etc/httpd/conf.d/ that didn't end in .conf (vhost.bak). – ZZ9 Apr 22 '15 at 17:02

If you haven't restarted Apache, do that first. If you already have, you could try a global output filter that runs a custom PHP script to do the replacement, just to see whether that solves it.

EDIT: Based on your comment, Substitute may not be working because the content coming from the backend is compressed. To stop the backend from compressing its responses, add these lines to your VirtualHost:

RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity

If that doesn't work, try the following:

Add these to your conf, updating the paths of course:

#add this outside of any VirtualHost tags
ExtFilterDefine proxiedcontentfilter mode=output cmd="/usr/bin/php /var/www/proxyfilter.php"

#add these in your VirtualHost tag
RequestHeader unset Accept-Encoding 
RequestHeader set Accept-Encoding identity
SetOutputFilter proxiedcontentfilter

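In proxyfilter.php, put code like the following:

#!/usr/bin/php
<?php
// Read the whole response body that mod_ext_filter pipes in on stdin
$html = file_get_contents('php://stdin');
// Case-insensitive replacement of the backend host name
$html = str_ireplace('uat.site.co.jp', 'jp.uat.site2uk.co.uk', $html);
// Write the modified body back to Apache on stdout
file_put_contents('php://stdout', $html);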

If this works, narrow the filter so it only applies to text/html content, as in your example.
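One way to do that narrowing is mod_ext_filter's `intype` option, which limits the external filter to documents of the given media type and lets everything else pass through unfiltered. This is a sketch; the script path is the same placeholder used above and should be adjusted.

```apache
# Outside any VirtualHost: only run the PHP filter on text/html documents
ExtFilterDefine proxiedcontentfilter mode=output intype=text/html cmd="/usr/bin/php /var/www/proxyfilter.php"

# Inside the VirtualHost: ask the backend for uncompressed content,
# then attach the filter
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
SetOutputFilter proxiedcontentfilter
```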

g491
  • I get an HTTP 200 for the page, but the browser shows: Content Encoding Error The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression. – ZZ9 Apr 22 '15 at 11:13
  • Ah, add these to your VirtualHost: `RequestHeader unset Accept-Encoding` and also `RequestHeader set Accept-Encoding identity` – g491 Apr 22 '15 at 14:27
  • I updated my answer with something to try to get your original substitute line working. I'd recommend trying that first as it's simpler to try and may be what's going on. – g491 Apr 22 '15 at 15:18
  • Upvote for a great answer, but I got the other answer working first. – ZZ9 Apr 22 '15 at 16:59
  • In my case, it was the compression. Nailed it. It was driving me crazy... thank you so much! – That Brazilian Guy Jan 27 '17 at 15:16
  • Alternatively, you can decompress the incoming content before substituting and compress it again afterwards, using just `SetOutputFilter INFLATE;DEFLATE` – lorenzo-s Sep 28 '21 at 08:54
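A sketch of lorenzo-s's decompress/recompress approach, assuming Apache 2.4 with mod_deflate and mod_substitute loaded: INFLATE decompresses gzip responses from the backend, SUBSTITUTE edits the plain HTML, and DEFLATE compresses it again for the client. Unlike unsetting Accept-Encoding, this keeps compression between proxy and backend, at the cost of extra CPU on the proxy.

```apache
<Location "/">
    # Decompress gzip from the backend, rewrite the HTML, then recompress
    AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE text/html
    # Fixed-string (n), case-insensitive (i) replacement of the host name
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
</Location>
```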

According to https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypassreverse, ProxyPassReverse only rewrites HTTP response headers, not URL references inside HTML pages. The documentation continues:

"To rewrite HTML content to match the proxy, you must load and enable mod_proxy_html."
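A minimal mod_proxy_html sketch for the question's hosts, assuming Apache 2.4 (where mod_proxy_html also needs mod_xml2enc) plus mod_headers. The module file paths vary by distribution, and the ProxyHTMLLinks lines are a reduced subset of what the stock proxy-html.conf defines. Unlike Substitute, mod_proxy_html parses the HTML and only rewrites URLs in the listed element attributes; a plain (non-regex) ProxyHTMLURLMap pattern is matched against the start of each URL, so protocol-relative and absolute links need separate maps.

```apache
# Module paths are distribution-specific assumptions
LoadModule xml2enc_module    modules/mod_xml2enc.so
LoadModule proxy_html_module modules/mod_proxy_html.so

<VirtualHost *:80>
    ServerName jp.uat.site2uk.co.uk

    ProxyRequests Off
    ProxyPass        / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/

    # mod_proxy_html works on uncompressed HTML, so ask the backend for it
    RequestHeader unset Accept-Encoding

    # Element/attribute pairs treated as links (subset of proxy-html.conf)
    ProxyHTMLLinks link   href
    ProxyHTMLLinks a      href
    ProxyHTMLLinks img    src
    ProxyHTMLLinks script src

    ProxyHTMLEnable On
    # Prefix maps for protocol-relative and absolute links to the backend
    ProxyHTMLURLMap //uat.site.co.jp/      //jp.uat.site2uk.co.uk/
    ProxyHTMLURLMap http://uat.site.co.jp/ http://jp.uat.site2uk.co.uk/
</VirtualHost>
```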

eckes