
I’ve got a virtual host in my Apache config to deal with ads, spam, and malware sites. It works by mapping the hostnames of bad servers to a specific loopback address in the HOSTS file; the virtual host answers on that address.

Using the following directives, I have been able to replace any page from a bad server with a short piece of text like [ad], and any graphic from a bad server with a local 1x1 transparent PNG file.

RewriteRule \.(gif|jpg|png|jpeg)$ /1x1-trans.png
ErrorDocument 404 "<p>[ad]</p>"
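For context, here is a minimal sketch of how the pieces described above might fit together. The loopback address 127.0.0.2 and the DocumentRoot path are assumptions made for illustration, not values from my actual config; it also assumes mod_rewrite is loaded and Apache already listens on that address.

```apache
# HOSTS file entry (assumed address), shown here as a comment:
#   127.0.0.2  badserver.com

<VirtualHost 127.0.0.2:80>
    # Hypothetical directory that contains only 1x1-trans.png
    DocumentRoot "/var/www/adblock"

    RewriteEngine On
    # Requests ending in an image extension get the transparent pixel.
    RewriteRule \.(gif|jpg|png|jpeg)$ /1x1-trans.png [L]

    # Anything else does not exist under DocumentRoot,
    # so it falls through to the 404 text.
    ErrorDocument 404 "<p>[ad]</p>"
</VirtualHost>
```

The rule only looks at the URL path, which is why it cannot tell an image request from a page request once the extension is missing.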

Recently, however, I have seen pages with broken IMG tags because they use a SRC without a file extension.

<img src="http://badserver.com/adsandjunk/foobar;tile=4;sz=575x90;othervariables=stuff?">

I tried using

RewriteRule ^.*$ "<p>ad</p>" [L]

But that brings back the broken image placeholders. Using this

RewriteRule ^.*$ /1x1-trans.png [L]

fixes the images, but then any non-image request (pages, frames, etc.) pops up a Save As dialog for the PNG.

How can I get Apache to replace graphics (i.e., anything requested by an IMG tag) with a graphic and everything else with a bit of HTML?

Thanks a lot.

Synetech
  • I can't suggest a solution, but I have reason to believe you're on the wrong track with mod_rewrite. As I understand it, rewrite rules are for modifying the URLs of incoming requests, not for replacing chunks of text within a web page, which is what I think you are trying to do here. – Bart B Jul 30 '09 at 16:39
  • Bad servers are redirected via the HOSTS file. I’ve now added that to the question for clarity. – Synetech Jul 30 '09 at 17:19

2 Answers


Just out of curiosity, are you using Apache as a reverse proxy here? That's the only context in which I can understand having a virtual host to "deal with ads, spam, and malware sites."

I'm not sure this is a mod_rewrite issue. You may be better off using filtering:

http://httpd.apache.org/docs/2.0/mod/mod_ext_filter.html

See in particular the section where they use sed to replace text. You can use just about any external program as the filter, really: Perl, etc.

I have not done this myself, but the sed route looks promising if you can cobble together the specific search and replace criteria.
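As a rough, untested sketch modeled on the sed example in that documentation: the filter name `adfilter` and the `foo`/`bar` strings are placeholders, and it assumes mod_ext_filter is loaded and sed is at /bin/sed.

```apache
# Define an output filter that pipes text/html responses through sed.
# "adfilter" is a made-up name; s/foo/bar/g is a placeholder substitution.
ExtFilterDefine adfilter mode=output intype=text/html \
    cmd="/bin/sed s/foo/bar/g"

# Apply the filter to everything this server (or virtual host) sends out.
<Location />
    SetOutputFilter adfilter
</Location>
```

Note that this only filters responses Apache itself serves, so it helps only if the page content actually passes through your server.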

Corey S.
  • A proxy? No, but I have the bad servers redirected back to the virtual host’s loopback IP via the HOSTS file. – Synetech Jul 30 '09 at 17:20

You're doing this the hard way. Just use Privoxy.

200_success
  • It’s kind of a legacy system because I just wanted to take advantage of the existing entries in the HOSTS file. (And not that hard, especially since it has worked flawlessly until recently.) I have looked at (or rather passed by) Privoxy several times but not really tried it out yet. I will now. :) – Synetech Jul 30 '09 at 21:16
  • Oh, and I like the HOSTS method because it works for more than just web browsing. (I don’t think that apps or malware that phone home allow you to configure a proxy for the connection.) – Synetech Jul 30 '09 at 21:21