
How can I use Ruby's WEBrick to modify HTML content as it passes through a proxy server?

require 'webrick'
require 'webrick/httpproxy'

handler = proc do |req, res|
  # if the_site_url == "youtube.com"
  #    html_of_the_page = "<body>Custom Html<body>"
  # end
end

proxy = WEBrick::HTTPProxyServer.new(
  Port: 8080, 
  ProxyContentHandler: handler
)

trap 'INT' do proxy.shutdown end
proxy.start

This question is similar but its solution does not work. If Webrick does not support content-altering functionality, is there another proxy server library that does?

Update

Ideally, I should be able to modify existing HTML. I would think there is some other variable like res.body in a proxy handler that represents the html of the page: writable, parsable, readable (whether that is a stream of data or the full data).

springworks00
  • check this https://github.com/bbtfr/evil-proxy you may copy their code or use it instead of WEBrick::HTTPProxyServer that is the parent class of the project – Giuseppe Schembri Jul 22 '20 at 19:29
  • It says that `res.body` is writable, but if I modify it the web page doesn't reflect the changes. Do you know how I could change the actual on-screen content? – springworks00 Jul 23 '20 at 16:11
  • If you assign a `String` or a `StringIO` to `res.body` (`res.body = 'Hello'`), that becomes your body, but this is not what you asked. Sorry, I could not find a way to modify the existing body (in my environment `res.body` returns `@body`, which is a lambda). I tried a function composition trick with no result. I hope somebody can answer this question; I found nothing on the web. – Giuseppe Schembri Jul 23 '20 at 16:24
  • I did not notice the question was also: is there another proxy server library that does? You could try https://github.com/igrigorik/em-proxy. – Giuseppe Schembri Jul 23 '20 at 16:50
  • I think I found a way to do it, I hope it works for you. Let me know how it goes. – Giuseppe Schembri Jul 24 '20 at 16:34

1 Answer


If you just need to serve custom HTML:

require 'webrick'
require 'webrick/httpproxy'
# require 'uri' is not needed

handler = proc do |req, res|
  content_body = <<-HTML
  <body>
    Custom Html
  </body>
  HTML

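  # Only replace the body (and its length) for matching hosts, so that
  # responses from other sites keep their original content-length.
  if req.request_uri.host =~ /youtube/
    res.header['content-length'] = content_body.bytesize.to_s
    res.body = content_body
  end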
end

proxy = WEBrick::HTTPProxyServer.new Port: 8080, ProxyContentHandler: handler

trap 'INT'  do proxy.shutdown end
trap 'TERM' do proxy.shutdown end

proxy.start

If you want to put a message before and after the original HTML (I do not know whether it is possible to actually parse and change the original HTML content):

#!/usr/bin/env ruby

require 'webrick'
require 'webrick/httpproxy'

handler = proc do |req, res|
  content_after = "CA "
  content_before = " CB"
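  # content-length counts bytes, not characters
  add_to_content_length = content_before.bytesize + content_after.bytesize

  # Chunked responses carry no content-length, so only adjust it when present.
  if res.header['content-length']
    res.header['content-length'] = (res.header['content-length'].to_i + add_to_content_length).to_s
  end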

  compose = lambda do |add_first, original_body|
    proc do |sock|
      original_body.(add_first.(sock))
      sock.write content_after
    end
  end

  add_before = lambda { |sock| sock.write content_before; sock}

  res.body = compose.(add_before,res.body)
end

proxy = WEBrick::HTTPProxyServer.new Port: 8080, ProxyContentHandler: handler

trap 'INT'  do proxy.shutdown end
trap 'TERM' do proxy.shutdown end

proxy.start
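
On the open point of changing the original HTML rather than just wrapping it: one possible approach, which I have not verified against a real browser session, is to drain the streaming body into a buffer inside the handler, edit the resulting string, and assign it back. This is a minimal sketch assuming that `res.body` is either a String or a callable that writes to a socket-like object (as in the example above). `drain_body`, `rewrite_html` and `rewrite_response` are hypothetical helper names, not part of WEBrick. It does not handle compressed responses, and HTTPS traffic tunnelled with CONNECT never reaches the content handler.

```ruby
require 'stringio'

# Hypothetical helper: return the whole body as a String, whether the proxy
# stored a String or a callable that streams chunks to a socket-like object.
def drain_body(body)
  return body.to_s unless body.respond_to?(:call)

  buffer = StringIO.new
  body.call(buffer) # the callable only needs #write, which StringIO provides
  buffer.string
end

# Hypothetical rewrite: inject a banner right after the opening <body> tag.
def rewrite_html(html)
  html.sub(/<body[^>]*>/i) { |tag| "#{tag}<p>Injected by proxy</p>" }
end

# Rewrites an HTML response in place; other content types are left untouched.
# Assumes res responds to #[], #[]=, #body, #body= and #chunked=.
def rewrite_response(res)
  return unless res['content-type'].to_s.include?('text/html')

  html = rewrite_html(drain_body(res.body))
  res.body = html
  res.chunked = false # the body is now a plain String with a known length
  res['content-length'] = html.bytesize.to_s
end

# Intended use with the proxy from the examples above:
#   handler = proc { |_req, res| rewrite_response(res) }
```

Buffering the whole page gives up streaming, which should be acceptable for HTML pages but is the reason the content-type check skips everything else.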
  • The first solution causes an ERR_TUNNEL_CONNECTION_FAILED in chrome, I think because the headers are being modified in non-normal ways. The second has `res.body` as a Proc instead of a String (which I think is a difference in our environments). I also started a bounty so I can ideally reward all the time you've put into this! Thank you so much! – springworks00 Jul 24 '20 at 19:36
  • Sorry, it worked with `curl`; I did not try it with a browser. If you need it, check on GitHub https://github.com/muffatruffa/ProxyServer: there is a file with the client calls and the printed output as well (Ruby version 2.6.3, btw). – Giuseppe Schembri Jul 24 '20 at 21:21