I'm setting up Apache2 as a reverse proxy for a remote site. Let's assume the remote site is http://app.remotesite.com. Here is a snippet from my virtual host config:

ProxyPass /pxy/ http://app.remotesite.com/

So this should take a request like http://app.mysite.com/pxy/search?q=abc, and pass it through as http://app.remotesite.com/search?q=abc.

I am getting a "Bad request" when I try this. Based on the output in /var/log/apache2/error.log, it is doing the proxy correctly, but it looks like when it connects to the remote site it does so with its IP address. If I take that IP address (printed in error.log) and do a request with it, e.g. http://[IP address]/search?q=abc, I get the same "Bad request" error. My hypothesis is that the remote site is relying on the hostname to serve the request properly, but mod_proxy is not sending it over. I know about the ProxyPreserveHost setting, but this is for preserving the original hostname in the proxy request (in this case, app.mysite.com) which is not what I want.

Can anyone suggest a way for me to force mod_proxy to use the remote site's hostname in its request? Or, if my hypothesis does not make sense, point out what else might be going wrong?

jfrank
  • Please include the relevant parts of the access and error logs. – adaptr Apr 25 '12 at 15:46
  • Dumb mistake: I had ProxyPreserveHost On. The remote request was therefore going through with a hostname of app.mysite.com. Thanks to larsks for showing me how to use tcpdump, which enabled me to diagnose the problem. – jfrank Apr 25 '12 at 16:17

1 Answer

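Your hypothesis is probably incorrect. By default, mod_proxy connects using the hostname you provide in the ProxyPass target URL and sends that same hostname in the Host: header. It only forwards the client's original hostname instead when ProxyPreserveHost is On.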

If you request http://app.remotesite.com/search?q=abc on the command line using curl, do you get the response you expect? If so, then a good place to start is looking at the difference between the request that curl produces vs. the request that mod_proxy is sending over.
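One quick way to check whether the remote site cares about the Host: header is to send the same request with different Host values and compare the status codes. This is only a sketch: probe_host is a hypothetical helper name, and the hostnames in the usage lines are the ones from the question.

```shell
# Hypothetical helper: request a URL while forcing a specific Host header,
# and print only the HTTP status code of the response.
#   $1 = URL to request
#   $2 = value to send in the Host: header
probe_host() {
    curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $2" "$1"
}

# Usage (not run here):
#   probe_host 'http://app.remotesite.com/search?q=abc' app.remotesite.com
#   probe_host 'http://app.remotesite.com/search?q=abc' app.mysite.com
```

If the second call returns 400 while the first succeeds, the remote site is rejecting requests based on the Host: header rather than on anything else in the request.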

To see what curl is doing, you can use the --trace-ascii <file> option, like this:

curl --trace-ascii trace.out 'http://app.remotesite.com/search?q=abc'

This will produce output in trace.out that looks something like:

== Info: About to connect() to google.com port 80 (#0)
== Info:   Trying 74.125.228.8... == Info: connected
== Info: Connected to google.com (74.125.228.8) port 80 (#0)
=> Send header, 165 bytes (0xa5)
0000: GET / HTTP/1.1
0010: User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7
0050:  NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2
0084: Host: google.com
0096: Accept: */*
00a3: 
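If all you care about is the Host: header, you can pull it out of the trace file instead of reading the whole thing. A minimal sketch, assuming the trace lines look like the sample above (a four-digit hex offset, a colon, then the header); trace_host is a made-up name:

```shell
# Hypothetical helper: print the value of every Host: header found in a
# file written by curl --trace-ascii.
# Assumes lines shaped like "0084: Host: google.com".
#   $1 = path to the trace file
trace_host() {
    sed -n 's/^[0-9a-f]\{4\}: [Hh]ost: *//p' "$1"
}

# Usage (not run here):
#   trace_host trace.out
```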

Getting the same information out of Apache is a little trickier; I would use tcpdump, which is a packet capturing tool. Start capturing packets like this:

tcpdump -w packets -s 1500 port 80 and host app.remotesite.com

While tcpdump is running, make your request from a browser (or curl, or whatever), stop the tcpdump with ^C, and then examine the file like this:

strings packets

Which will get you something like:

{GET / HTTP/1.1
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2
Host: google.com
Accept: */*

This will show the URL being requested, the Host: header, and other useful information. See how it looks, and come back here if you don't spot something obvious.
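To narrow the capture down to just the Host: headers, you can filter the strings output. This is a rough sketch that assumes each header ends up on its own line, as in the sample above; host_headers is a hypothetical name:

```shell
# Hypothetical helper: read text on stdin (for example the output of
# "strings packets") and print each distinct Host: header line with a
# count of how many times it appeared.
host_headers() {
    grep -i '^Host:' | sort | uniq -c
}

# Usage (not run here):
#   strings packets | host_headers
```

If the hostname listed is your own site's name rather than the remote site's, the proxy is passing the client's Host: header through unchanged.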

larsks
  • Using curl does work if I use the hostname, but not when I use the IP address. That is how I came to my hypothesis, since the response from curl using the IP address is the same "400" page that I get through proxying. I tried your suggestion of tcpdump (had to install binutils first), but there was no traffic reported to app.remotesite.com. So my next move was to remove the "and host app.remotesite.com" from the tcpdump line, and this allowed me to figure out what was going wrong. As part of a different test, I had ProxyPreserveHost On, so the wrong host name was being sent. oooops – jfrank Apr 25 '12 at 16:14
  • Glad it helped! – larsks Apr 25 '12 at 16:16