These two requests should have the same result, but the first one returns 200 (OK) and the second one returns 404 (Not Found). Why is that?

require 'net/http'

url = "http://readwrite.com/2013/12/04/google-compute-engine"
uri = URI(url)
Net::HTTP.get_response(uri)
#=> #<Net::HTTPOK 200 OK readbody=true>
Net::HTTP.new(uri.host).request(Net::HTTP::Get.new(url))
#=> #<Net::HTTPNotFound 404 Not Found readbody=true>

It happens only with some URLs, and I couldn't figure out the pattern. Here's another example: http://davidduchemin.com/2014/01/towards-mastery-again/.

Andrew Marshall
sebagon

1 Answer

First, let’s compare the two by viewing their actual HTTP requests with tcpdump, so we can get an idea of what may be happening:

tcpdump -vvASs 0 port 80 and host www.readwrite.com
# Net::HTTP.get_response(uri)

GET /2013/12/04/google-compute-engine HTTP/1.1
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
Accept: */*
User-Agent: Ruby
Host: readwrite.com

# Net::HTTP.new(uri.host).request(Net::HTTP::Get.new(url))

GET http://readwrite.com/2013/12/04/google-compute-engine HTTP/1.1
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
Accept: */*
User-Agent: Ruby
Connection: close
Host: readwrite.com
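If running tcpdump isn't convenient, a similar comparison can be made locally without touching the network. This is a sketch under stated assumptions, not part of the original answer: the helper name capture_request_line is hypothetical, and it starts a throwaway TCPServer on 127.0.0.1 that records the request line Net::HTTP sends and replies with an empty 200. Passing nil as the proxy address to Net::HTTP.new keeps any http_proxy environment setting from rerouting the request.

```ruby
require 'net/http'
require 'socket'
require 'uri'

# Hypothetical helper: start a throwaway server on a random local port,
# yield that port to the caller, and return the request line it received.
def capture_request_line
  server = TCPServer.new('127.0.0.1', 0)
  port = server.addr[1]
  captured = nil

  thread = Thread.new do
    client = server.accept
    captured = client.gets.chomp # e.g. "GET /some/path HTTP/1.1"
    # Skip the remaining headers, which end at the first blank line.
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    client.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
    client.close
  end

  yield port
  thread.join
  captured
ensure
  server&.close
end

path = '/2013/12/04/google-compute-engine'

# nil as the third argument disables proxies taken from the environment.
from_string = capture_request_line do |port|
  url = "http://127.0.0.1:#{port}#{path}"
  Net::HTTP.new('127.0.0.1', port, nil).request(Net::HTTP::Get.new(url))
end

from_uri = capture_request_line do |port|
  uri = URI("http://127.0.0.1:#{port}#{path}")
  Net::HTTP.new(uri.host, uri.port, nil).request(Net::HTTP::Get.new(uri))
end

puts from_string # GET http://127.0.0.1:<port>/2013/12/04/google-compute-engine HTTP/1.1 (port varies)
puts from_uri    # GET /2013/12/04/google-compute-engine HTTP/1.1
```

Only the request line is captured here; the headers are read and discarded, so this shows the same difference as the first line of each tcpdump capture.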

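We can see that the second request puts the full URL (including the hostname) on the request line instead of just the path. This is because you’re passing the string url to Net::HTTP::Get.new: a request built from a String uses it verbatim as its path, so Net::HTTP::Get.new(url).path is the full URL seen above. A full URL in the request line is normally only sent to a proxy, and some servers respond to it with a 404, which would explain why only some URLs are affected. Instead, pass the URI instance (uri) to Net::HTTP::Get.new: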
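The difference is visible without sending any request, by inspecting the path each request object carries. A minimal sketch, assuming Ruby 2.0 or later (where Net::HTTP::Get.new accepts a URI object); the helper name request_target is made up for illustration:

```ruby
require 'net/http'
require 'uri'

URL = 'http://readwrite.com/2013/12/04/google-compute-engine'

# Hypothetical helper: the path a GET request object will put on the
# request line (Net::HTTP sends it unchanged when no proxy is used).
def request_target(string_or_uri)
  Net::HTTP::Get.new(string_or_uri).path
end

# A String is stored verbatim, hostname and all.
puts request_target(URL)      # http://readwrite.com/2013/12/04/google-compute-engine

# A URI contributes only its request URI (path plus any query string).
puts request_target(URI(URL)) # /2013/12/04/google-compute-engine
```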

Net::HTTP.new(uri.host).request(Net::HTTP::Get.new(uri))
#=> #<Net::HTTPOK 200 OK readbody=true>

And its tcpdump output is now effectively the same as the first request’s:

GET /2013/12/04/google-compute-engine HTTP/1.1
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
Accept: */*
User-Agent: Ruby
Host: readwrite.com
Connection: close

Andrew Marshall
  • I had tried that, but it raises NoMethodError: undefined method `empty?' for #. However, Net::HTTP.new(uri.host).request(Net::HTTP::Get.new(uri.path)) works perfectly. Thanks! – sebagon Jan 22 '14 at 11:35
  • Also, thanks for suggesting tcpdump for debugging this. It has helped a lot with other problems too. – sebagon Jan 30 '14 at 17:59
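One caveat with uri.path: it drops any query string, and for a URL with no path at all it is an empty string, which Net::HTTP::Get.new rejects. URI::HTTP#request_uri returns the path plus query and falls back to "/", and because it is a String it also works on Ruby versions where passing a URI object raises the NoMethodError quoted above. A minimal sketch; build_get and the example.com URLs are hypothetical:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET whose request target is the path plus
# query string, passed as a String so older Net::HTTP versions accept it.
def build_get(url)
  uri = URI(url)
  Net::HTTP::Get.new(uri.request_uri)
end

uri = URI('http://example.com/search?q=ruby')
puts uri.path        # /search
puts uri.request_uri # /search?q=ruby

puts build_get('http://example.com/search?q=ruby').path # /search?q=ruby
puts build_get('http://example.com').path               # /

# uri.path is "" for this URL, and Net::HTTP::Get.new("") raises ArgumentError.
begin
  Net::HTTP::Get.new(URI('http://example.com').path)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

In practice this would be paired with Net::HTTP.new(uri.host, uri.port), since Net::HTTP.new(uri.host) alone always connects to port 80.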