
No matter what URL I specify for curl I always get the same HTML 404 Error page back.

If I use the --verbose option, it looks like curl always connects to the same IP address.

$ curl --verbose http://www.edgeoftheweb.co.uk
* About to connect() to www.edgeoftheweb.co.uk port 80
*   Trying ::ffff:74.117.222.24... connected
* Connected to www.edgeoftheweb.co.uk (::ffff:74.117.222.24) port 80
> GET / HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: www.edgeoftheweb.co.uk
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 15 Sep 2011 13:52:07 GMT
< Server: Apache/2.2.3 (CentOS)
< X-Powered-By: PHP/5.2.11
< Content-Length: 519
< Connection: close
< Content-Type: text/html; charset=UTF-8
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <TITLE>www.edgeoftheweb.co.uk</TITLE>
</HEAD>
<FRAMESET rows="100%,*" border="0" frameborder="0" framespacing="0">
    <FRAME name=top src="http://www.searchnut.com/?domain=edgeoftheweb.co.uk&registrar=directnicexpired&aff_txt=This+domain+is+expired%2C+please+renew+it.&aff_url=https%3A%2F%2Fsecure.directnic.com%2Fmyaccount%2Frenewals%2F" noresize>
</FRAMESET>
Closing connection #0

$ curl --verbose http://api.twitter.com
* About to connect() to api.twitter.com port 80
*   Trying ::ffff:74.117.222.24... connected
* Connected to api.twitter.com (::ffff:74.117.222.24) port 80
> GET / HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: api.twitter.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 15 Sep 2011 13:53:25 GMT
< Server: Apache/2.2.3 (CentOS)
< X-Powered-By: PHP/5.2.11
< Content-Length: 505
< Connection: close
< Content-Type: text/html; charset=UTF-8
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <TITLE>api.twitter.com</TITLE>
</HEAD>
<FRAMESET rows="100%,*" border="0" frameborder="0" framespacing="0">
    <FRAME name=top src="http://www.searchnut.com/?domain=twitter.com&registrar=directnicexpired&aff_txt=This+domain+is+expired%2C+please+renew+it.&aff_url=https%3A%2F%2Fsecure.directnic.com%2Fmyaccount%2Frenewals%2F" noresize>
</FRAMESET>
Closing connection #0

The output of curl --version is:

curl 7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
Protocols: tftp ftp telnet dict ldap http file https ftps
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz

If I use wget instead, then I retrieve the correct pages back.

Any ideas how to get curl to resolve the URLs correctly? Thanks.

Jon
  • Have you tried changing the user-agent of curl to see if they are doing user-agent detection? – polynomial Sep 15 '11 at 14:08
  • Yes, it makes no difference... – Jon Sep 15 '11 at 14:10
  • Just to be sure: could you try to resolve both hosts: `host api.twitter.com` to see the IP? – Matteo Sep 15 '11 at 14:20
  • $ host api.twitter.com api.twitter.com has address 199.59.148.87 api.twitter.com has address 199.59.149.200 api.twitter.com has address 199.59.149.232 api.twitter.com has address 199.59.148.20 – Jon Sep 15 '11 at 14:24
  • $ host www.edgeoftheweb.co.uk www.edgeoftheweb.co.uk has address 91.192.194.242 – Jon Sep 15 '11 at 14:25

2 Answers

* Connected to www.edgeoftheweb.co.uk (::ffff:74.117.222.24) port 80

* Connected to api.twitter.com (::ffff:74.117.222.24) port 80

It seems to me that curl is using IPv6 to connect, while wget is using IPv4.

Try the following:

 curl --verbose -4 http://api.twitter.com
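
The address in both traces, `::ffff:74.117.222.24`, has the form of an IPv4-mapped IPv6 address, which is why it is worth comparing the embedded IPv4 part against what `host` reports. A minimal sketch of a helper that extracts that part (the function name `mapped_ipv4` is made up for illustration, and it only recognises the lowercase `::ffff:` prefix as printed by curl above):

```shell
# Print the IPv4 address embedded in an IPv4-mapped IPv6 address
# (the ::ffff:a.b.c.d form). Prints nothing and returns 1 otherwise.
mapped_ipv4() {
    case "$1" in
        ::ffff:*.*.*.*) printf '%s\n' "${1#::ffff:}" ;;
        *) return 1 ;;
    esac
}
```

For example, `mapped_ipv4 ::ffff:74.117.222.24` prints `74.117.222.24`, while a plain IPv4 address such as `199.59.148.87` makes the function return a non-zero status.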
Mike
  • Brilliant! Well spotted, although it had crossed my mind. Do you know of a way of setting the curl default to be IPv4? And likewise for libcurl in php? Is this an OK solution? http://www.businesscorner.co.uk/disable-ipv6-in-curl-and-php/ – Jon Sep 15 '11 at 14:58
  • Yes, that is what I'd do. – Mike Sep 15 '11 at 15:21
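
One way to make IPv4 the default for the command-line tool, without touching the system's IPv6 setup, is to put the `--ipv4` option in curl's config file (`~/.curlrc`), which curl reads on startup. The sketch below is an assumption about how one might do that idempotently; the function name `force_curl_ipv4` is hypothetical and not part of curl. For libcurl in PHP, the corresponding setting is, as far as I know, `CURLOPT_IPRESOLVE` with `CURL_IPRESOLVE_V4`, provided your PHP build exposes it.

```shell
# Append --ipv4 to a curl config file unless the line is already there.
# $1 is the config path; it defaults to ~/.curlrc.
force_curl_ipv4() {
    rc="${1:-$HOME/.curlrc}"
    touch "$rc"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fqx -- '--ipv4' "$rc" || printf '%s\n' '--ipv4' >> "$rc"
}
```

Calling `force_curl_ipv4` once is enough; calling it again leaves the file unchanged. Passing a different path lets you try it on a scratch file first and load that file with `curl -K`.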

It is a bit of a long shot, but try changing your name servers to Google's DNS servers temporarily:

8.8.8.8 
8.8.4.4

It looks like libcurl is not able to resolve those DNS names, and your ISP's DNS server, instead of returning a proper negative response (NXDOMAIN), is returning the address of a search page. I don't know why wget would differ so much in its behavior, but at the least you probably don't want your ISP's servers getting in the way of your troubleshooting.
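
To test this theory without editing the resolver configuration, one could ask a specific name server directly and check whether the address curl connected to is among the answers. This is only a sketch: `host_addrs` and `addr_listed` are hypothetical helper names, and they assume the "has address" wording that `host` printed in the comments above.

```shell
# Read `host` output on stdin and print one IPv4 address per line.
host_addrs() {
    awk '/ has address / { print $NF }'
}

# Succeed if address $1 appears in the newline-separated list on stdin.
addr_listed() {
    grep -Fqx -- "$1"
}
```

A typical use would be `host api.twitter.com 8.8.8.8 | host_addrs | addr_listed 74.117.222.24`, where the second argument to `host` names the server to query. A non-zero exit status means the address curl used did not come from that server's A records.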

Rilindo