0

I'm using LWP::UserAgent to request a lot of page content. I already know the IP addresses of the hosts I am requesting, so I'd like to be able to specify the IP address at which each URL is hosted, so that LWP does not have to spend time doing a DNS lookup. I've looked through the documentation but haven't found a way to do this. Does anyone know how? Thanks!

srchulo
  • maybe $ua->get('http://IP_ADDRESS/REST_OF_URL'); ? – snoofkin Sep 14 '12 at 23:54
  • I don't think that will work, because some hosting is dependent upon the domain name used in the request, and wouldn't know where to direct the request if that server is being used for hosting more than one domain. – srchulo Sep 14 '12 at 23:56
  • Hmmm...but you say "I already know the IP of the urls I am requesting"...and @soulSurfer2010 and I are simply saying embed the IP address itself in your URL. Are you overthinking this or are we not grasping your problem? – DavidRR Sep 15 '12 at 00:03
  • @DavidRR ... which results in the client sending a *different request* that doesn't have an appropriate `Host` header, which confuses the server and gets a bad result. – hobbs Sep 15 '12 at 00:05
  • @DavidRR hobbs is right. This type of request will not always work. A server could host multiple sites behind the same IP and just return an error. – srchulo Sep 15 '12 at 00:14

3 Answers

7

So I found a module that does exactly what I'm looking for: LWP::UserAgent::DNS::Hosts

Here is an example script that I tested; it does what I asked for in my question:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::UserAgent::DNS::Hosts;

LWP::UserAgent::DNS::Hosts->register_host(
        'www.cpan.org' => '199.15.176.140',
);

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

# Actually enforces the registered mapping, as if it were in /etc/hosts
LWP::UserAgent::DNS::Hosts->enable_override;

my $response = $ua->get('http://www.cpan.org/');

if ($response->is_success) {
    print $response->decoded_content;  # or whatever
}
else {
    die $response->status_line;
}
srchulo
  • **Cool! Suggested improvement**: do the host lookup dynamically before you invoke `register_host` (as an alternative to hard-coding the IP address in `'www.cpan.org' => '199.15.176.140'`). – DavidRR Sep 15 '12 at 01:17
6

Hum, your system should already be caching DNS responses. Are you sure this optimisation would help?


Option 1.

Use

http://192.0.43.10/

instead of

http://www.example.org/

Of course, that will fail if the server does name-based virtual hosting.
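
If name-based virtual hosting is the concern, one possible workaround is to address the server by IP but supply the `Host` header yourself. This is a minimal sketch, assuming LWP leaves an explicitly set `Host` header alone (as far as I can tell it only fills the header in when it is missing). The address and name are the placeholders used above, and the request is only built and printed here, not sent:

```perl
use strict;
use warnings;
use HTTP::Request;

# Address the server by IP, but name the virtual host explicitly.
# Both values are the placeholders used above, not verified endpoints.
my $request = HTTP::Request->new(
    GET => 'http://192.0.43.10/',
    [ Host => 'www.example.org' ],
);

# Show the request line and headers that would go out.
print $request->as_string;

# To send it: my $response = LWP::UserAgent->new->request($request);
```

For https URLs this is probably not enough on its own, since certificate verification also depends on the host name.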


Option 2.

Replace Socket::inet_aton (called from IO::Socket::INET called from LWP::Protocol::http) with a caching version.

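use Socket qw( );
BEGIN {
    # Keep a reference to the real resolver before replacing it.
    my $original = \&Socket::inet_aton;

    # Host name (or IP string) => packed address. Needs Perl 5.10+ for //=.
    my %cache;
    my $caching = sub {
        return $cache{$_[0]} //= $original->($_[0]);
    };

    # Install the caching version in place of the original.
    no warnings 'redefine';
    *Socket::inet_aton = $caching;
}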
ikegami
  • so where would I put this code? And could you explain to me what it's doing? – srchulo Sep 15 '12 at 00:17
  • Early in your program. It has to be before anything does `use IO::Socket::INET;`, so it has to be before anything does `use LWP;`. (You can make sure it's getting called by adding a print statement.) It replaces `Socket::inet_aton` with your own version that caches. `inet_aton` is used to resolve domain names (and to pack IP addresses). – ikegami Sep 15 '12 at 00:24
  • Is it accurate to say then, that your solution ultimately has the effect of instructing the client to set the proper value in the HTTP Host header in its request? e.g., `Host: stackoverflow.com` – DavidRR Sep 15 '12 at 00:36
  • @DavidRR, The second option doesn't change the LWP request at all. It just replaces the name resolver with one that always returns the same answer as before (assuming the DNS response is static over the life of the program), just without using an OS call some of the time. – ikegami Sep 15 '12 at 01:29
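
To check that the replacement is actually in effect (the print-statement check suggested in the comments above), here is a small sketch that counts how often the original resolver is reached. The `$real_lookups` counter is hypothetical and exists only for this demonstration, and an IP literal is used so that the script runs without network access; a host name would go through the same code path:

```perl
use strict;
use warnings;
use Socket qw( );

our $real_lookups;   # hypothetical counter, for demonstration only

BEGIN {
    $real_lookups = 0;
    my $original = \&Socket::inet_aton;

    my %cache;
    no warnings 'redefine';
    *Socket::inet_aton = sub {
        my ($host) = @_;
        # Only call the real resolver when the cache has no entry yet.
        return $cache{$host} //= do { $real_lookups++; $original->($host) };
    };
}

# Resolve the same name twice; the second call should be served from the cache.
my $first  = Socket::inet_aton('127.0.0.1');
my $second = Socket::inet_aton('127.0.0.1');

print Socket::inet_ntoa($first), "\n";
print "real lookups: $real_lookups\n";
```

Failed lookups (an undefined result) are not cached by `//=`, so they would be retried on the next call.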
-3

Simply replace the domain name with the IP address in your URL:

use strict;
require LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

# my $response = $ua->get('http://stackoverflow.com/');
my $response = $ua->get('http://64.34.119.12/');

if ($response->is_success) {
    print $response->decoded_content;  # or whatever
}
else {
    die $response->status_line;
}
DavidRR
  • This won't send the same request to the server. – hobbs Sep 15 '12 at 00:04
  • @hobbs: `nslookup stackoverflow.com` => `Name: stackoverflow.com`, `Address: 64.34.119.12`. Or, are you suggesting as @ikegami does: "Of course, that will fail if the server does name-based virtual hosting." But that is ***if***. – DavidRR Sep 15 '12 at 00:10
  • I didn't say it would send the request to the wrong place, I said it wouldn't send the *same request*. And it won't, because it won't contain `Host: stackoverflow.com`. – hobbs Sep 15 '12 at 00:11
  • Even StackOverflow doesn't send the same response to the two requests; in one case it responds with a web page; in the other it returns a 302 with `Location: http://stackoverflow.com/`, which simply tells the client to make the request that it should have made in the first place (but now with an extra round-trip). – hobbs Sep 15 '12 at 00:13
  • I find it hard to believe that a request directly via an IP address redirects to a DNS address. Is this really happening with SO? – DavidRR Sep 15 '12 at 00:13
  • It is more often than not, so it's fair to guess "yes". – hobbs Sep 15 '12 at 00:14
  • @DavidRR, Name-based virtual hosting is quite common. It is quite likely a problem, but not necessarily. It's definitely worthy of a note explaining it could be a problem. hobbs was simply providing that note since you didn't. – ikegami Sep 15 '12 at 00:15
  • lol, for SO, the optimisation would "double" the time needed to satisfy the request! – ikegami Sep 15 '12 at 00:17
  • Points well taken and thanks for educating me. The request headers will be different and a redirect is also possible. So, the poster must consider whether either has an undesirable impact on his particular use case. – DavidRR Sep 15 '12 at 00:18