
Here is the little script I created to do HTTP transactions against a list of URLs read from a file. The problem is that it can't issue requests as fast as I would like. More precisely, I set the rate to 200/second, but it only manages to send about 50/second. The server is powerful enough to handle 100/second.

This was run on a powerful PC with an E5-1650 CPU and 64GB of RAM, running Ubuntu 14.04 desktop. While the script runs, CPU usage is only about 12%. The command I used was `perl httpStresser.pl urlList rate 200`.

Any idea why?

use AnyEvent;
use EV;
use AnyEvent::HTTP;
use AnyEvent::Handle;
use Time::HiRes qw( gettimeofday );
my $expectedRespCode = 200;
my $rate = 1;
my @urls = ();
readUrls(shift);
my $numOfUrls = $#urls;
my $start = time();
my $printed = 0; #have we printed the completion msg.
my $gId = 0;

my $spawned;
my @ctx = ();
my $i;

for ($i=0; $i<=$#ARGV; $i++) {
    if ($ARGV[$i] =~ /^expect/) {
        $expectedRespCode = $ARGV[$i+1];
        $i++;
    } elsif ($ARGV[$i] =~ /^rate/) {
        $rate = $ARGV[$i+1];
        print "rate is now $rate\n";
        $i++;
    } elsif ($ARGV[$i] =~ /^skip/) {
        $gId = $ARGV[$i+1];
        $i++;
    } else {
        die "only max, stayup are supported\n";
    } 
}
$spawned = 0;
my $w = AnyEvent->condvar;
$| = 1;
$start = getTS();
my $_timer;
# fire roughly every 1 ms; timeoutHandler tops up the requests sent this second
$_timer = AnyEvent->timer(after => 0, interval => 0.001, cb => \&timeoutHandler);
$w->recv;

sub kickoff {
    my $id = $gId ++;
    if ($id > $numOfUrls) { 
        if ($printed == 0) {
            print "done!!\n"; $printed = 1;
        }
        return;
    }
    #print "$id\n";
    http_get $urls[$id], headers => { }, sub { 
        my $statusCode = $_[1]->{Status};
        #printf "status $statusCode %d\n", time() - $start;
        if (($id % 100) == 0) {
            print "$id\n";
        }
        if ($statusCode != $expectedRespCode) {
            print "unexpected resp code $id:$statusCode $urls[$id]\n";
        }
    };
}

sub timeoutHandler {
    #print time(), "|\n";
    if (! defined $start) {
        $start = getTS(); kickoff(); $spawned = 1; return;
    }
    my $delta = getTS() - $start;
    my $target = $delta * $rate;
    #printf "%.4f %4d $spawned\n", $delta, $target;
    for (; $spawned <= $target; $spawned++) {
        kickoff();
    }
    if ($delta >= 1.0 ) {
        $start += 1.0; $spawned = 0;
    }
}

sub readUrls {
    my $fname = shift;
    my $line;
    open FD, $fname or die "Failed to open $fname: $!\n";
    while (<FD>) {
        chomp($line = $_);
        push @urls, $line;
    }
    close FD;
}

sub getTS {
    my ($seconds, $microseconds) = gettimeofday;
    return $seconds + (0.0+ $microseconds)/1000000.0;
}
  • I don't see the point of using a timer here. You may as well just call `timeoutHandler` in a `for` loop. – Borodin Apr 20 '17 at 19:02
  • 1
    Since the script is single-threaded and thus uses only a single of the many CPU cores you have it might be useful to run multiple instances in parallel. – Steffen Ullrich Apr 20 '17 at 19:09
  • And you may as well try with straight forks, as in this [related problem](http://stackoverflow.com/questions/43024722/perl-too-slow-concurrent-download-with-both-httpasync-netasynchttp/). In the end I tried with well over 100 forks and saw no hints of OS strain or problems. – zdim Apr 20 '17 at 19:13
  • In `getTS()`, I think you're losing precision on the time... I don't think a double-precision float can store enough digits for the full timestamp to the microsecond. Of course, that may not cause the problem you're seeing. – TheAmigo Apr 20 '17 at 19:30
  • Have you tried profiling your code to see where it's spending time? `perl -d:NYTProf httpStresser.pl urlList rate 200` – TheAmigo Apr 20 '17 at 19:32
  • This script is expected to be able to do thousands of HTTP transactions/second even if the latency to the server is high; that's why I used AnyEvent. I remember I was able to get to 3000/second, so I'm not sure what's wrong in this case. Thx. – pktCoder Apr 20 '17 at 20:53
  • 1
    @pktCoder: Using `AnyEvent` doesn't help to perform a high transaction rate. It doesn't offer parallel processing of any sort. A simple loop will issue HTTP requests as fast as possible, and all an `AnyEvent` timer can do is slow that down to a more accurately defined interval. – Borodin Apr 21 '17 at 09:52
  • Thanks @borodin for the comment. My understanding is that AnyEvent can generate requests fast regardless of how slow the server is; it will just have lots of outstanding requests. What I am confused about is why the client didn't even generate that many requests. – pktCoder Apr 21 '17 at 15:39
  • 1
    @pktCoder: Let me try to explain. `AnyEvent::HTTP` allows you to make asynchronous HTTP requests and specify a callback subroutine that will be called when the response comes back. It happens to use `AnyEvent` to achieve that, but it really doesn't matter how it does it. Pretty much any HTTP client library, like `LWP::UserAgent` or `Mojo::UserAgent` will do the same thing. Meanwhile you've chosen to use an `AnyEvent` timer, which is completely independent of the HTTP transactions, to call `timeoutHandler` 1,000 times a second. – Borodin Apr 21 '17 at 16:32
  • There's no need for that at all, and you can probably get away with just `timeoutHandler() while 1` although you will probably want to `usleep` to pass the time after all the HTTP requests you want have been sent each second. By the way, `Time::HiRes` works with pairs of integers, and you should avoid converting that to a floating point value. The module provides a `tv_interval` utility function which takes two such integer pairs and returns the difference in floating point seconds. That is all you need. – Borodin Apr 21 '17 at 16:41
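To make Borodin's last two comments concrete, here is a minimal sketch of a pacing loop that keeps `Time::HiRes`'s integer time pairs and uses `tv_interval`, with a coarser 10 ms tick instead of the 1 ms timer, while still letting the AnyEvent loop service the HTTP watchers. The target rate, the placeholder URLs and the commented-out `$AnyEvent::HTTP::MAX_PER_HOST` line are illustrative assumptions, not a verified fix for the 50/second ceiling.

use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;
use Time::HiRes qw( gettimeofday tv_interval );

my $rate = 200;                                             # target requests/second (assumed)
my @urls = map { "http://192.0.2.10/item/$_" } 1 .. 2000;   # placeholder URLs

# If every URL points at the same server, the per-host connection limit may
# also matter: AnyEvent::HTTP only opens a handful of connections per host
# by default.
# $AnyEvent::HTTP::MAX_PER_HOST = 100;

my $done    = AnyEvent->condvar;
my $t0      = [gettimeofday];    # integer (seconds, microseconds) pair
my $spawned = 0;
my $pending = 0;

# Every 10 ms, issue however many requests are needed to stay on pace.
my $timer; $timer = AnyEvent->timer(after => 0, interval => 0.01, cb => sub {
    my $target = int(tv_interval($t0) * $rate);
    while ($spawned < $target && $spawned < @urls) {
        my $url = $urls[ $spawned++ ];
        $pending++;
        http_get $url, sub {
            my (undef, $hdr) = @_;
            warn "unexpected status $hdr->{Status} for $url\n"
                if $hdr->{Status} != 200;
            $done->send if --$pending == 0 && $spawned >= @urls;
        };
    }
});

$done->recv;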
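And along the lines of Steffen Ullrich's and zdim's comments about the script only using one core, this is a rough sketch of splitting the URL list across a few forked workers, each running its own AnyEvent loop. The worker count and URLs are placeholders, and the per-second pacing is left out to keep the example short.

use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my @urls    = map { "http://192.0.2.10/item/$_" } 1 .. 2000;   # placeholder URLs
my $workers = 4;                                               # roughly one per core (assumed)

for my $w (0 .. $workers - 1) {
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    next if $pid;                                  # parent: keep forking

    # Child: fetch every $workers-th URL with its own event loop, then exit.
    my @mine = @urls[ grep { $_ % $workers == $w } 0 .. $#urls ];
    my $cv = AnyEvent->condvar;
    for my $url (@mine) {
        $cv->begin;
        http_get $url, sub { $cv->end };
    }
    $cv->recv;
    exit 0;
}

wait() for 1 .. $workers;                          # parent reaps all children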
