
I've been having a difficult time with my foray into event-driven programming, mostly because I still think sequentially. In particular, I'm having a hard time understanding how anyone synchronizes their code when using LWP::Protocol::AnyEvent::http, and I'm looking for some help. Here is the smallest program I can create that demonstrates my basic lack of understanding:

use strict;
use warnings;

use 5.10.0;

use LWP::Protocol::AnyEvent::http;
use WWW::Mechanize;
use Coro qw(async);

my $url = "http://feedproxy.google.com/~r/PerlNews/~3/kqUb_rpU5dE/";

my $mech = WWW::Mechanize->new;
$mech->get($url);

my @cs;
foreach my $link ($mech->links) {
  my $c = async {
    say "Getting " . $link->url;
    my $ua = WWW::Mechanize->new;
    $ua->get($link->url);
  };
  push(@cs, $c);
} 

$_->join for (@cs);

How do I make sure the ->get has succeeded before going into the foreach loop? The ->get will return immediately since it doesn't block when using the LWP::Protocol::AnyEvent::http module, so there are no ->links and the program just exits. Removing LWP::Protocol::AnyEvent::http obviously makes the program return links, like a regular sequential program, and makes it slow like one too.

Thanks for any insight.

mikew
  • Note that it's very impolite to hammer web sites in this fashion (requesting all links on a page simultaneously). This is the kind of thing that gets you banned from using web sites. – ikegami Oct 19 '13 at 16:37
  • It's eventually going to be just fetching images, and from what I can tell it will do basically what a browser does by fetching multiple images at once. I agree though, I should and will switch to a simple test on a site I control. – mikew Oct 20 '13 at 01:01

1 Answer


the ->get will return immediately since it doesn't block when using the LWP::Protocol::AnyEvent::http module.

That's not true. It blocks as normal. LWP::Protocol::AnyEvent::http should not affect how WWW::Mechanize works at all. It merely allows other Coro threads and AnyEvent callbacks to execute while WWW::Mechanize is blocked.
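To illustrate what "blocked, but other Coro threads keep running" looks like, here is a minimal sketch that needs no network. `fake_get` is a hypothetical stand-in for a blocking request, not part of any module; it assumes `Coro::AnyEvent::sleep` is an acceptable substitute for waiting on a socket, since both suspend only the calling thread.

```perl
use strict;
use warnings;
use 5.10.0;

use Coro qw(async);
use Coro::AnyEvent;

my @log;

# Hypothetical stand-in for a blocking HTTP request: it suspends only the
# calling Coro thread for $seconds, then returns to its caller.
sub fake_get {
    my ($name, $seconds) = @_;
    push @log, "$name start";
    Coro::AnyEvent::sleep $seconds;   # other threads may run during this wait
    push @log, "$name done";
    return;
}

my @threads;
for my $job (['slow', 0.2], ['fast', 0.1]) {
    my $thread = async { fake_get(@$job) };
    push @threads, $thread;
}

# Wait for every thread to finish, as in the question's code.
$_->join for @threads;

say for @log;
```

Each call to `fake_get` only returns once its wait is over, so from the caller's point of view it blocks. Both threads still start before either finishes, and the one with the shorter wait should finish first.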

How do I make sure the ->get has succeeded before going into the foreach loop?

Your code already waits for it to complete. (In fact, I'm tempted to add your code to the documentation!)

If you want to check if it succeeded, you could use

die "Error" if !$mech->success;
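One thing worth knowing here: WWW::Mechanize has `autocheck` enabled by default, so a failed `get` dies before an explicit check is ever reached. A rough sketch of handling the failure yourself, assuming you turn `autocheck` off; the `file://` URL is a hypothetical path chosen only so the request fails without touching the network.

```perl
use strict;
use warnings;
use 5.10.0;

use WWW::Mechanize;

# autocheck => 0 makes a failed request return normally instead of dying,
# so the explicit success check below is what detects the failure.
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Hypothetical URL for illustration: a local file that should not exist.
my $url = 'file:///nonexistent/example.html';

$mech->get($url);

if ( $mech->success ) {
    my @links = $mech->links;
    say 'Found ', scalar(@links), ' links';
}
else {
    # status_line comes from the underlying HTTP::Response object.
    say 'Request failed: ', $mech->response->status_line;
}
```

With the default `autocheck` setting, wrapping the `get` in `eval { ... }` and inspecting `$@` is the other common way to get the same information.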
ikegami
  • Thanks for the response and clarification. I wouldn't use the code for documentation since it's not working :) Why does the code not work when LWP::Protocol::AnyEvent::http is in use? No links are fetched. I drew wrong conclusions based on the behavior I observed. I have not become familiar enough with LWP internals to understand the module at the code level yet. – mikew Oct 20 '13 at 01:13
  • The only reason the links don't get fetched is that the program dies trying to fetch `#respond`. (Like the message says, "Error GETing #respond: URL must be absolute at a.pl line 21.") It does the same thing without LWP::P::AE::http too. – ikegami Oct 20 '13 at 05:52
  • Bah, this is what I get for programming late at night. *facepalms* I wasn't getting an error, just the program exiting, but when I switched to fetching images and just hitting www.google.com, I saw it working as you explained. Thanks again for the clarifications. Much appreciation for the module and the support! – mikew Oct 20 '13 at 17:00
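For anyone who hits the same "URL must be absolute" error on a link such as `#respond`, one possible way around it is to resolve and filter the links before fetching them. The sketch below uses only the URI module; `fetchable_urls`, the base URL, and the sample hrefs are all hypothetical names and values for illustration. With WWW::Mechanize itself, `$link->url_abs` should give you the already-resolved URI for each link.

```perl
use strict;
use warnings;
use 5.10.0;

use URI;

# Resolve each href against $base and keep only distinct http(s) URLs.
# Fragment-only hrefs (such as "#respond") point back at the same page,
# so they are skipped; fragments on other links are dropped.
sub fetchable_urls {
    my ($base, @hrefs) = @_;
    my %seen;
    my @urls;
    for my $href (@hrefs) {
        next if $href =~ /\A#/;
        my $uri = URI->new_abs($href, $base);
        next unless defined $uri->scheme && $uri->scheme =~ /\Ahttps?\z/;
        $uri->fragment(undef);
        my $url = $uri->canonical->as_string;
        push @urls, $url unless $seen{$url}++;
    }
    return @urls;
}

# Hypothetical page URL and hrefs, for illustration only.
my @urls = fetchable_urls(
    'http://example.com/post/',
    '#respond', 'about.html', 'mailto:someone@example.com', '/images/a.png',
);
say for @urls;
```

Filtering up front also makes it easier to cap how many requests run at once, which helps with the politeness concern raised in the comments on the question.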