-2

I am trying to implement multithread functionality perl script for more speed I am trying to implement multithread functionality perl script for more speed

I need to know how to implement multithreading for the following perl code

#!/usr/bin/perl

use if $^O eq "MSWin32", Win32::Console::ANSI;
use Getopt::Long;
use HTTP::Request;
use LWP::UserAgent;
use IO::Select;
use HTTP::Headers;
use IO::Socket;
use HTTP::Response;
use Term::ANSIColor;
use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);
use URI::URL;
use IO::Socket::INET;
use Data::Dumper;
use LWP::Simple;
use LWP;
use URI;
use JSON qw( decode_json encode_json );
use threads;

my $ua = LWP::UserAgent->new;
$ua    = LWP::UserAgent->new(keep_alive => 1);
$ua->agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31");


{
    chomp($site);
    push(@threads, threads->create (\&ask, \&baidu, $site));
    sleep(1) while(scalar threads->list(threads::running) >= 50);
}

eval {
    $_->join foreach @threads;
    @threads = ();
};

########### ASK ###########

sub ask {
    for ( $i = 0; $i < 20; $i += 1) {

        my $url = "https://www.ask.com/web?o=0&l=dir&qo=pagination&q=site%3A*.fb.com+-www.fb.com&qsrc=998&page=$i";

        my $request  = $ua->get($url);
        my $response = $request->content;

        while( $response =~ m/((https?):\/\/([^"\>]*))/g ) {

            my $link = $1;

            my $site = URI->new($link)->host;
            if ( $site =~ /$s/ ) {
                if ( $site !~ /</ ) {   
                    print "ask: $site\n";
                }
            } 
        }
    } 
}


########### Baidu ###########

sub baidu {
    for ( my $ii = 10; $ii <= 760; $ii += 10 ) {

        my $url = "https://www.baidu.com/s?pn=$ii&wd=site:fb.com&oq=site:fb.com";


        my $request  = $ua->get($url);
        my $response = $request->content;

        while ( $response =~ m/(style="text-decoration:none;">([^\/]*))/g ) {
            my $site = $1;
            $site =~ s/style="text-decoration:none;">//g;
            if ( $site =~ /$s/ ) {
                print "baidu: $site\n";
            }
        }
    }
}

If run this code I get only Result from Ask.com. How I can fix this problem and thanks for all ?

C:\Users\USER\Desktop>k.pl -d fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com    ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com

1 Answers1

6

OK, so first off - there's some really quite icky looking things you're doing here, and I'd suggest you need to step back and review your code. It's looking a bit 'cargo-cult' thanks to things like:

use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);

Or:

my $ua = LWP::UserAgent->new;
$ua    = LWP::UserAgent->new(keep_alive => 1);

... you're creating a new LWP::UserAgent instance, and then ... creating another one with a different parameter.

You've also got a load of errors that you're not seeing because you didn't include the most important use items:

use strict;
use warnings qw ( all );

Turn these on first, and then fix the errors.

But here for example:

push(@threads, threads->create (\&ask, \&baidu, $site));

What do you think this line is supposed to do? Because what's actually happening here is you try and invoke the ask sub, and then pass it arguments of a code reference to baidu sub, and a string $site - which is undefined at this point in the code. But that's academic, because you NEVER READ THEM in your subroutine.

So it's not really a surprise your code isn't really working - it's nonsense.

But that aside - perls threading model is often misunderstood. It is not a lightweight thread like you might be thinking of in other programming languages - actually it's rather heavyweight.

You're creating and spawning a thread per iteration, and that's not very efficient either.

What you really want to be doing is using Thread::Queue.

Spawn a small number of 'worker' threads per task, have them read from the queue, and do their work individually.

end the queue when it's done with, and let the threads exit and be reaped by the main process.

Something like in this answer: Perl daemonize with child daemons

... but are you sure there isn't a module that does what you want anyway?

Sobrique
  • 52,974
  • 7
  • 60
  • 101