
I've been working on a script that makes close to a thousand async requests using getAsync and Promise\settle. Each page requested is then parsed using the Symfony DomCrawler's filter() method (also slow, but a separate issue).

My code looks something like this:

$requestArray = [];
$client = new Client(['base_uri' => $url]);

foreach ($thousandItemArray as $item) {
    $requestArray[] = $client->getAsync(null, ['query' => $query]);
}

$results = Promise\settle($requestArray)->wait(true);
foreach ($results as $item) {
    // crawl() is shorthand here for the Symfony DomCrawler parsing step.
    $item->crawl();
}

Is there a way I can crawl the requested pages as they come in, rather than waiting for them all and then crawling? Am I right in thinking this would speed things up?

Thanks for your help in advance.

Ben Lewis Watson
    Sure! Take a look at callbacks! http://docs.guzzlephp.org/en/5.3/clients.html#asynchronous-response-handling You can essentially specify what to do as soon as the results of each of the requests are ready. (Could be a closure or a named function) – Marios Hadjimichael Jul 11 '17 at 01:25
  • Great! Just what I was looking for. Hope it's in Guzzle 6! Thank you! – Ben Lewis Watson Jul 11 '17 at 22:42

1 Answer


You can. getAsync() returns a promise, so you can assign an action to it using ->then().

use GuzzleHttp\Psr7\Response;

$promisesList[] = $client->getAsync(/* ... */)->then(
    function (Response $resp) {
        // Do whatever you want right after the response is available.
    }
);

$results = Promise\settle($promisesList)->wait(true);
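
For the crawling use case in the question, the then() callback could hand each body straight to the parser as it arrives. A minimal sketch, assuming Symfony's DomCrawler component (the selector is a placeholder, not from the original post):

use GuzzleHttp\Promise;
use Psr\Http\Message\ResponseInterface;
use Symfony\Component\DomCrawler\Crawler;

$promisesList = [];
foreach ($thousandItemArray as $item) {
    $promisesList[] = $client->getAsync(null, ['query' => $query])->then(
        function (ResponseInterface $resp) {
            // Parse each page the moment its response arrives,
            // instead of waiting for the whole batch to settle.
            $crawler = new Crawler((string) $resp->getBody());
            // $crawler->filter('...') ... (placeholder selector)
        }
    );
}

Promise\settle($promisesList)->wait(true);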

P.S.

You probably want to limit the concurrency level to some number of requests (i.e., not start all the requests at once). If so, use the each_limit() function instead of settle(). And vote for my PR to be able to use settle_limit() ;)
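
For reference, a minimal sketch of the each_limit() approach. The concurrency value of 25 and the generator wrapper are illustrative assumptions, not from the original post; the generator matters because each_limit() only pulls promises from it as slots free up, so requests are created lazily:

use GuzzleHttp\Client;
use GuzzleHttp\Promise;
use Psr\Http\Message\ResponseInterface;

$client = new Client(['base_uri' => $url]);

// Generator: a promise (and its request) is only created when
// each_limit() asks for the next one.
$promises = (function () use ($client, $thousandItemArray, $query) {
    foreach ($thousandItemArray as $item) {
        yield $client->getAsync(null, ['query' => $query]);
    }
})();

Promise\each_limit(
    $promises,
    25, // at most 25 requests in flight at once (example value)
    function (ResponseInterface $response, $index) {
        // Parse each page as soon as its response arrives.
    },
    function ($reason, $index) {
        // Handle a failed request.
    }
)->wait();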

Alexey Shokov