Is it possible to make the Guzzle pool wait for requests?

Right now I can add requests to the pool dynamically, but as soon as the pool is empty, Guzzle stops (obviously).

This is a problem when I'm doing 10 or so pages concurrently, because my requests array will be empty until the resulting HTML pages have been processed and new links added.

This is my generator:

$generator = function () {
  while ($request = array_shift($this->requests)) {
    if (isset($request['page'])) {
      $key = 'page_' . $request['page'];
    } else {
      $key = 'listing_' . $request['listing'];
    }

    yield $key => new Request('GET', $request['url']);
  }
  echo "Exiting...\n";
  flush();
};

And my pool:

$pool = new Pool($this->client, $generator(), [
  'concurrency' => function() {
    return max(1, min(count($this->requests), 2));
  },
  'fulfilled' => function ($response, $index) {
      // new requests may be added to the $this->requests array here
  }
  //...
]);

$promise = $pool->promise();
$promise->wait();

Edited code after answer by @Alexey Shockov:

$generator = function() use ($headers) {
  while ($request = array_shift($this->requests)) {
    echo 'Requesting ' . $request['id'] . ': ' . $request['url'] . "\r\n";

    $r = new Request('GET', $request['url'], $headers);

    yield 'id_' . $request['id'] => $this->client->sendAsync($r)->then(function($response, $index) {
      echo 'In promise fulfillment ' . $index . "\r\n";
    }, function($reason, $index) {
      echo 'in rejected: ' . $index . "\r\n";
    });
  }
};

$promise = \GuzzleHttp\Promise\each_limit($generator(), 10, function() {
  echo 'fullfilled' . "\r\n";
  flush();
}, function($err) {
  echo 'rejected' . "\r\n";
  echo $err->getMessage();
  flush();
});
$promise->wait();

5 Answers

Unfortunately, you cannot do that with a generator, only with a custom iterator.

I prepared a gist with the full example, but the main idea is to create an Iterator whose state can change in both directions: it can become valid again after it has reached the end.
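
To illustrate why a plain generator is not enough, here is a minimal sketch in plain PHP with no Guzzle involved. The `$queue` array is a hypothetical stand-in for `$this->requests` from the question. Items added while the generator is still running are picked up, but once the loop has seen an empty queue the generator is finished and cannot be resumed:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for $this->requests in the question.
$queue = ['a'];

// Same shape as the generator in the question: drain the queue, then return.
$generator = (function () use (&$queue) {
    while (($item = array_shift($queue)) !== null) {
        yield $item;
    }
})();

$seen = [];
foreach ($generator as $item) {
    $seen[] = $item;
    if ($item === 'a') {
        // Added while the generator is still suspended: this one is yielded.
        $queue[] = 'b';
    }
}

// Added after the generator has run to completion: never yielded.
$queue[] = 'c';
$finished = !$generator->valid();
```

After this runs, `$seen` holds `'a'` and `'b'`, `$finished` is `true`, and `'c'` stays in `$queue` unprocessed, which mirrors a response handler adding a link after the pool has already drained.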

An example with ArrayIterator in psysh:

>>> $a = new ArrayIterator([1, 2])
=> ArrayIterator {#186
     +0: 1,
     +1: 2,
   }
>>> $a->current()
=> 1
>>> $a->next()
=> null
>>> $a->current()
=> 2
>>> $a->next()
=> null
>>> $a->valid()
=> false
>>> $a[] = 2
=> 2
>>> $a->valid()
=> true
>>> $a->current()
=> 2

With this idea in mind, we can pass such a dynamic iterator to Guzzle and let it do the work:

// MapIterator mainly needed for readability.
$generator = new MapIterator(
    // Initial data. This object will always be passed as the second parameter to the callback below.
    new \ArrayIterator(['http://google.com']),
    function ($request, $array) use ($httpClient, $next) {
        return $httpClient->requestAsync('GET', $request)
            ->then(function (Response $response) use ($request, $array, $next) {
                // The status code for example.
                echo $request . ': ' . $response->getStatusCode() . PHP_EOL;
                // New requests.
                $array->append($next->shift());
                $array->append($next->shift());
            });
    }
);
// The "magic".
$generator = new ExpectingIterator($generator);
// And the concurrent runner.
$promise = \GuzzleHttp\Promise\each_limit($generator, 5);
$promise->wait();

As I said before, the full example is in the gist, with MapIterator and ExpectingIterator.
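
For readers without access to the gist, the following is a rough sketch of the underlying idea only, not the gist's code. `RefillableQueue` is a hypothetical class name, and the sketch assumes PHP 8.0+ and a consumer that calls `next()` before each read after the first one. It combines two things: `valid()` reflects the current contents, so the iterator becomes valid again after a push, and `next()` does not advance if the iterator was exhausted at the previous call, so a freshly pushed item is not skipped. How this interacts with `each_limit()` is not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Guzzle or the gist): an iterator
 * that can become valid again after it has reached the end.
 */
final class RefillableQueue implements Iterator
{
    private array $items;
    private int $position = 0;
    private bool $wasValid;

    public function __construct(array $items = [])
    {
        $this->items = array_values($items);
        $this->wasValid = $this->valid();
    }

    public function push(mixed $item): void
    {
        $this->items[] = $item;
    }

    public function current(): mixed
    {
        return $this->items[$this->position] ?? null;
    }

    public function key(): mixed
    {
        return $this->position;
    }

    public function valid(): bool
    {
        return $this->position < count($this->items);
    }

    public function rewind(): void
    {
        $this->position = 0;
        $this->wasValid = $this->valid();
    }

    public function next(): void
    {
        // Only advance if there was a current item at the previous call.
        // If the queue was exhausted and has been refilled since, the
        // position already points at the first new item.
        if ($this->wasValid) {
            $this->position++;
        }
        $this->wasValid = $this->valid();
    }
}

$queue = new RefillableQueue(['http://example.com/a']);
$first = $queue->current();
$queue->next();
$exhausted = !$queue->valid();

$queue->push('http://example.com/b');   // e.g. from a response handler
$queue->next();                         // does not skip the new item
$second = $queue->valid() ? $queue->current() : null;
```

Here `$exhausted` is `true` after the first item has been consumed, and `$second` is the URL that was pushed afterwards. The example URLs are placeholders.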

Alexey Shokov
  • ok this works thanks and also for the link to psysh (didn't know about that). Please be sure to not delete your gist as it may be valuable to others in the future! –  Apr 20 '17 at 17:05
  • That was the goal ;) I will try to add more comments, because it's not clear at first glance why `ExpectingIterator` is required. – Alexey Shokov Apr 20 '17 at 18:37
  • @ncla, it does not. The `Pool` class uses the same mechanism internally as `each_limit()`. I created a separate package (based on the code above) to simplify things as much as I can, so please take a look: https://github.com/alexeyshockov/guzzle-dynamic-pool/blob/master/example/app1.php – Alexey Shokov Feb 05 '19 at 12:15

From the question, it seems that you can attach the processing (aggregation) callback directly to each request's promise. In that case the pool will always wait for your processing code, so you will be able to add new requests at any point.

A generator can return either a request or a promise, and promises can be combined together in different ways.

$generator = function () {
    while ($request = array_shift($this->requests)) {
        if (isset($request['page'])) {
            $key = 'page_' . $request['page'];
        } else {
            $key = 'listing_' . $request['listing'];
        }

        // requestAsync() takes a method and a URI; sendAsync() expects a Request object.
        yield $this->client->requestAsync('GET', $request['url'])
            ->then(function (Response $response) use ($key) {
                /*
                 * The fulfillment callback is now attached to the request's
                 * promise, so the pool will wait for it.
                 *
                 * $key is also available, because it's just a closure, so
                 * no $index is needed as an argument.
                 */
            });
    }
    echo "Exiting...\n";
    flush();
};

// each_limit() takes the concurrency (an int or a callable) as its second
// argument, not an options array like Pool.
$promise = \GuzzleHttp\Promise\each_limit($generator(), function () {
    return max(1, min(count($this->requests), 2));
});

$promise->wait();
Alexey Shokov
  • Hi, this doesn't appear to be working. I changed it to $this->client->requestAsync but get this error: Fatal error: Uncaught InvalidArgumentException: Each value yielded by the iterator must be a Psr7\Http\Message\RequestInterface or a callable that returns a promise that fulfills with a Psr7\Message\Http\ResponseInterface object –  Apr 20 '17 at 07:58
  • Also tried making a new Request, then using the sendAsync method that you proposed (since the method requires a request, not as in your example), but it doesn't work either unfortunately. –  Apr 20 '17 at 08:01
  • Yeah, forgot the difference between `each_limit()` and `Pool`. Updated the answer. Basically it's more convenient to use `each_limit()` function instead of the pool, because it wraps promises inside. – Alexey Shokov Apr 20 '17 at 09:02
  • Please see my edited code, had to edit the params to the each_limit function a little. But now I get the error: Too few arguments to function Scraper::{closure}(), 1 passed in /vendor/guzzlehttp/promises/src/Promise.php on line 203 and exactly 2 expectedRequesting - any ideas ? –  Apr 20 '17 at 10:20
  • ok my mistake, so if i add only 1 argument to my fulfilled and rejected callbacks it works. HOWEVER, it is very important that I know the request $index.. Is it not possible to get that info somehow? This was possible with the Pool object. Without it, each_limit is useless to me. –  Apr 20 '17 at 10:58
  • You define the closure, so you are able to use all the variables from the context. Just do `->then(function (...) use ($key) { ... }` and use the key inside your closure. There is no need to get it as an argument. – Alexey Shokov Apr 20 '17 at 11:57
  • OK, that works. However, now I tested the basic premise, namely that new requests can be added at any time, and that doesn't seem to work. I added a URL that takes 55 seconds to complete (it simply has a php sleep(55); statement and some text afterwards). In the success callback, I want to add another request when this one has been processed. I add it to the $this->requests array, but nothing happens after that. No new request is being made. –  Apr 20 '17 at 12:07
  • Agree, your case is a bit more complex than I expected. See the new answer from me for the right solution. – Alexey Shokov Apr 20 '17 at 16:31
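
The closure-capture suggestion from the comments above can be illustrated without Guzzle. This is a minimal sketch in plain PHP; the `$targets` array, the URLs, and the handler bodies are hypothetical. Each handler receives only one argument (as a promise callback would) and gets its key through `use`:

```php
<?php
declare(strict_types=1);

// Hypothetical request list: key => URL.
$targets = [
    'page_1'    => 'http://example.com/page/1',
    'listing_7' => 'http://example.com/listing/7',
];

$results = [];
$handlers = [];

foreach ($targets as $key => $url) {
    // $key is captured by value when the closure is created, so the
    // callback does not need an $index argument. $results is captured
    // by reference so every handler writes into the same array.
    $handlers[] = function (string $body) use ($key, &$results): void {
        $results[$key] = strlen($body);
    };
}

// Simulate two responses arriving, each passed as a single argument.
$handlers[0]('abc');
$handlers[1]('hello');
```

After both calls, `$results` maps `page_1` to 3 and `listing_7` to 5, showing that each closure kept its own key.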

"As I said before, the full example is in the gist, with MapIterator and ExpectingIterator."

Iterators don't become valid again on PHP < 7: your ArrayIterator example and the MapIterator sample both stop once the initial pool is exhausted.

On the other hand, it all works on earlier versions of PHP, provided you use the ->append() method on the iterator instead of a [] push.

Wrongusername

The answer is yes, you can. You just need more generators, and you need to separate your request parsing and queuing logic into an asynchronous design. Rather than an array, the source of requests that your pool issues and waits for needs to be a generator itself. It yields requests from your initial list, plus requests added from parsed responses, and keeps going (recursively) until every request has been sent and parsed or a stop condition is encountered.

Jeremy Giberson
  • I was under the assumption that is exactly what I have. Anything that I add to the $this->requests array, will automatically be used by the $generator. –  Apr 18 '17 at 06:45
  • And what happens when your array is empty? As a thought exercise trace through your code when there is only 1 request in the array. When the request is removed from the array is there any reason for your generator to sit around waiting for anything to be added back to it? – Jeremy Giberson Apr 18 '17 at 15:53
  • But guzzle Pool class only accepts 1 $generator, so not sure what you're talking about when you say that the pool itself needs to be a generator? –  Apr 18 '17 at 19:29

If you can use postAsync/getAsync or similar, you can use the skeleton below:

use GuzzleHttp\Client;
use GuzzleHttp\Exception\TransferException;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Response;

function postInBulk($inputs)
{
    $client = new Client([
        'base_uri' => 'https://a.b.com'
    ]);
    $headers = [
        'Authorization' => 'Bearer token_from_directus_user'
    ];

    // Yields one callable per input; Pool invokes each callable to obtain a promise.
    $requests = function ($a) use ($client, $headers) {
        for ($i = 0; $i < count($a); $i++) {
            yield function() use ($client, $headers) {
                return $client->postAsync('https://a.com/project/items/collection', [
                    'headers' => $headers,
                    'json' => [
                        "snippet" => "snippet",
                        "rank" => "1",
                        "status" => "published"
                    ]
                ]);
            };
        }
    };

    $pool = new Pool($client, $requests($inputs), [
        'concurrency' => 5,
        'fulfilled' => function (Response $response, $index) {
            // this is delivered each successful response
        },
        // TransferException is the common base of Guzzle's request and
        // connection exceptions, so both kinds of failure are accepted here.
        'rejected' => function (TransferException $reason, $index) {
            // this is delivered each failed request
        },
    ]);

    $pool->promise()->wait();
}
Deepan Prabhu Babu