I'm trying to fetch 10 webpages simultaneously.

I'm using curl_multi.

However, I end up with a lot of 503 (too many requests) errors on most of the fetched webpages. How can I fix this?

Here's the PHP script that I ran: http://pastebin.com/HhrffciC

You can run it on any PHP-enabled server.

Here is what the output on my machine looked like: https://i.stack.imgur.com/twgPZ.jpg

tacoder
  • It looks like the server you're trying to contact doesn't want you sending it many requests at once (which is smart, actually, since it prevents brute-force attacks). Try making the requests one at a time? – wavemode Sep 28 '14 at 19:55
  • Appreciate your reply. Yes, it does seem like that. The problem is that I want to fetch 10 web pages simultaneously; if I do it one by one it takes 15-20 seconds, which is too long since I need to run this script every minute or so. I was wondering if there is a way to configure curl_multi to fire off requests with a delay so that the server doesn't think I'm trying to brute-force it (see the sketch after these comments). Any other method of accomplishing the same would also be appreciated. – tacoder Sep 28 '14 at 20:12
  • Since I'm only going to return JSON data, I was thinking that if PHP turns out to be a pain, maybe I could use a different language altogether and do the processing in it. Any pointers on that would also be appreciated. – tacoder Sep 28 '14 at 20:13
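
A minimal sketch of the idea from that comment, using plain curl_multi with a small concurrency window and a pause before each new launch. The window size, pause length, and truncated URL list are illustrative assumptions, not tested values:

<?php

// Sketch only: throttle plain curl_multi so at most $window requests
// are in flight at once, pausing briefly before each new launch.
$urls = array(
    'http://www.codechef.com/status/CLETAB,tacoder',
    'http://www.codechef.com/status/CRAWA,tacoder',
    // ... the remaining status URLs
);
$window  = 3;      // assumed safe concurrency; tune against the 503s
$pauseUs = 200000; // 0.2 s pause before each new request (assumption)

$mh = curl_multi_init();
$queue = $urls;
$inFlight = 0;

$launch = function ($url) use ($mh, &$inFlight) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $inFlight++;
};

// Prime the window.
while ($inFlight < $window && $queue) {
    $launch(array_shift($queue));
}

do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh, 1.0); // wait for activity on any handle
    }
    // Harvest finished transfers and refill the window one at a time.
    while ($info = curl_multi_info_read($mh)) {
        $ch = $info['handle'];
        echo curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $inFlight--;
        if ($queue) {
            usleep($pauseUs);
            $launch(array_shift($queue));
        }
    }
} while ($inFlight > 0 || $queue);

curl_multi_close($mh);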

1 Answer

There is a library called ParallelCurl that lets you control how many simultaneous requests are in flight. The script below sets the maximum to 5 and simply sends a series of GET requests to the URLs in your code. If it still shows 503 errors for you (it doesn't for me), lower $max_requests as needed.

<?php

require __DIR__ . '/parallelcurl.php';

// Called once for each completed request; $content is the response body.
function on_request_done($content, $url, $ch, $search) {
    echo $content;
}

// The ten CodeChef status pages to fetch.
$data = array(
    'http://www.codechef.com/status/CLETAB,tacoder',
    'http://www.codechef.com/status/CRAWA,tacoder',
    'http://www.codechef.com/status/EQUAKE,tacoder',
    'http://www.codechef.com/status/MOU2H,tacoder',
    'http://www.codechef.com/status/PRGIFT,tacoder',
    'http://www.codechef.com/status/PUSHFLOW,tacoder',
    'http://www.codechef.com/status/REVERSE,tacoder',
    'http://www.codechef.com/status/SEASHUF,tacoder',
    'http://www.codechef.com/status/SIGFIB,tacoder',
    'http://www.codechef.com/status/TSHIRTS,tacoder'
);

// Allow at most 5 requests in flight at any one time.
$max_requests = 5;
$parallel_curl = new ParallelCurl($max_requests);

// Queue every URL; ParallelCurl starts each one as a slot frees up.
foreach ($data as $url) {
    $parallel_curl->startRequest($url, 'on_request_done');
}

// Block until every outstanding request has finished.
$parallel_curl->finishAllRequests();

The GitHub README explains the library's usage in more detail.
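
If the server still throttles you at $max_requests = 5, ParallelCurl also accepts an array of standard curl options as a second constructor argument (per its README). A short sketch; the specific option values are illustrative assumptions, not recommendations:

<?php

require __DIR__ . '/parallelcurl.php';

// Standard CURLOPT_* settings applied to every request.
$curl_options = array(
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_TIMEOUT        => 15,
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (status checker)', // hypothetical UA
);

// A smaller window than 5, in case the server is strict.
$parallel_curl = new ParallelCurl(3, $curl_options);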

wavemode
  • Can't upvote because not enough reputation. Just the perfect answer I was looking for. Thank you very much!! – tacoder Sep 28 '14 at 21:18
  • I tried setting $max_requests = 1 and then $max_requests = 100; in both experiments the request-response time was the same. How could this happen? – Сергей Sep 24 '18 at 15:39
  • @Сергей PHP 5.5 broke ParallelCurl. Use [rollingcurlx](https://github.com/marcushat/rollingcurlx) or something else. – wavemode Sep 25 '18 at 01:29
  • @wavemode I need to send more than 5000 requests per minute and get the responses. Can this library do that? – Сергей Sep 25 '18 at 06:37