
This code works fine for the majority of web pages, but I have started receiving the following error output for certain pages.

Unexpected HTTP code: 413 payload too large

I have included a web page that produces the error in the $url variable in the code below.

I have increased the following PHP settings to their maximum values with no luck: post_max_size and upload_max_filesize. I have also raised the Apache directive LimitRequestBody to its maximum value.
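For reference, these are the directives in question (the values shown are illustrative, not the ones actually used). They limit the size of requests that your *own* Apache/PHP server will accept from clients; they do not affect requests that your cURL call sends out to a remote site, so a 413 returned by the remote server would not be expected to change:

```
; php.ini -- limits on request bodies *received* by this server
post_max_size = 512M
upload_max_filesize = 512M

# Apache config -- 0 means no limit on incoming request bodies
LimitRequestBody 0
```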

Any suggestions would be greatly appreciated.

function get_data($url, $timeout = 10) // $timeout was undefined; pass it in with a default
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0");
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // note: disables SSL certificate checking
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    $data = curl_exec($ch);

    if (!curl_errno($ch))
    {
        switch ($http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE))
        {
            case 200: // OK
                break;
            default:
                echo 'Unexpected HTTP code: ', $http_code, "\n";
        }
    }
    curl_close($ch);
    return $data;
}

$url = "https://www.oddschecker.com/golf/open-championship/2021-open-championship/winner";
$returned_content = get_data($url);
echo "<br>" . $returned_content;
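One thing worth trying is making the request look more like a full browser. This is only a sketch: the extra headers below are assumptions about what the remote site might inspect, not a confirmed fix for the 413 response, and the function name is hypothetical.

```php
<?php
// Hypothetical variant of get_data() that sends browser-like headers.
function get_data_with_headers($url, $timeout = 10)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0");
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    // Headers a typical browser sends; values here are illustrative.
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-GB,en;q=0.5',
    ]);

    // Empty string lets cURL advertise and decode all supported
    // encodings (gzip, deflate) automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $data = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($http_code !== 200) {
        echo 'Unexpected HTTP code: ', $http_code, "\n";
    }
    return $data;
}
```

If the site still returns 413 with browser-like headers from the same IP that works in Chrome, that would support the suggestion in the comments that the server is deliberately answering scrapers with misleading status codes.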
Gareth
  • You get this on FOREIGN websites? Changing the mentioned values on YOUR end does not affect the foreign site... – Honk der Hase Dec 24 '20 at 17:12
  • 90% of pages on that same domain work for me. Just some of the slightly larger pages produce the error (like the URL included in the code). – Gareth Dec 24 '20 at 17:16
  • The code provided works perfectly for me. Having written websites that people sometimes want to scrape, I sometimes return fun status codes and error messages to mess with the scrapers that don't have a license to my content. This might be the case for you. – Chris Haas Dec 24 '20 at 18:09
  • Thanks for the input, Chris. How do you identify scrapers? For example, I have no issues loading the site via my Google Chrome web browser with the same IP as the web server. Also, I have no issues with scraping 99% of pages on the site. – Gareth Dec 24 '20 at 23:04

0 Answers