
I'm struggling to get a PDF file generated by an external server.

Here is the link to the resource : https://www.test.colisprive.com/mcadesk/Externe/ShowEtiquettePDF.aspx/etiquette_colis-23-23000000000833300-PDF_DEFAUT-N/

As you can see, no authentication is needed.

I noticed that I can write anything I want at the end of the URL and the browser's built-in PDF reader will use it as the title. But when using "Save as...", the file name is already set to a fixed value.

I tried to fetch it with cURL, but it returns "Object moved to here." (with a link). The link doesn't work, and with CURLOPT_FOLLOWLOCATION enabled, curl_exec returns false.

I really need to download PDF files from this URL but I'm completely stuck. Any idea would be very welcome!

Thanks, BR,

Manu

Edit: I tried this:

$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_HTTPHEADER => array(
    ),
    CURLOPT_URL => "https://www.test.colisprive.com/mcadesk/Externe/ShowEtiquettePDF.aspx/etiquette_colis-23-23000000000833300-PDF_DEFAUT-N/Etiquette_23000000000833300.pdf",
    CURLOPT_RETURNTRANSFER => 1,
));

$resp = curl_exec($curl);
var_dump($resp);

curl_close($curl);
thelr
  • *I tried to get it with cURL* - can you include the code here so others can perhaps see where the problem lies? – Nigel Ren Jun 02 '20 at 11:19
  • Perhaps this question can help: https://stackoverflow.com/questions/4752389/php-readfile-of-ext… – ivion Jun 02 '20 at 11:42
  • @ivion I tried with file_get_contents too, but it doesn't work! – BigIndian66 Jun 02 '20 at 12:01
  • It's possible that the remote server is checking some request data (headers, maybe useragent) to restrict the file to only be accessible by requests that appear to come from browsers. You might try fetching the file with other non-browser technologies entirely, and see if the results are any different. – thelr Jun 02 '20 at 12:07
  • @thelr "other non-browser technologies" ? What do you have in mind ? – BigIndian66 Jun 02 '20 at 12:08
  • @BigIndian66: try the equivalent of file_get_contents from another programming language entirely; a testing tool like Postman, etc – thelr Jun 02 '20 at 12:20
  • OK, I'll try this, thanks! – BigIndian66 Jun 02 '20 at 13:29
  • @thelr it works easily with Python, for instance: `import urllib.request; urllib.request.urlretrieve("https://www.test.colisprive.com/mcadesk/Externe/ShowEtiquettePDF.aspx/etiquette_colis-23-23000000000833300-PDF_DEFAUT-N/Etiquette_23000000000833300.pdf", "test.pdf")` but I still can't manage to make it work with PHP... – BigIndian66 Jun 03 '20 at 05:47

1 Answer


The website in question does not serve the requested content (and issues a redirect instead) if the request does not include a User-Agent header.

PHP's cURL does not set a User-Agent by default, and neither does file_get_contents. Command-line curl and Python's urllib.request.urlretrieve, on the other hand, do send one, which is why you succeeded with the latter.
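For file_get_contents, a User-Agent can be supplied through a stream context. The following is a minimal sketch, not tested against this server; the helper names `buildContextWithUserAgent` and `fetchWithUserAgent` and the value `curl` are only illustrative assumptions.

```php
<?php
// Build a stream context that sends a User-Agent header.
// The default value "curl" is arbitrary (an assumption for illustration).
function buildContextWithUserAgent(string $userAgent = 'curl')
{
    return stream_context_create([
        'http' => [                      // the "http" options also apply to https:// URLs
            'user_agent' => $userAgent,
        ],
    ]);
}

// Hypothetical helper: returns the response body, or false on failure.
function fetchWithUserAgent(string $url, string $userAgent = 'curl')
{
    return file_get_contents($url, false, buildContextWithUserAgent($userAgent));
}
```

Calling `fetchWithUserAgent($url)` with the PDF URL should then behave like the cURL version, provided the missing User-Agent was the only reason the request was rejected.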

With PHP's cURL you have to set the User-Agent yourself, but it takes just one line.

Note that the website you're accessing requires the header, but accepts any User-Agent value.

$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_HTTPHEADER => array(
        "User-Agent: curl" // <--- the User Agent is specified by setting the corresponding header
    ),
    CURLOPT_URL=>"https://www.test.colisprive.com/mcadesk/Externe/ShowEtiquettePDF.aspx/etiquette_colis-23-23000000000833300-PDF_DEFAUT-N/Etiquette_23000000000833300.pdf",
    CURLOPT_RETURNTRANSFER => 1
));

$resp = curl_exec($curl);

var_dump($resp);

curl_close($curl);

 

The output you get looks like:

%PDF-1.4
1 0 obj
<< 
/Length 1514
/Filter /FlateDecode
.
.
.

 

This shows that you're actually receiving a PDF.


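You may then serve the fetched PDF (sending a Content-Type header first so the browser treats the response as a PDF)

header('Content-Type: application/pdf');
echo $resp;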

or store the file on your server

file_put_contents( "/path/to/file", $resp );
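Before storing the file, it may be worth checking that the response really is a PDF and not the "Object moved" page. The sketch below assumes a hypothetical helper `downloadPdf` and uses `CURLOPT_USERAGENT`, which is cURL's dedicated option for the same header set manually above; the `%PDF-` check mirrors the first bytes of the output shown earlier.

```php
<?php
// Returns true when the body starts with the PDF signature "%PDF-".
function looksLikePdf(string $body): bool
{
    return strncmp($body, '%PDF-', 5) === 0;
}

// Hypothetical helper: fetch $url with a User-Agent and save it to $destination.
// Returns true only if a PDF was received and written.
function downloadPdf(string $url, string $destination, string $userAgent = 'curl'): bool
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_USERAGENT      => $userAgent, // same effect as a "User-Agent:" header
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $body   = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    curl_close($curl);

    // Reject transport errors, non-200 responses (such as the redirect),
    // and bodies that do not look like a PDF.
    if ($body === false || $status !== 200 || !looksLikePdf($body)) {
        return false;
    }

    return file_put_contents($destination, $body) !== false;
}
```

With this in place, `downloadPdf($url, "/path/to/file")` returns false instead of silently saving an HTML redirect page under a .pdf name.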

Paolo
  • Argh! I did mess with the headers but I missed this one! Stupid me... Thank you sooo much!!! :) (Out of curiosity, and for personal knowledge: can you tell me how you noticed that the "User-Agent" was required?) – BigIndian66 Jun 04 '20 at 08:01
  • @BigIndian66 sure: I noticed command line `curl` was able to download the file while PHP's `curl` was not. I ran both in **verbose** mode and spotted the difference. – Paolo Jun 04 '20 at 16:46