
I've been stuck on this for quite a while now. I want to convert a PDF to text using Tika hosted on an external server dedicated to this. It should work with any remote PDF URL and any Tika server (I'm currently using a free test server that some amazing guy set up).

This command works perfectly on the command line, but I haven't been able to translate it to PHP. I want to get the resulting text and save it to the database, and I would rather not use exec().

    curl "https://rifed-alfgago.c9users.io/wp-content/uploads/2017/06/demopdf.pdf" | curl -X PUT -T - http://beta.offenedaten.de:9998/tika

This is what I have so far in PHP, but it's not working and I can't find the reason why:

    $fileurl = "https://rifed-alfgago.c9users.io/wp-content/uploads/2017/06/demopdf.pdf";
    $file = fopen($fileurl, 'r');
    $url = "http://beta.offenedaten.de:9998/tika";

    $ch = curl_init();

    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_CUSTOMREQUEST  => "PUT",
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_HEADER         => 1,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_INFILE         => $file
    );
    curl_setopt_array($ch, $options);

    $response = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo "<pre>".htmlspecialchars($response)."</pre>";

    curl_close($ch);

Thank you in advance.

Alfredo Gago
  • You are reading the file and trying to PUT it with PHP cURL, but what you need to do is upload the file to that URL using cURL. Try uploading a local file to Tika first and see whether that works. – BetaDev Jun 28 '17 at 22:43
  • Using your code, I get a server-side error, not one on your side. Your PDF file does not open directly, so I tried mine, `http://shaileshsingh.com.np/test.pdf`, and got `HTTP/1.1 415 Unsupported Media Type Content-Length: 0 Server: Jetty(8.y.z-SNAPSHOT)`. – BetaDev Jun 28 '17 at 23:36
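
One likely cause, offered as a guess rather than a confirmed diagnosis: libcurl only reads from `CURLOPT_INFILE` when upload mode (`CURLOPT_UPLOAD` or `CURLOPT_PUT`) is enabled, so the snippet in the question probably sends a PUT with an empty body, which would fit the 415 response. The sketch below sidesteps the stream entirely by downloading the PDF into a string and sending it as the request body. It has not been tested against this particular Tika server. The function names `tikaPutOptions` and `pdfUrlToText` are made up for illustration, and it assumes `allow_url_fopen` is enabled (with OpenSSL for https URLs) and that the server honours `Accept: text/plain`.

```php
<?php
// Hypothetical helper: builds the cURL options for a PUT to a Tika /tika endpoint.
// CURLOPT_HEADER is left off so the response is only the extracted text.
function tikaPutOptions(string $tikaUrl, string $pdfBytes): array
{
    return [
        CURLOPT_URL            => $tikaUrl,
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => $pdfBytes, // raw PDF bytes become the request body
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/pdf', // tell the server what is being sent
            'Accept: text/plain',            // ask for plain text back
        ],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT        => 120,
    ];
}

// Hypothetical helper: fetches a remote PDF and returns the text Tika extracts from it.
function pdfUrlToText(string $pdfUrl, string $tikaUrl): string
{
    // Assumes allow_url_fopen is on; otherwise download with a second cURL handle.
    $pdfBytes = file_get_contents($pdfUrl);
    if ($pdfBytes === false) {
        throw new RuntimeException("Could not download $pdfUrl");
    }

    $ch = curl_init();
    curl_setopt_array($ch, tikaPutOptions($tikaUrl, $pdfBytes));
    $text     = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error    = curl_error($ch);
    curl_close($ch);

    if ($text === false || $httpCode !== 200) {
        throw new RuntimeException("Tika request failed (HTTP $httpCode): $error");
    }
    return $text;
}
```

Calling `pdfUrlToText($fileurl, $url)` with the two URLs from the question should then return a string that can be saved to the database. For very large PDFs, holding the whole file in memory may not be acceptable; in that case, saving it to a temporary file and uploading that with `CURLOPT_UPLOAD`, `CURLOPT_INFILE` and `CURLOPT_INFILESIZE` is the alternative worth trying.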
