I have an HTML form that lets users upload a file. The PHP handler sends the file to IBM Watson's Document Conversion API, which converts the document into normalized text, and that text is then inserted into a database.

Upon testing, I have received the following error multiple times:

{ "code" : 415, "error" : "The Media Type [text/plain] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml ." }

Here is my form (testform.html):

    <form action="testform.php" method="post" enctype="multipart/formdata">
     <input type="file" name="newdoc" id="newdoc"> Upload New Doc:
     </input>
     <button type="submit" name="submit">Submit</button>
    </form>

And here is my PHP script (testform.php):

    <?php 
    $filename = $_FILES['newdoc']['name'];
    $filetype = $_FILES['newdoc']['type'];
    $filesize = $_FILES['newdoc']['size'];
    $filetmp  = $_FILES['newdoc']['tmp_name'];

    // Watson Document Conversion
    $dcuser = 'arbitrary_user';
    $dcpass = 'arbitrary_pwd';
    $userpwd = $dcuser . ":" . $dcpass;

    // Initialize cURL
    $documentconversion = curl_init();

    // Set POST 
    curl_setopt($documentconversion, CURLOPT_POST, true);

    // Set DC API URL
    curl_setopt($documentconversion, CURLOPT_URL, 
    'https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15');

    // Set Username:Password
    curl_setopt($documentconversion, CURLOPT_USERPWD, $userpwd);

    // Set conversion units, file, and file type
    curl_setopt($documentconversion, CURLOPT_POSTFIELDS, array(
     'config' => "{\"conversion_target\":\"normalized_text\"}",
     'file'   => '@' . realpath($filetmp) . ';type=' . $filetype
    ));

    // Set return value
    curl_setopt($documentconversion, CURLOPT_RETURNTRANSFER, true);

    // Execute and get response
    $response = curl_exec($documentconversion);

    // Close cURL
    curl_close($documentconversion);
    ?>

Normally the $response variable would contain the converted text, but I keep getting the 415 error shown above, even though I'm only uploading PDFs.

Any thoughts as to why it's not working?

Daniel La

1 Answer

From the error it seems that your PHP script is passing a text/plain filetype, which is not supported by the service. Instead, try passing in application/pdf as the filetype.
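
One way to pass the type explicitly from PHP is sketched below. It assumes PHP 5.5 or later, where the `CURLFile` class is available. From PHP 5.6 on, the `'@' . $path` string form used in the question is no longer treated as a file upload by default, and PHP 7 removed the option to re-enable it, so the field may be sent as a plain string, which could explain the text/plain in the error. The helper name `buildConversionFields` is hypothetical. Also note that `$_FILES` is only populated when the form's `enctype` is exactly `multipart/form-data`; the form in the question has `multipart/formdata`.

```php
<?php
// Hypothetical helper: builds the multipart fields for the convert_document call.
// Assumes PHP >= 5.5 (CURLFile) and that $path points to the uploaded file.
function buildConversionFields($path, $originalName, $mimeType = 'application/pdf')
{
    return array(
        // Same JSON as the hand-escaped string in the question.
        'config' => json_encode(array('conversion_target' => 'normalized_text')),
        // CURLFile replaces the deprecated '@/path;type=...' string syntax.
        'file'   => new CURLFile($path, $mimeType, $originalName),
    );
}

// Possible usage with the variables from the question:
// curl_setopt($documentconversion, CURLOPT_POSTFIELDS,
//     buildConversionFields($filetmp, $filename));
```

Building the `config` value with `json_encode` avoids escaping the quotes by hand.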

You can also try running the request with a simple curl command:

    curl -X POST -u "YOUR_USERNAME":"YOUR_PASSWORD" \
      -F config="{\"conversion_target\":\"normalized_text\"}" \
      -F "file=@sample.pdf;type=application/pdf" \
      "https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15"

As listed in the API reference, the supported types are: text/html, application/xhtml+xml, application/pdf, application/msword, and application/vnd.openxmlformats-officedocument.wordprocessingml.document.
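
Since the service rejects anything outside that list, it may help to check the uploaded file before calling the API. The sketch below is one possible pre-check, assuming the `fileinfo` extension is enabled. The function name `detectSupportedType` is hypothetical, and the list is copied from the error message in the question. It inspects the file contents rather than trusting `$_FILES['newdoc']['type']`, which is supplied by the browser.

```php
<?php
// Hypothetical pre-check: returns the detected media type if the service
// accepts it, or null otherwise. Assumes the fileinfo extension is enabled.
function detectSupportedType($path)
{
    // List copied from the 415 error message in the question.
    $supported = array(
        'application/msword',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'application/pdf',
        'text/html',
        'application/xhtml+xml',
    );

    // An empty or missing path usually means nothing was uploaded.
    if (!is_string($path) || $path === '' || !is_file($path)) {
        return null;
    }

    // Detect the type from the file contents, not the browser-supplied value.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type  = $finfo->file($path);

    return in_array($type, $supported, true) ? $type : null;
}
```

If this returns null for a file you expect to be a PDF, the upload itself probably did not arrive intact, which is worth ruling out before looking at the API call.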

  • I just tried that as well and I'm still getting the 415 error =\ – Daniel La Jun 15 '17 at 16:31
  • Could you try it with a curl command: curl -X POST -u "{username}":"{password}" -F config="{\"conversion_target\":\"normalized_text\"}" -F "file=@sample.pdf;type=application/pdf" "https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15" – Anton Prevosti Jun 15 '17 at 17:13
  • Getting a 401 error: { "code" : 401 , "error" : "Not Authorized" , "description" : "2017-06-15T16:17:17-04:00, Error ERCDPLTFRM-INVLDCHR occurred when accessing https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15, Tran-Id: gateway-dp02-1290491232 - " } – Daniel La Jun 15 '17 at 20:17
  • The curl command got mangled in the comment. I added the curl command in my response body. Please try that one instead. – Anton Prevosti Jun 16 '17 at 15:03