
I have a web application running on a Laravel PHP server. For some needs (Word document processing), I implemented a Python server that does data extraction. I would like to know how to call my Python server from PHP and pass a file to it. Currently, I save the docx file on the PHP server, where it is accessible via a URL. I make an HTTP POST request from the PHP server to the Python server with the URL to download the document. The problem is that I get a deadlock: the PHP server is waiting for the response from the Python server, and the Python server is waiting for the PHP server to serve the document. Any suggestions on how to get around this problem?

Here is the PHP code:

// Send POST REQUEST
  $context_options = array(
    'http' => array(
      'method' => 'POST',
      'header' => "Content-type: application/x-www-form-urlencoded\r\n"
        . "Content-Length: " . strlen($data) . "\r\n",
      'content' => $data,
      'timeout' => 10,
    )
  );

  $context = stream_context_create($context_options);
  $result = fopen('http://localhost:5000/api/extraction','r', false, $context);

And here is the Python code:

@app.route('/api/extraction', methods=['post'])
def extraction():
    data = request.form.to_dict()
    url = data['file']  # get url
    filename = secure_filename(url.rsplit('/', 1)[-1])
    path = os.path.join(app.config['UPLOAD_FILE_FOLDER'], filename) 
    urllib.request.urlretrieve(url, path)
Snakies
  • Why does the Python code need to retrieve the already-uploaded file from PHP (`urllib.request.urlretrieve(url, path)`)? The content of that file should have been in the POST request body, right? – Koala Yeung Dec 09 '20 at 08:21
  • I was unable to pass the word document into the body of the request and use it on the Python server side. That's why I use the URL to download it on the Python side. – Snakies Dec 09 '20 at 08:53
  • Is your PHP server unable to serve the file when it's waiting for the response from the Python server? If that isn't the case, you should be able to respond to the PHP server from the Python server as soon as it has retrieved the file. – PIG208 Dec 09 '20 at 09:57
  • Precisely the PHP server cannot serve the file because it is waiting for the response from the Python server. After the timeout of 10 seconds, it gives up the `fopen` command and at that moment the Python server can load the document but it can't answer the PHP server anymore. – Snakies Dec 09 '20 at 10:08

1 Answer


You should send the file in a proper multipart/form-data POST request instead of having Python fetch it. Your current 2-roundtrip approach is much harder to debug and maintain.

Approach 1: Normal Form Request

<?php

/**
 * A generator that yields multipart form-data fragments (without the ending EOL).
 * File contents are sent as raw bytes with Content-Type application/octet-stream.
 *
 * @param iterable $vars
 *    Key-value iterable (e.g. assoc array) of string or integer.
 *    Keys represents the field name.
 * @param iterable $files
 *    Key-value iterable (e.g. assoc array) of file path string.
 *    Keys represents the field name of file upload.
 *
 * @return \Generator
 *    Generator of multipart form-data fragments (without the ending EOL) in array format,
 *    always contains 2 values:
 *      0 - An array of header for a key-value pair.
 *      1 - A value string (can contain binary content) of the key-value pair.
 */
function generate_multipart_data_parts(iterable $vars, iterable $files=[]): Generator {
    // handle normal variables
    foreach ($vars as $name => $value) {
        $name = urlencode($name);
        // Multipart values are sent verbatim; the server does not URL-decode them.
        $value = (string) $value;
        yield [
            // header
            ["Content-Disposition: form-data; name=\"{$name}\""],
            // value
            $value,
        ];
    }

    // handle file contents
    foreach ($files as $file_fieldname => $file_path) {
        $file_fieldname = urlencode($file_fieldname);
        $file_data = file_get_contents($file_path);
        yield [
            // header
            [
                "Content-Disposition: form-data; name=\"{$file_fieldname}\"; filename=\"".basename($file_path)."\"",
                "Content-Type: application/octet-stream", // for binary safety
            ],
            // value
            $file_data
        ];
    }
}

/**
 * Converts output of generate_multipart_data_parts() into form data.
 *
 * @param iterable $parts
 *    An iterator of form fragment arrays. See return data of
 *    generate_multipart_data_parts().
 * @param string|null $boundary
 *    An optional pre-generated boundary string to use for wrapping data.
 *    Please reference section 7.2 "The Multipart Content-Type" in RFC1341.
 *
 * @return array
 *    An array with 2 items:
 *    0 - string boundary
 *    1 - string (can contain binary data) data
 */
function wrap_multipart_data(iterable $parts, ?string $boundary = null): array {
    if (empty($boundary)) {
        $boundary = '-----------------------------------------boundary' . time();
    }
    $data = '';
    foreach ($parts as $part) {
        list($header, $content) = $part;
        // Check content for the boundary delimiter ("--" followed by the boundary).
        // Note: Won't check header and expect the program makes sense there.
        if (strpos($content, "--{$boundary}") !== false) {
            throw new \Exception('Error: data contains the multipart boundary');
        }
        $data .= "--{$boundary}\r\n";
        $data .= implode("\r\n", $header) . "\r\n\r\n" . $content . "\r\n";
    }
    // signal end of request (note the trailing "--")
    $data .= "--{$boundary}--\r\n";
    return [$boundary, $data];
}

// build data for a multipart/form-data request
list($boundary, $data) = wrap_multipart_data(generate_multipart_data_parts(
    // normal form variables
    [
        'hello' => 'world',
        'foo' => 'bar',
    ],
    // files
    [
        'upload_file' => 'path/to/your/file.docx',
    ]
));

// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: multipart/form-data; boundary={$boundary}\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);

$context = stream_context_create($context_options);
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);

Your Python script should receive the file as a normal HTTP form file upload (with the file field named "upload_file"). Use the method your framework provides to get the file from the request.
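On the Python side, a minimal sketch of the receiving endpoint could look like the following. It assumes Flask (which the question's `@app.route` code suggests) and reuses the question's `UPLOAD_FILE_FOLDER` config key; the temporary upload directory and the JSON response shape are illustrative assumptions, not part of the original code.

```python
import os
import tempfile

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
# Assumption: uploads go to a temporary directory; point this at a real folder.
app.config["UPLOAD_FILE_FOLDER"] = tempfile.mkdtemp()


@app.route("/api/extraction", methods=["POST"])
def extraction():
    # The file arrives in the request body, so no second HTTP request
    # back to the PHP server is needed.
    uploaded = request.files.get("upload_file")
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="missing upload_file"), 400

    # secure_filename() may return an empty string for unusable names.
    filename = secure_filename(uploaded.filename)
    if not filename:
        return jsonify(error="invalid filename"), 400

    path = os.path.join(app.config["UPLOAD_FILE_FOLDER"], filename)
    uploaded.save(path)

    # Placeholder for the real extraction step: report what was received.
    return jsonify(filename=filename, size=os.path.getsize(path))
```

Because the document travels inside the POST body, the Python server never has to call back into PHP, which is what removes the deadlock.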

Approach 2: A really long x-www-form-urlencoded value

If you're concerned about binary safety, or if Approach 1 somehow fails, the other approach is to submit the file as a base64-encoded string:

<?php

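// Read the raw file content, then base64-encode it so it is safe as a form value.
$file_data = file_get_contents('path/to/your/file.docx');
// http_build_query() URL-encodes the key-value pairs (urlencode() only accepts strings).
$data = http_build_query([
  'upload_file' => base64_encode($file_data),
]);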

// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: application/x-www-form-urlencoded\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);

$context = stream_context_create($context_options);
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);

On your Python server, the file data arrives as a base64-encoded string in the field named "upload_file". Decode it to get the original binary content.
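The decoding step can be sketched with the standard library alone. The helper name `decode_upload` is hypothetical, and the commented usage assumes a Flask view like the one in the question.

```python
import base64
import binascii


def decode_upload(field_value: str) -> bytes:
    """Decode the base64 text sent in the "upload_file" form field."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(field_value, validate=True)
    except binascii.Error as exc:
        raise ValueError("upload_file is not valid base64") from exc


# Inside the Flask view this would be used roughly as:
#     content = decode_upload(request.form["upload_file"])
#     with open(path, "wb") as fh:
#         fh.write(content)
```

Keep in mind that base64 inflates the payload by about a third, so this approach is better suited to small documents.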

Approach 3: If you insist...

If you insist on your current 2-roundtrip approach, the simple solution is to have 2 different endpoints:

  • One for sending the POST request to your Python application.
  • One for serving the docx file, with no dependency on the Python application.

From your description, your deadlock is there because you're using the same script for both purposes. I don't see a reason why they can't be 2 separate scripts / route controllers.

Koala Yeung
  • I used approach 1 and it works great. Thank you very much! I should have thought about it but I couldn't have done it without your help. – Snakies Dec 09 '20 at 12:21