You should send the file through a proper POST (multipart/form-data) request instead of having Python fetch the data. A single request is much easier to debug and maintain than your current 2-roundtrip approach.
Approach 1: Normal Form Request
<?php
/**
* A generator that yields multipart form-data fragments (without the ending EOL).
* File contents are passed through as raw bytes (sent as application/octet-stream), not base64-encoded.
*
* @param iterable $vars
* Key-value iterable (e.g. assoc array) of strings or integers.
* Keys represent the field names.
* @param iterable $files
* Key-value iterable (e.g. assoc array) of file path strings.
* Keys represent the field names of the file uploads.
*
* @return \Generator
* Generator of multipart form-data fragments (without the ending EOL) in array format,
* always contains 2 values:
* 0 - An array of header lines for a key-value pair.
* 1 - A value string (can contain binary content) of the key-value pair.
*/
function generate_multipart_data_parts(iterable $vars, iterable $files=[]): Generator {
// handle normal variables
foreach ($vars as $name => $value) {
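$name = urlencode($name);
// Multipart values are sent verbatim; URL-encoding them would make the
// server receive e.g. "a+b" instead of "a b".
$value = (string) $value;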
yield [
// header
["Content-Disposition: form-data; name=\"{$name}\""],
// value
$value,
];
}
// handle file contents
foreach ($files as $file_fieldname => $file_path) {
$file_fieldname = urlencode($file_fieldname);
$file_data = file_get_contents($file_path);
yield [
// header
[
"Content-Disposition: form-data; name=\"{$file_fieldname}\"; filename=\"".basename($file_path)."\"",
"Content-Type: application/octet-stream", // for binary safety
],
// value
$file_data
];
}
}
/**
* Converts output of generate_multipart_data_parts() into form data.
*
* @param iterable $parts
* An iterator of form fragment arrays. See return data of
* generate_multipart_data_parts().
* @param string|null $boundary
* An optional pre-generated boundary string to use for wrapping data.
* Please reference section 7.2 "The Multipart Content-Type" in RFC1341.
*
* @return array
* An array with 2 items:
* 0 - string boundary
* 1 - string (can contain binary data) data
*/
function wrap_multipart_data(iterable $parts, ?string $boundary = null): array {
if (empty($boundary)) {
$boundary = '-----------------------------------------boundary' . time();
}
$data = '';
foreach ($parts as $part) {
list($header, $content) = $part;
// Check content for the boundary delimiter ("--" followed by the boundary).
// Note: headers are not checked; the caller is expected to supply sane ones.
if (strpos($content, "--{$boundary}") !== false) {
throw new \Exception('Error: data contains the multipart boundary');
}
$data .= "--{$boundary}\r\n";
$data .= implode("\r\n", $header) . "\r\n\r\n" . $content . "\r\n";
}
// signal end of request (note the trailing "--")
$data .= "--{$boundary}--\r\n";
return [$boundary, $data];
}
// build data for a multipart/form-data request
list($boundary, $data) = wrap_multipart_data(generate_multipart_data_parts(
// normal form variables
[
'hello' => 'world',
'foo' => 'bar',
],
// files
[
'upload_file' => 'path/to/your/file.xlsx',
]
));
// Send POST REQUEST
$context_options = array(
'http' => array(
'method' => 'POST',
'header' => "Content-type: multipart/form-data; boundary={$boundary}\r\n"
. "Content-Length: " . strlen($data) . "\r\n",
'content' => $data,
'timeout' => 10,
)
);
$context = stream_context_create($context_options);
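// fopen() returns a stream resource (or false on failure);
// read the response body with stream_get_contents($result).
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);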
Your Python script should receive the file as a normal HTTP form file upload (with the file field named "upload_file"). Use the method your framework provides to get the file from the request.
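For illustration, here is a minimal sketch of what the receiving side has to do, using only the Python standard library. The helper name `parse_multipart` is hypothetical, and it assumes you can get at the raw request body and the value of its Content-Type header. In a real application your framework's own upload API is the better choice.

```python
from email.parser import BytesParser
from email.policy import default


def parse_multipart(content_type: str, body: bytes) -> dict:
    """Parse a multipart/form-data body into {field name: (filename, bytes)}."""
    # The email parser needs the Content-Type header (which carries the
    # boundary) in front of the body, so rebuild a minimal MIME message.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(prefix + body)

    fields = {}
    for part in message.iter_parts():
        # The field name lives in the part's Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        # get_filename() is None for plain fields; decode=True returns bytes.
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields
```

Each entry maps a field name to a `(filename, bytes)` pair, so the uploaded file's content would be found under the "upload_file" key.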
Approach 2: A really long x-www-form-urlencoded value
If you're concerned about binary safety, or if the multipart request somehow fails, the other approach is to submit the file as a base64-encoded string:
<?php
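$file_data = file_get_contents('path/to/your/file.xlsx');
// http_build_query() URL-encodes the array into a form body;
// urlencode() only accepts a string.
$data = http_build_query([
'upload_file' => base64_encode($file_data),
]);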
// Send POST REQUEST
$context_options = array(
'http' => array(
'method' => 'POST',
'header' => "Content-type: application/x-www-form-urlencoded\r\n"
. "Content-Length: " . strlen($data) . "\r\n",
'content' => $data,
'timeout' => 10,
)
);
$context = stream_context_create($context_options);
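// fopen() returns a stream resource (or false on failure);
// read the response body with stream_get_contents($result).
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);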
You'd get the file data on your Python server as a base64-encoded string in the field named "upload_file". You need to decode it to get the original binary content.
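A minimal sketch of that decoding step, again with only the Python standard library. The function name `extract_upload` is made up for this example, and it assumes you have the raw urlencoded request body as bytes. Most frameworks will already hand you the parsed form field, in which case only the `base64.b64decode` call is needed.

```python
import base64
from urllib.parse import parse_qs


def extract_upload(body: bytes, field: str = "upload_file") -> bytes:
    """Return the original file bytes from an x-www-form-urlencoded body."""
    # parse_qs undoes the URL encoding ("%2B" -> "+", "%2F" -> "/", ...).
    form = parse_qs(body.decode("ascii"), strict_parsing=True)
    # Each field maps to a list of values; take the first one.
    encoded = form[field][0]
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    return base64.b64decode(encoded, validate=True)
```

Write the returned bytes to a temporary file (or wrap them in `io.BytesIO`) before handing them to whatever reads the xlsx.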
Approach 3: If you insist...
If you insist on your current 2-roundtrip approach, the simple solution is to have 2 different endpoints:
- One for sending the POST request to your Python application.
- One for serving the xlsx file, with no dependency on the Python application.
From your description, the deadlock occurs because you're using the same script for both purposes. I don't see a reason why they can't be 2 separate scripts / route controllers.