39

I can't seem to find a real answer to this problem so here I go:

How do you parse raw HTTP request data in multipart/form-data format in PHP? I know that raw POST is automatically parsed if formatted correctly, but the data I'm referring to is coming from a PUT request, which is not being parsed automatically by PHP. The data is multipart and looks something like:

------------------------------b2449e94a11c
Content-Disposition: form-data; name="user_id"

3
------------------------------b2449e94a11c
Content-Disposition: form-data; name="post_id"

5
------------------------------b2449e94a11c
Content-Disposition: form-data; name="image"; filename="/tmp/current_file"
Content-Type: application/octet-stream

�����JFIF���������... a bunch of binary data

I'm sending the data with libcurl like so (pseudo code):

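// $ch is assumed to be a handle from curl_init() for the target URL
curl_setopt_array($ch, array(
  CURLOPT_POSTFIELDS => array(
    'user_id' => 3,
    'post_id' => 5,
    // PHP 5.5+ spells this new CURLFile('/tmp/current_file'); the '@' prefix was removed in PHP 7
    'image' => '@/tmp/current_file'),
  CURLOPT_CUSTOMREQUEST => 'PUT'
));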

If I drop the CURLOPT_CUSTOMREQUEST bit, the request is handled as a POST on the server and everything is parsed just fine.

Is there a way to manually invoke PHP's HTTP data parser, or some other nice way of doing this? And yes, I have to send the request as PUT :)

Álvaro González
Christof
  • http://php.net/manual/en/function.http-parse-headers.php – RobertPitt Mar 30 '11 at 08:48
  • Take a look at the Python version of this question for some ideas: [How do I deal with the uploaded file data manually?](https://stackoverflow.com/questions/57483402/how-do-i-deal-with-the-uploaded-file-data-manually). Basically you just need to split the binary data, regroup the pieces and rebuild the original file. – Rick Aug 14 '19 at 09:05
  • For parsing a simple PDF form, try https://stackoverflow.com/questions/46515906/receiving-pdf-form-data-into-php/58350678#58350678 – Bilbo Oct 12 '19 at 04:06

8 Answers

34

Edit - please read first: this answer is still getting regular hits 7 years later. I have never used this code since then and do not know if there is a better way to do it these days. Please view the comments below and know that there are many scenarios where this code will not work. Use at your own risk.

--

OK, so following Dave's and Evert's suggestions I decided to parse the raw request data manually. I didn't find any other way to do this after searching around for about a day.

I got some help from this thread. I didn't have any luck tampering with the raw data like they do in the referenced thread, as that will break the files being uploaded. So it's all regex. This wasn't tested very well, but it seems to be working for my use case. Without further ado, and in the hope that this may help someone else someday:

function parse_raw_http_request(array &$a_data)
{
  // read incoming data
  $input = file_get_contents('php://input');

  // grab multipart boundary from content type header; give up if there is none
  if (!isset($_SERVER['CONTENT_TYPE']) ||
      !preg_match('/boundary=(.*)$/', $_SERVER['CONTENT_TYPE'], $matches))
  {
    return;
  }
  // strip optional quotes and escape the boundary so it is matched literally below
  $boundary = preg_quote(trim($matches[1], '"'), '/');

  // split content by boundary and get rid of last -- element
  $a_blocks = preg_split("/-+$boundary/", $input);
  array_pop($a_blocks);

  // loop data blocks
  foreach ($a_blocks as $id => $block)
  {
    if (empty($block))
      continue;

    // you'll have to var_dump $block to understand this and maybe replace \n or \r with a visible char

    // parse uploaded files
    if (strpos($block, 'application/octet-stream') !== FALSE)
    {
      // match "name", then everything after "stream" (optional) except for prepending newlines
      $found = preg_match('/name=\"([^\"]*)\".*stream[\n\r]+([^\n\r].*)?$/s', $block, $matches);
    }
    // parse all other fields
    else
    {
      // match "name" and optional value in between newline sequences
      $found = preg_match('/name=\"([^\"]*)\"[\n\r]+([^\n\r].*)?\r$/s', $block, $matches);
    }
    // skip blocks that did not match; the value group is missing for empty fields
    if ($found)
      $a_data[$matches[1]] = isset($matches[2]) ? $matches[2] : '';
  }
}

Usage by reference (in order not to copy around the data too much):

$a_data = array();
parse_raw_http_request($a_data);
var_dump($a_data);
Christof
  • 3
    This function won't work if the post variables contain arrays. For example, a name of "value[id]" will not parse properly. Content-Disposition: form-data; name="elements[_itemname][value]" Content-Disposition: form-data; name="array[value]" -- neither would work with this. – Rob Porter Jul 17 '13 at 02:37
  • That's true. I didn't need nested arrays in my case. – Christof Jul 17 '13 at 07:57
  • Thank you. This helps me a lot. Just modified to separate header/content by the two line feeds in between those parts instead of the Content-Type. I think this covers standards better – Alwin Kesler Jun 14 '16 at 15:01
  • 1
    @Chris I made a modified version to cover nested array, here's the code https://gist.github.com/cwhsu1984/3419584ad31ce12d2ad5fed6155702e2 – cwhsu Apr 17 '17 at 13:18
  • Parsing HTTP data is unfortunately much more complex than what this code does. It might work in some cases but not in many others. For example, there can be multiple lines before the actual content (such as "Content-Length: XXX"), which this code does not handle. The number of dashes for the boundary might vary between CONTENT_TYPE and what's in the input. Also, the code does not handle keys that are present but have no values. – laurent May 19 '17 at 11:51
  • When receiving data with negative integers as values, this function breaks. It needs a check whether `$_SERVER['CONTENT_TYPE']` is set; if not, return an empty array. Also, in the last lines, `$matches` sometimes has no indexes `1` and `2` defined. It needs an `if(count($matches))` check. It's still far from perfect but for me it's enough. – Lis May 22 '21 at 17:15
7

I used Chris's example function and added some needed functionality, such as R Porter's need for arrays of $_FILES. Hope it helps some people.

Here is the class & example usage

<?php
include_once('class.stream.php');

$data = array();

new stream($data);

$_PUT = $data['post'];
$_FILES = $data['file'];

/* Handle moving the file(s) */
if (count($_FILES) > 0) {
    foreach($_FILES as $key => $value) {
        if (!is_uploaded_file($value['tmp_name'])) {
            /* Use getimagesize() or finfo_file() to validate file prior to moving here */
            /* basename() keeps a client-supplied name from escaping the upload directory */
            rename($value['tmp_name'], '/path/to/uploads/'.basename($value['name']));
        } else {
            move_uploaded_file($value['tmp_name'], '/path/to/uploads/'.basename($value['name']));
        }
    }
}
Community
jas-
2

I'm surprised no one mentioned parse_str or mb_parse_str:

$result = [];
$rawPost = file_get_contents('php://input');
mb_parse_str($rawPost, $result);
var_dump($result);

http://php.net/manual/en/function.mb-parse-str.php
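
For reference, here is a minimal sketch of the kind of body parse_str() is meant for, namely application/x-www-form-urlencoded. The sample string is made up for illustration, and none of this applies to a multipart/form-data body like the one in the question.

```php
<?php
// Hypothetical urlencoded body, as php://input would deliver it for
// Content-Type: application/x-www-form-urlencoded
$rawBody = 'user_id=3&post_id=5&tags[]=a&tags[]=b&meta[lang]=en';

$result = [];
parse_str($rawBody, $result);

// Bracketed names become nested arrays, the same way they do in $_POST
print_r($result);
```

All values come back as strings, so `user_id` is `'3'` rather than the integer 3.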

Mahn
  • 6
    I guess this doesn't work for me because I'm using binary files in the form with `multipart/form-data` Content-Type. FWMC – Alwin Kesler Jun 14 '16 at 12:34
  • 8
    The question was specifically about requests with MIME type `multipart/form-data`, not `application/x-www-form-urlencoded`, which is what `parse_str()` is intended for. – miken32 Nov 15 '18 at 19:03
2

I would suspect the best way to go about it is 'doing it yourself', although you might find inspiration in multipart email parsers that use a similar (if not the exact same) format.

Grab the boundary from the Content-Type HTTP header, and use that to explode the various parts of the request. If the request is very large, keep in mind that you might store the entire request in memory, possibly even multiple times.

The related RFC is RFC 2388 (since obsoleted by RFC 7578), which fortunately is pretty short.
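
The two steps described above could be sketched roughly as follows. The function names `multipart_boundary` and `multipart_split` are made up for this example, the code needs PHP 7.1 or later, and it assumes every part carries at least one header line (true for form-data, where Content-Disposition is required). It also keeps the whole body in memory, which is exactly the cost mentioned above.

```php
<?php
/**
 * Extract the boundary parameter from a Content-Type header value.
 * Returns null when the header is not multipart or has no boundary.
 */
function multipart_boundary(string $contentType): ?string
{
    $pattern = '/^multipart\/[a-z-]+\s*;.*?boundary=(?:"([^"]+)"|([^\s;]+))/i';
    if (!preg_match($pattern, $contentType, $m)) {
        return null;
    }
    // Group 1 is the quoted form, group 2 the bare token
    return ($m[1] ?? '') !== '' ? $m[1] : $m[2];
}

/**
 * Split a multipart body into parts of raw header text and raw content.
 * Assumption of this sketch: each part has at least one header line.
 */
function multipart_split(string $body, string $boundary): array
{
    $parts = [];
    // A delimiter is CRLF + "--" + boundary; prepend CRLF so the first one matches too
    foreach (explode("\r\n--" . $boundary, "\r\n" . $body) as $i => $chunk) {
        if ($i === 0) {
            continue; // preamble before the first delimiter
        }
        if (strncmp($chunk, '--', 2) === 0) {
            break; // closing delimiter "--boundary--"
        }
        // Drop the CRLF that ends the delimiter line
        $chunk = preg_replace('/^[ \t]*\r\n/', '', $chunk, 1);
        // Limit 2: only the first blank line separates headers from content
        [$rawHeaders, $content] = array_pad(explode("\r\n\r\n", $chunk, 2), 2, '');
        $parts[] = ['headers' => $rawHeaders, 'content' => $content];
    }
    return $parts;
}
```

In a request handler this would be called with `$_SERVER['CONTENT_TYPE']` and the result of `file_get_contents('php://input')`. Splitting on the full CRLF-prefixed delimiter, rather than trimming each part, leaves binary content byte-for-byte intact.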

Smi
Evert
  • Hm, that's what Dave Kok wrote too. I guess I will have to check that out. Thing is, my request content doesn't look quite the way I'd expect it with Content-Type boundaries. I pasted a bit of it in my initial question. Would you happen to know why it looks that way? – Christof Mar 30 '11 at 11:37
  • The actual boundary is listed not in the per-part headers, but in the top header. So this won't be accessible through php://input, but like dave mentioned, it should be in the $_SERVER['HTTP_CONTENT_TYPE'] or $_SERVER['CONTENT_TYPE'] property. – Evert Mar 30 '11 at 11:52
0

I haven't dealt with HTTP headers much, but I found this bit of code that might help:

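function http_parse_headers( $header )
{
    $retVal = array();
    // unfold continuation lines, then split into individual header lines
    $fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $header));
    foreach( $fields as $field ) {
        if( preg_match('/([^:]+): (.+)/m', $field, $match) ) {
            // normalise the name to Title-Case; the /e modifier was removed in PHP 7, so use a callback
            $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function ($m) {
                return strtoupper($m[0]);
            }, strtolower(trim($match[1])));
            if( isset($retVal[$match[1]]) ) {
                // repeated header: collect all values in one flat array
                $retVal[$match[1]] = array_merge((array) $retVal[$match[1]], array(trim($match[2])));
            } else {
                $retVal[$match[1]] = trim($match[2]);
            }
        }
    }
    return $retVal;
}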

From http://php.net/manual/en/function.http-parse-headers.php

Jess
  • Thanks. I saw that function earlier today but the result isn't of much use. Have you used that function with success? – Christof Mar 30 '11 at 08:54
0

Here is a universal solution working with arbitrary multipart/form-data content and tested for POST, PUT, and PATCH:

/**
* Parse arbitrary multipart/form-data content
* Note: null result or null values for headers or value means error
* @param string|null $content
* @param string|null $boundary
* @return array|null [{"headers":array|null,"value":string|null}]
*/
function parse_multipart_content(?string $content, ?string $boundary): ?array {
  if(empty($content) || empty($boundary)) return null;
  $sections = array_map("trim", explode("--$boundary", $content));
  $parts = [];
  foreach($sections as $section) {
    if($section === "" || $section === "--") continue;
    // limit of 2: only the first blank line separates headers from the value
    $fields = explode("\r\n\r\n", $section, 2);
    // a part may carry any number of header lines, so accept one or more matches
    if(preg_match_all("/([a-z0-9_-]+)\s*:\s*([^\r\n]+)/iu", $fields[0], $matches, PREG_SET_ORDER) > 0) {
      $headers = [];
      foreach($matches as $match) $headers[$match[1]] = $match[2];
    } else $headers = null;
    $parts[] = ["headers" => $headers, "value"   => $fields[1] ?? null];
  }
  return empty($parts) ? null : $parts;
}
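
The function above returns each part's headers but leaves the field name buried inside the Content-Disposition value. As a rough sketch, a helper along these lines could pull it out. The name `disposition_params` is hypothetical, and only the quoted `name="..."` form shown in the question is handled.

```php
<?php
/**
 * Pull the name and (optional) filename out of a Content-Disposition value
 * such as: form-data; name="image"; filename="/tmp/current_file"
 * Assumption of this sketch: parameters use the quoted-string form.
 */
function disposition_params(string $value): array
{
    $params = ['name' => null, 'filename' => null];
    // (?<![\w-]) keeps "name" from matching inside "filename"
    $pattern = '/(?<![\w-])(name|filename)="((?:[^"\\\\]|\\\\.)*)"/i';
    if (preg_match_all($pattern, $value, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Undo backslash escapes inside the quoted string
            $params[strtolower($m[1])] = preg_replace('/\\\\(.)/', '$1', $m[2]);
        }
    }
    return $params;
}
```

The filename comes from the client, so it is safer to pass it through basename() before using it in a path.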
Oleg Uryutin
0

Update
The function was updated to support arrays in form fields. That is, fields like level1[level2] will be translated into proper (multidimensional) arrays.

I've just added a small function to my HTTP20 library that can help with this. It parses form data for PUT, DELETE and PATCH and adds it to the respective static variable to simulate the $_POST global.
For now it's just for text fields, though, no binary support, since I currently do not have a good use case in my project to properly test it and I'd prefer not to share something I can't test extensively. But if I do get to it at some point - I will update this answer.
Here is the code:

public function multiPartFormParse(): void
    {
        #Get method
        $method = $_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD'] ?? $_SERVER['REQUEST_METHOD'] ?? null;
        #Get Content-Type
        $contentType = $_SERVER['CONTENT_TYPE'] ?? '';
        #Exit if not one of the supported methods or wrong content-type
        if (!in_array($method, ['PUT', 'DELETE', 'PATCH']) || preg_match('/^multipart\/form-data; boundary=.*$/ui', $contentType) !== 1) {
            return;
        }
        #Get boundary value and escape it, since it is interpolated into the regexes below
        $boundary = preg_quote(preg_replace('/(^multipart\/form-data; boundary=)(.*$)/ui', '$2', $contentType), '/');
        #Get input stream
        $formData = file_get_contents('php://input');
        #Exit if failed to get the input or if it's not compliant with the RFC2046
        if ($formData === false || preg_match('/^\s*--'.$boundary.'.*\s*--'.$boundary.'--\s*$/muis', $formData) !== 1) {
            return;
        }
        #Strip ending boundary
        $formData = preg_replace('/(^\s*--'.$boundary.'.*)(\s*--'.$boundary.'--\s*$)/muis', '$1', $formData);
        #Split data into array of fields
        $formData = preg_split('/\s*--'.$boundary.'\s*Content-Disposition: form-data;\s*/muis', $formData, 0, PREG_SPLIT_NO_EMPTY);
        #Convert to associative array
        $parsedData = [];
        foreach ($formData as $field) {
            $name =  preg_replace('/(name=")(?<name>[^"]+)("\s*)(?<value>.*$)/mui', '$2', $field);
            $value =  preg_replace('/(name=")(?<name>[^"]+)("\s*)(?<value>.*$)/mui', '$4', $field);
            #Check if we have multiple keys
            if (str_contains($name, '[')) {
                #Explode keys into array
                $keys = explode('[', trim($name));
                $name = '';
                #Build JSON array string from keys (json_encode escapes quotes and backslashes)
                foreach ($keys as $key) {
                    $name .= '{' . json_encode(rtrim($key, ']')) . ':';
                }
                #Add the value itself (as string, since in this case it will always be a string) and closing brackets
                $name .= json_encode(trim($value)) . str_repeat('}', count($keys));
                #Convert into actual PHP array
                $array = json_decode($name, true);
                #Check if we actually got an array and did not fail
                if (!is_null($array)) {
                    #"Merge" the array into existing data. Doing recursive replace, so that new fields will be added, and in case of duplicates, only the latest will be used
                    $parsedData = array_replace_recursive($parsedData, $array);
                }
            } else {
                #Single key - simple processing
                $parsedData[trim($name)] = trim($value);
            }
        }
        #Update static variable based on method value
        self::${'_'.strtoupper($method)} = $parsedData;
    }

Obviously you can safely remove the method check and the assignment to a static variable if you do not need those.
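
As an alternative to building a JSON string, bracketed field names can be expanded by walking the keys by reference. This is only a sketch. The helper `assign_field` is hypothetical and not part of the HTTP20 library, and it treats `[]` as "append", which is an attempt to mimic how PHP fills $_POST.

```php
<?php
/**
 * Store $value in $target under a field name such as "a[b][c]" or "tags[]".
 * Hypothetical helper for illustration only.
 */
function assign_field(array &$target, string $name, string $value): void
{
    // Split "a[b][]" into the base name "a" and the bracket keys "b", ""
    if (!preg_match('/^([^\[]+)((?:\[[^\]]*\])*)$/', $name, $m)) {
        $target[$name] = $value; // not a well-formed bracket name: keep it verbatim
        return;
    }
    preg_match_all('/\[([^\]]*)\]/', $m[2], $keys);

    $ref = &$target[$m[1]];
    foreach ($keys[1] as $key) {
        if (!is_array($ref)) {
            $ref = []; // replace scalars on the way down, so the latest field wins
        }
        if ($key === '') {
            $ref = &$ref[]; // "[]" appends a new element
        } else {
            $ref = &$ref[$key];
        }
    }
    $ref = $value;
}
```

Calling it once per parsed field, for example `assign_field($parsedData, $name, $value)`, builds up the nested array without any string escaping concerns.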

Simbiat
-1

Have you looked at fopen("php://input", "r") for parsing the content?

Headers can also be found as $_SERVER['HTTP_*']; names are always uppercased and dashes become underscores, e.g. $_SERVER['HTTP_ACCEPT_LANGUAGE'].
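
A small sketch of both ideas follows, with hypothetical helper names. `server_key` applies the naming rule above and treats Content-Type and Content-Length as exceptions, since CGI exposes those two without the HTTP_ prefix. `read_chunks` reads any stream, such as one opened with fopen("php://input", "r"), piece by piece; the chunk size is an arbitrary choice.

```php
<?php
/**
 * Map an HTTP header name to the key PHP normally uses for it in $_SERVER.
 */
function server_key(string $headerName): string
{
    $key = strtoupper(str_replace('-', '_', $headerName));
    // CGI meta-variables that carry no HTTP_ prefix
    return in_array($key, ['CONTENT_TYPE', 'CONTENT_LENGTH'], true) ? $key : 'HTTP_' . $key;
}

/**
 * Yield the contents of a stream in chunks instead of loading it all at once.
 */
function read_chunks($stream, int $chunkSize = 8192): Generator
{
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left
        }
        yield $chunk;
    }
}
```

Note that this only reads the body. Splitting it into fields is still up to the caller.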

Dave Kok
  • 2
    fopen('php://input') would only read the content, not parse it? The values I'm hoping to parse are not in the $_SERVER variable. – Christof Mar 30 '11 at 08:58
  • How about using mod_rewrite to redirect it as a POST – Dave Kok Mar 30 '11 at 09:04
  • nevermind, got confused with the R flag which does only codes. But you could redirect it with PHP by reconstructing the HTTP request but modify it to be a POST request and call another script to parse the request. – Dave Kok Mar 30 '11 at 09:15
  • How would you rewrite the request to be a POST? This would have to occur on the server. – Christof Mar 30 '11 at 09:22
  • Well, you could open a socket to the server on port 80 and feed it the request. The response can be sent back to the client with readfile. Do add a Connection: close header to close the connection after the request has been processed. – Dave Kok Mar 30 '11 at 09:28
  • Yeah or I guess I could even use another curl_client, dump the PUT request into memory and resend it as a POST. The thing is, I'm using this in a REST server context, so the idea is to actually have a put request and to process it accordingly. If there's an actual solution to manually parsing the request data, I'd really prefer that. Thanks though. – Christof Mar 30 '11 at 09:35
  • Well, multipart HTTP messages are a lot like multipart mail messages. Maybe you could use a MIME decoder on the php://input stream. – Dave Kok Mar 30 '11 at 09:39
  • I don't think there's a MIME decoder for HTTP content. I guess I could rewrite one, then again I guess I could write my own parser for the HTTP data, but that gets messy. I was hoping I wouldn't have to. – Christof Mar 30 '11 at 09:55