
http – Manually parse raw multipart/form-data data with PHP

Posted by: admin, April 23, 2020

Questions:

I can’t seem to find a real answer to this problem, so here I go:

How do you parse raw HTTP request data in multipart/form-data format in PHP? I know that POST data is parsed automatically if it is formatted correctly, but the data I’m referring to comes from a PUT request, which PHP does not parse automatically. The data is multipart and looks something like:

------------------------------b2449e94a11c
Content-Disposition: form-data; name="user_id"

3
------------------------------b2449e94a11c
Content-Disposition: form-data; name="post_id"

5
------------------------------b2449e94a11c
Content-Disposition: form-data; name="image"; filename="/tmp/current_file"
Content-Type: application/octet-stream

...JFIF... (a bunch of binary JPEG data)
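For reference, a body like the one above is plain text glued together with the boundary: each part starts with `--` plus the boundary, then its headers, a blank line (CRLF CRLF) and the value, and the body ends with the boundary followed by two extra dashes. A minimal sketch that assembles such a body by hand; `build_multipart_body` is a hypothetical name, the boundary value is assumed, and the file content is a placeholder string rather than real JPEG data:

```php
<?php
// Hypothetical helper: assemble a multipart/form-data body by hand.
// $fields: name => string value; $files: name => [filename, content]
function build_multipart_body(string $boundary, array $fields, array $files): string
{
    $body = '';
    foreach ($fields as $name => $value) {
        $body .= "--$boundary\r\n"
               . "Content-Disposition: form-data; name=\"$name\"\r\n\r\n"
               . $value . "\r\n";
    }
    foreach ($files as $name => [$filename, $content]) {
        $body .= "--$boundary\r\n"
               . "Content-Disposition: form-data; name=\"$name\"; filename=\"$filename\"\r\n"
               . "Content-Type: application/octet-stream\r\n\r\n"
               . $content . "\r\n";
    }
    // the closing delimiter carries two extra dashes
    return $body . "--$boundary--\r\n";
}

$boundary = '----------------------------b2449e94a11c'; // assumed value
$body = build_multipart_body($boundary, ['user_id' => '3', 'post_id' => '5'],
                             ['image' => ['/tmp/current_file', 'fake-jpeg-bytes']]);
```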

I’m sending the data with libcurl like so (pseudo code):

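$ch = curl_init('http://example.com/upload.php'); // placeholder URL
curl_setopt_array($ch, array(
  CURLOPT_POSTFIELDS => array(
    'user_id' => 3,
    'post_id' => 5,
    // CURLFile replaces the '@/path' upload syntax, which PHP 7 no longer supports
    'image' => new CURLFile('/tmp/current_file')),
  CURLOPT_CUSTOMREQUEST => 'PUT'
));
curl_exec($ch);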

If I drop the CURLOPT_CUSTOMREQUEST bit, the request is handled as a POST on the server and everything is parsed just fine.

Is there a way to manually invoke PHP's HTTP data parser, or some other nice way of doing this?
And yes, I have to send the request as PUT 🙂

Answer:

Edit – please read first: this answer is still getting regular hits 7 years later. I have never used this code since then and do not know if there is a better way to do it these days. Please view the comments below and know that there are many scenarios where this code will not work. Use at your own risk.

Ok, so following Dave's and Evert's suggestions, I decided to parse the raw request data manually. I didn’t find any other way to do this after searching around for about a day.

I got some help from this thread. I didn’t have any luck tampering with the raw data like they do in the referenced thread, as that breaks the files being uploaded. So it’s all regex. This wasn’t tested very well, but it seems to work for my use case. Without further ado, and in the hope that this may help someone else someday:

function parse_raw_http_request(array &$a_data)
{
  // read incoming data
  $input = file_get_contents('php://input');

  // grab multipart boundary from content type header (the value may be quoted)
  if (!isset($_SERVER['CONTENT_TYPE'])
      || !preg_match('/boundary="?([^";]+)"?/i', $_SERVER['CONTENT_TYPE'], $matches))
    return;
  $boundary = $matches[1];

  // split content by "--" + boundary (quoted for the regex) and drop the last "--" element
  $a_blocks = preg_split('/--' . preg_quote($boundary, '/') . '/', $input);
  array_pop($a_blocks);

  // loop data blocks; each one looks like "\r\n<part headers>\r\n\r\n<value>\r\n"
  foreach ($a_blocks as $block)
  {
    if (empty($block))
      continue;

    // match "name" (\b keeps it from matching "filename"), skip the rest of the
    // part headers up to the blank line, then capture the value without its
    // trailing CRLF; for uploaded files the value is the raw file content
    if (preg_match('/\bname="([^"]*)".*?\r\n\r\n(.*)\r\n$/s', $block, $matches))
      $a_data[$matches[1]] = $matches[2];
  }
}

Usage by reference (in order not to copy around the data too much):

$a_data = array();
parse_raw_http_request($a_data);
var_dump($a_data);

Answer:

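I’m surprised no one mentioned parse_str or mb_parse_str. Note that they only decode application/x-www-form-urlencoded data (key=value&key2=value2), so they help only if the PUT body is sent in that format rather than as multipart/form-data: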

$result = [];
$rawPost = file_get_contents('php://input');
mb_parse_str($rawPost, $result);
var_dump($result);
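Because parse_str() only decodes urlencoded data, a handler that may receive either format could look at the Content-Type first. With cURL, passing CURLOPT_POSTFIELDS a string (for example from http_build_query()) sends urlencoded data, while passing an array sends multipart/form-data. A minimal sketch; `parse_put_body` is a hypothetical name, and the multipart case is deliberately left to a separate parser:

```php
<?php
// Hypothetical helper: decode a PUT body only when it is urlencoded.
// Returns null for any other content type (e.g. multipart/form-data).
function parse_put_body(string $raw, string $contentType): ?array
{
    // compare only the media type, ignoring parameters such as charset
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));
    if ($mediaType !== 'application/x-www-form-urlencoded') {
        return null;
    }
    $result = [];
    parse_str($raw, $result);
    return $result;
}

var_dump(parse_put_body('user_id=3&post_id=5', 'application/x-www-form-urlencoded'));
```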

http://php.net/manual/en/function.mb-parse-str.php

Answer:

I used Chris’s example function and added some needed functionality, such as R Porter’s need for arrays of $_FILES. Hope it helps some people.

Here is the class & example usage

<?php
include_once('class.stream.php');

$data = array();

new stream($data);

$_PUT = $data['post'];
$_FILES = $data['file'];

/* Handle moving the file(s) */
if (count($_FILES) > 0) {
    foreach ($_FILES as $key => $value) {
        // basename() stops a client-supplied name like "../../x.php" escaping the upload dir
        $target = '/path/to/uploads/' . basename($value['name']);
        if (!is_uploaded_file($value['tmp_name'])) {
            /* Parsed from a PUT body, not a real upload: validate with getimagesize() or finfo_file() before moving */
            rename($value['tmp_name'], $target);
        } else {
            move_uploaded_file($value['tmp_name'], $target);
        }
    }
}
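The stream class itself is not shown here. As a rough illustration of what such a class has to do for each file part, here is a sketch that writes the raw file content to a temporary file and returns an entry shaped like a `$_FILES` element. `make_files_entry` is a hypothetical name, `type` is simply the part's Content-Type as sent, and no validation is done. Such a temporary file is not a real upload, so is_uploaded_file() returns false for it (hence the rename() fallback above), and PHP will not delete it automatically at the end of the request.

```php
<?php
// Hypothetical helper: turn one parsed file part into a $_FILES-style array.
function make_files_entry(string $filename, string $contentType, string $content): array
{
    $tmp = tempnam(sys_get_temp_dir(), 'put');   // unique temporary file
    if ($tmp === false || file_put_contents($tmp, $content) === false) {
        return ['name' => basename($filename), 'type' => $contentType,
                'tmp_name' => '', 'error' => UPLOAD_ERR_CANT_WRITE, 'size' => 0];
    }
    return [
        'name'     => basename($filename),   // strip client-supplied directories
        'type'     => $contentType,
        'tmp_name' => $tmp,
        'error'    => UPLOAD_ERR_OK,
        'size'     => strlen($content),
    ];
}

$entry = make_files_entry('/tmp/current_file', 'application/octet-stream', 'fake-jpeg-bytes');
```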

Answer:

I would suspect the best way to go about it is ‘doing it yourself’, although you might find inspiration in multipart email parsers that use a similar (if not the exact same) format.

Grab the boundary from the Content-Type HTTP header, and use that to explode the various parts of the request. If the request is very large, keep in mind that you might store the entire request in memory, possibly even multiple times.

The related RFC is RFC 2388 (since superseded by RFC 7578), which fortunately is pretty short.
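A minimal sketch of that do-it-yourself approach, assuming CRLF line endings (as the MIME format requires) and a body small enough to hold in memory. `parse_multipart` is a hypothetical name; it takes the raw body and the Content-Type header value as parameters, rather than reading php://input and $_SERVER itself, so it can be tried without a live PUT request. It returns each part's headers and body and leaves it to the caller to decide which parts are files.

```php
<?php
// Hypothetical sketch: split a multipart/form-data body into parts.
// Returns a list of ['headers' => [lowercased name => value], 'body' => string].
function parse_multipart(string $body, string $contentType): array
{
    if (!preg_match('/boundary="?([^";]+)"?/i', $contentType, $m)) {
        return [];
    }
    $parts = [];
    // the first chunk is the preamble and the last one follows the closing
    // "--boundary--" delimiter, so both are dropped
    $chunks = explode('--' . $m[1], $body);
    foreach (array_slice($chunks, 1, -1) as $chunk) {
        // each chunk is "\r\n<headers>\r\n\r\n<body>\r\n"
        $chunk = substr($chunk, 2, -2);
        $pos = strpos($chunk, "\r\n\r\n");
        if ($pos === false) {
            continue;
        }
        $headers = [];
        foreach (explode("\r\n", substr($chunk, 0, $pos)) as $line) {
            [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
            $headers[strtolower(trim($name))] = trim($value);
        }
        $parts[] = ['headers' => $headers, 'body' => substr($chunk, $pos + 4)];
    }
    return $parts;
}
```

Usage would be along the lines of `parse_multipart(file_get_contents('php://input'), $_SERVER['CONTENT_TYPE'])`. Splitting with explode() rather than a regex keeps binary file content untouched, since a well-formed body never contains the delimiter inside a part.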

Answer:

I haven’t dealt with HTTP headers much, but I found this bit of code that might help:

function http_parse_headers($header)
{
    $retVal = array();
    $fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $header));
    foreach ($fields as $field) {
        if (preg_match('/([^:]+): (.+)/m', $field, $match)) {
            // "content-type" -> "Content-Type" (the /e modifier was removed in PHP 7)
            $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function ($m) { return strtoupper($m[0]); }, strtolower(trim($match[1])));
            if (isset($retVal[$match[1]])) {
                $retVal[$match[1]] = array_merge((array) $retVal[$match[1]], array(trim($match[2]))); // repeated header
            } else {
                $retVal[$match[1]] = trim($match[2]);
            }
        }
    }
    return $retVal;
}

From http://php.net/manual/en/function.http-parse-headers.php
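Inside a multipart body, the header that matters most is each part's Content-Disposition, whose `name` and `filename` parameters identify the field and tell you whether it is a file. A small sketch for pulling those parameters out; `parse_disposition_params` is a hypothetical name, and it only handles simple `key="value"` or `key=value` pairs (no escaped quotes and no RFC 5987 `filename*=` encoding):

```php
<?php
// Hypothetical helper: extract parameters from a Content-Disposition value,
// e.g. 'form-data; name="image"; filename="a.jpg"'
function parse_disposition_params(string $value): array
{
    $params = [];
    // each parameter is "; key=value", where value is quoted or a bare token
    preg_match_all('/;\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))/', $value, $m, PREG_SET_ORDER);
    foreach ($m as $match) {
        // group 3 holds a bare token; otherwise use the quoted value in group 2
        $params[strtolower($match[1])] = ($match[3] ?? '') !== '' ? $match[3] : $match[2];
    }
    return $params;
}
```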

Answer:

Have you looked at fopen("php://input") for parsing the content?
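Reading php://input as a stream, instead of with file_get_contents(), avoids holding the whole body in a PHP string while you copy it somewhere a parser can work through it piece by piece. A minimal sketch; `spool_input` is a hypothetical name, and it takes the source stream as a parameter so it can be tried with php://memory instead of a real request. php://temp keeps data in memory up to 2 MB by default and then switches to a temporary file.

```php
<?php
// Hypothetical helper: copy a request body stream into php://temp.
function spool_input($in)
{
    $tmp = fopen('php://temp', 'w+b');
    stream_copy_to_stream($in, $tmp);   // stream-to-stream copy, no full PHP string
    rewind($tmp);                       // position at the start for reading
    return $tmp;
}

// in a real request handler: $body = spool_input(fopen('php://input', 'rb'));
```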

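Headers can also be found as $_SERVER['HTTP_*']: names are always uppercased and dashes become underscores, e.g. $_SERVER['HTTP_ACCEPT_LANGUAGE']. Content-Type and Content-Length are the exceptions; they appear without the prefix, as $_SERVER['CONTENT_TYPE'] and $_SERVER['CONTENT_LENGTH'].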
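A sketch that turns those $_SERVER keys back into header names, for example to find the multipart boundary in the Content-Type. `headers_from_server` is a hypothetical name; getallheaders() does this directly on SAPIs that provide it. Note that it cannot restore the original capitalization, only a conventional one.

```php
<?php
// Hypothetical helper: rebuild request headers from a $_SERVER-style array.
// HTTP_ACCEPT_LANGUAGE -> Accept-Language; CONTENT_TYPE and CONTENT_LENGTH
// have no HTTP_ prefix, so they are picked up explicitly.
function headers_from_server(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strpos($key, 'HTTP_') === 0) {
            $name = substr($key, 5);
        } elseif ($key === 'CONTENT_TYPE' || $key === 'CONTENT_LENGTH') {
            $name = $key;
        } else {
            continue;
        }
        // ACCEPT_LANGUAGE -> Accept-Language
        $name = str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', $name))));
        $headers[$name] = $value;
    }
    return $headers;
}

// usage: $headers = headers_from_server($_SERVER);
```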