PHP unserialize fails with non-encoded characters?

Question

$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}'; // fails
$ser2 = 'a:2:{i:0;s:5:"hello";i:1;s:5:"world";}'; // works
$out = unserialize($ser);
$out2 = unserialize($ser2);
print_r($out);
print_r($out2);
echo "<hr>";

But why?
Should I encode before serializing, then? How?

I am using JavaScript to write the serialized string to a hidden field, then reading it in PHP via $_POST.
In JS I have something like:

function writeImgData() {
    var caption_arr = new Array();
    $('.album img').each(function(index) {
         caption_arr.push($(this).attr('alt'));
    });
    $("#hidden-field").attr("value", serializeArray(caption_arr));
};

score 58 · Accepted Answer · answered May 17 '10 at 23:33

unserialize() fails with:

$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}';

because the declared lengths of héllö and wörld are wrong. The N in s:N is a byte count, not a character count, and PHP's strlen() likewise counts bytes; in UTF-8, é and ö take two bytes each:

echo strlen('héllö'); // 7
echo strlen('wörld'); // 6

However if you try to unserialize() the following correct string:

$ser = 'a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}';

echo '<pre>';
print_r(unserialize($ser));
echo '</pre>';

It works:

Array
(
    [0] => héllö
    [1] => wörld
)

If you build the string with PHP's serialize(), it computes the byte lengths of multi-byte strings correctly.
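As a minimal sketch of that point (assuming the mbstring extension is available for mb_strlen(), and using "\u{...}" escapes, PHP 7+, so the result does not depend on how this file is saved), the byte and character counts can be compared with what serialize() emits:

```php
<?php
// "\u{...}" escapes produce UTF-8 bytes regardless of the source file's encoding.
$words = ["h\u{e9}ll\u{f6}", "w\u{f6}rld"]; // héllö, wörld

foreach ($words as $word) {
    // strlen() counts bytes; mb_strlen() counts characters.
    printf("%s: %d bytes, %d characters\n", $word, strlen($word), mb_strlen($word, 'UTF-8'));
}

$serialized = serialize($words);
echo $serialized, "\n"; // a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}

$roundTrip = unserialize($serialized);
var_dump($roundTrip === $words); // bool(true)
```

The lengths 7 and 6 in the output are the byte counts that the hand-built string in the question got wrong.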

On the other hand, if you want to work with serialized data in multiple (programming) languages you should forget it and move to something like JSON, which is way more standardized.
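A short sketch of the JSON route on the PHP side, assuming the data is UTF-8 (the variable names are illustrative). JSON carries no length prefixes, so there is nothing to miscount:

```php
<?php
$captions = ["h\u{e9}ll\u{f6}", "w\u{f6}rld"]; // héllö, wörld as UTF-8

// JSON_UNESCAPED_UNICODE keeps the characters readable instead of \uXXXX escapes.
$json = json_encode($captions, JSON_UNESCAPED_UNICODE);
echo $json, "\n"; // ["héllö","wörld"]

// true => decode JSON arrays/objects into PHP arrays.
$decoded = json_decode($json, true);
var_dump($decoded === $captions); // bool(true)
```

On the JavaScript side, JSON.stringify(caption_arr) would produce a compatible string.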

Alix Axel

json_encode: "This function only works with UTF-8 encoded data..." http://php.net/manual/en/function.json-encode.php – giorgio79 Jun 26 '12 at 20:09
And if you're using serialize() and unserialize() and it is still failing, check your storage medium. In MySQL, for example, store the data in a binary or BLOB column; a text column whose character set differs from the data's encoding can alter multi-byte characters. – Dev Null Mar 25 '16 at 19:15
Also be careful when switching between php environments. I ran into issues encoding on a local machine before saving to the database and then trying to unserialize on the live server. Adjusting character counts for the characters solved the problem. – Coyote6 Apr 19 '16 at 15:45
This is probably also the answer to a problem I had about two years ago and never found an answer to. http://stackoverflow.com/questions/30289218/visually-same-string-gives-different-var-dumps-in-php – Marc van Nieuwenhuijzen Nov 10 '16 at 09:49

score 52 · Answer 2 · answered Apr 28 '11 at 02:50

I know this was posted about a year ago, but I just ran into the same issue, came across this question, and found a solution. This piece of code works like a charm!

The idea is simple: it recalculates the lengths of the multi-byte strings, as described by @Alix above.

A few modifications should make it suit your code:

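/**
 * Multi-byte unserialize
 *
 * Recomputes the byte length of every string before unserializing,
 * because UTF-8 data can leave the stored s:N lengths wrong.
 *
 * Note: the /e modifier used here was deprecated in PHP 5.5 and removed in PHP 7.0.
 *
 * @access private
 * @param string $string serialized data
 * @return mixed the unserialized value
 */
function mb_unserialize($string) {
    $string = preg_replace('!s:(\d+):"(.*?)";!se', "'s:'.strlen('$2').':\"$2\";'", $string);
    return unserialize($string);
}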

Source: http://snippets.dzone.com/posts/show/6592

Tested on my machine, and it works like a charm!

Lionel Chan

in my case the problem was in database encoding, so i lost part of my data in `???`, but this function helps me to make code work even with this, thanks – llamerr Apr 27 '12 at 10:37
Just saved me a massive headache! Thanks. – Damien Roche Dec 04 '12 at 18:44
+1 for this very useful work. I tested it as well and it works for me on UTF-8 data with French accents (PHP 5.3 on my server). – Sébastien Apr 11 '13 at 13:56
@Sébastien Check out another answer here that points out that there might be issue with this approach ;) – Lionel Chan Apr 12 '13 at 04:09
Good catch, I would not have spotted that. +1 for @Joe-Hong as well. Is there a way to check and correct for that? – Sébastien Apr 12 '13 at 12:33
Note that the `e` modifier is going away, time to switch to preg_replace_callback. – Alix Axel Jul 02 '13 at 20:01
In addition to the "e" modifier going away, this will fail on any string that was serialized containing the end of the regexp search (";) – Doug Kress Sep 17 '13 at 20:18
This function just saved my day! Thanks for sharing! – Jerome Bohg Aug 14 '14 at 19:49
I've posted below a version of your function changed to work with PHP 5.5. Thanks for your useful contribution. – David Jan 13 '15 at 14:32
Actually the regular expression is wrong, because the string itself may contain the pattern without it being part of the serialization format. E.g. the serialized part `...s:28:"some "quotes"; in the middle";...` comes out of your function as `...s:13:"some \"quotes"; in the middle";...`. That's one of the reasons the format stores string lengths in the first place. – Slavik Meltser Apr 08 '17 at 08:05
AWESOME @lionel-chan I was so tense thinking all the data had gone corrupt. You saved my life......thank you so much......:):):). THIS SHOULD HAVE BEEN THE ACCEPTED ANSWER – Parag Feb 18 '18 at 09:29

score 36 · Answer 3 · edited Dec 18 '17 at 18:29

Lionel Chan's answer, modified to work with PHP >= 5.5:

function mb_unserialize($string) {
    $string2 = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function($m){
            // $m[2] is the string body; recompute its length in bytes
            $len = strlen($m[2]);
            $result = "s:$len:\"{$m[2]}\";";
            return $result;
        },
        $string);
    return unserialize($string2);
}

This code uses preg_replace_callback because preg_replace with the /e modifier was deprecated in PHP 5.5 (and removed in PHP 7.0).

edited Dec 18 '17 at 18:29

answered Jan 13 '15 at 14:30

David


I had to use this version to prevent HTML strings in encoded arrays from getting incorrectly escaped double quotes in unserialized strings. – fideloper Nov 02 '15 at 14:46
A million thanks @David. I've been struggling with converting this function for many days now! – Ifedi Okonkwo Nov 27 '17 at 14:55

score 10 · Answer 4 · edited May 24 '12 at 13:36

The issue is - as pointed out by Alix - related to encoding.

Older PHP versions defaulted to ISO-8859-1 in many places (htmlspecialchars(), for example, until PHP 5.4), and that encoding uses a single byte for characters that UTF-8 encodes with several bytes. The byte lengths stored by serialize() on a UTF-8 system therefore no longer match if the data is converted to ISO-8859-1 along the way, and unserialize() fails.

To avoid problems like this, make sure all systems use the same encoding:

mb_internal_encoding('utf-8');
$arr = array('foo' => 'bár');
$buf = serialize($arr);

You can use utf8_encode() or utf8_decode() to clean up (both are deprecated as of PHP 8.2):

// Set system encoding to iso-8859-1
mb_internal_encoding('iso-8859-1');
$arr = unserialize(utf8_encode($serialized));
print_r($arr);
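To make the mismatch concrete, here is a small sketch (assuming the mbstring extension; the variable names are illustrative) that serializes a UTF-8 string, converts the result to ISO-8859-1 as a misconfigured storage layer might, and then converts it back:

```php
<?php
$value = ["b\u{e1}r"];                 // bár as UTF-8: 4 bytes
$utf8 = serialize($value);             // a:1:{i:0;s:4:"bár";}

// Simulate a transcoding step: á shrinks from 2 bytes to 1, but s:4 stays.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
var_dump(@unserialize($latin1));       // bool(false): declared length no longer fits

// Converting back to the original encoding restores the byte counts.
$restored = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump(unserialize($restored) === $value); // bool(true)
```

This is the same repair that utf8_encode() performs, without relying on the deprecated function.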

score 3 · Answer 5 · answered Apr 02 '12 at 03:33

In reply to @Lionel above: the mb_unserialize() function you proposed won't work if the serialized string itself contains the character sequence "; (a quote followed by a semicolon). Use with caution. For example:

$test = serialize('test";string');
// $test is now 's:12:"test";string";'
$string = preg_replace('!s:(\d+):"(.*?)";!se', "'s:'.strlen('$2').':\"$2\";'", $test);
print $string;
// output: s:4:"test";string";  (Wrong!!)

JSON is the way to go, as mentioned by others, IMHO.
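When legacy data rules out JSON, one possible mitigation (a sketch only, not a full parser) is to accept a closing "; only when whatever follows could legally come next in the serialize() format. The function name repair_serialized_lengths() and the lookahead are my own assumptions; a string whose content happens to contain something like ";i: can still fool it.

```php
<?php
/**
 * Recompute s:N byte lengths, treating "; as the end of a string only when it is
 * followed by the start of another token, a closing brace, or the end of input.
 * Hypothetical helper: a heuristic, not a parser.
 */
function repair_serialized_lengths(string $serialized): string
{
    $pattern = '/s:\d+:"(.*?)";(?=[aOCEs]:\d+:|[idbrR]:|N;|\}|\z)/s';
    $fixed = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
        },
        $serialized
    );
    return $fixed ?? $serialized; // on a PCRE error, hand the input back unchanged
}

$broken = 's:5:"test";string";'; // wrong length; the real byte length is 12
$fixed = repair_serialized_lengths($broken);
echo $fixed, "\n";             // s:12:"test";string";
var_dump(unserialize($fixed)); // string(12) "test";string"
```

The lookahead is what keeps the match from stopping at the first "; inside the string body.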

Note: I post this as new answer as I don't know how to reply directly (new here).

Joe Hong

You'll be able to reply with comments soon. Keep contributing! Cheers~ – Andrew Kozak Apr 02 '12 at 15:36
Good to know. Is there a solution? – Tyler Collier Dec 04 '20 at 18:59

score 2 · Answer 6 · answered May 17 '10 at 23:11

Do not use PHP serialization/unserialization when the other end is not PHP. It is not meant to be a portable format. For example, it embeds NUL bytes (ASCII 0) in the names of protected and private properties, which is nothing you want to deal with in JavaScript (it would work perfectly fine, it's just extremely ugly).

Instead, use a portable format like JSON. XML would do the job, too, but JSON has less overhead and is more programmer-friendly as you can easily parse it into a simple data structure instead of having to deal with XPath, DOM trees etc.
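The NUL bytes mentioned above can be made visible with a small sketch; the Album class is a made-up example, and str_replace() is used only to print each NUL byte as a literal \0:

```php
<?php
// Hypothetical class used only to show how property visibility is encoded.
class Album
{
    protected $title = 'x';
    private $id = 1;
}

$serialized = serialize(new Album());

// Protected names are stored as "\0*\0name", private names as "\0ClassName\0name".
var_dump(strpos($serialized, "\0*\0title") !== false);   // bool(true)
var_dump(strpos($serialized, "\0Album\0id") !== false);  // bool(true)

// Print with each NUL byte shown as a visible backslash-zero.
$visible = str_replace("\0", '\0', $serialized);
echo $visible, "\n";
// O:5:"Album":2:{s:8:"\0*\0title";s:1:"x";s:9:"\0Album\0id";i:1;}
```

Note that the declared lengths (8 and 9) include the NUL bytes, which a hand-written serializer in another language would have to reproduce.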

ThiefMaster

Not to mention unserializing from untrusted sources can cause arbitrary code execution. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ May 17 '10 at 23:35
Unfortunately the choice has been imposed on us by someone else's work. This is particularly common when importing data from an older project/system whereby serialisation is already well established in its data. – Adambean Oct 09 '20 at 12:08
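For legacy data that has to stay in this format, the code-execution risk raised above can be reduced with the second argument of unserialize() (available since PHP 7.0). A minimal sketch, with Probe as a made-up class:

```php
<?php
// Hypothetical class standing in for anything an attacker might try to instantiate.
class Probe
{
    public $x = 1;
}

$payload = serialize([new Probe(), 'text']);

// allowed_classes => false: no objects are created; they become __PHP_Incomplete_Class.
$safe = unserialize($payload, ['allowed_classes' => false]);

var_dump($safe[0] instanceof Probe);                  // bool(false)
var_dump($safe[0] instanceof __PHP_Incomplete_Class); // bool(true)
var_dump($safe[1]);                                   // string(4) "text"
```

Plain arrays and scalars, such as the caption list in the question, are unaffected by this option.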

score 2 · Answer 7 · answered Sep 17 '20 at 12:31

This solution worked for me:

$unserialized = unserialize(utf8_encode($st));

Артур Димерчан

score 1 · Answer 8 · answered Apr 12 '13 at 19:47

One more slight variation here which will hopefully help someone ... I was serializing an array then writing it to a database. On retrieving the data the unserialize operation was failing.

It turns out that the database longtext field I was writing into was using latin1 not UTF8. When I switched it round everything worked as planned.
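If changing the column's character set is not an option, a different technique (not the one I used) is to base64-encode the serialized string so the stored value is plain ASCII. The helper names below are hypothetical, and the sketch assumes the mbstring extension only to simulate a charset conversion:

```php
<?php
// Hypothetical helpers: base64 makes the stored value pure ASCII,
// so a column charset conversion cannot change its bytes.
function pack_for_text_column($value): string
{
    return base64_encode(serialize($value));
}

function unpack_from_text_column(string $stored)
{
    $raw = base64_decode($stored, true); // strict mode: false on invalid input
    if ($raw === false) {
        throw new InvalidArgumentException('Stored value is not valid base64');
    }
    // allowed_classes => false (PHP 7+) keeps objects from being instantiated.
    return unserialize($raw, ['allowed_classes' => false]);
}

$original = ['caption' => "h\u{e9}ll\u{f6}"];
$stored = pack_for_text_column($original);

// Simulate a UTF-8 to latin1 conversion in storage: ASCII text is unaffected.
$afterStorage = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');
var_dump(unpack_from_text_column($afterStorage) === $original); // bool(true)
```

The trade-off is that the stored value grows by about a third and can no longer be read or searched directly in the database.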

Thanks to all above who mentioned character encoding and got me on the right track!

score 0 · Answer 9 · answered Oct 10 '14 at 16:46

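/**
 * Multi-byte unserialize
 *
 * Recomputes string byte lengths, since UTF-8 data can leave the s:N lengths wrong
 *
 * @param string $string serialized data
 * @return mixed the unserialized value
 */
function mb_unserialize($string) {
    // $matches[2] is the string body; $matches[1] is the old (possibly wrong) length
    $string = preg_replace_callback('!s:(\d+):"(.*?)";!s', function($matches) { return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; }, $string);
    return unserialize($string);
}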

score 0 · Answer 10 · answered May 17 '10 at 22:57

I would advise you to encode the data as JSON in JavaScript and then use json_decode() in PHP to decode it.
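A sketch of the receiving side, assuming the hidden field's value arrives as a JSON array of strings. The function name decode_captions() and the $posted stand-in are illustrative; the real $_POST key depends on the field's name attribute, which the question does not show:

```php
<?php
/**
 * Decode a JSON array of caption strings, e.g. the value of a hidden form field.
 * Hypothetical helper; the validation rules are illustrative.
 */
function decode_captions(string $raw): array
{
    $data = json_decode($raw, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        throw new InvalidArgumentException('Captions must be a JSON array: ' . json_last_error_msg());
    }
    foreach ($data as $caption) {
        if (!is_string($caption)) {
            throw new InvalidArgumentException('Every caption must be a string');
        }
    }
    return array_values($data);
}

// Stand-in for the posted field value; JSON accepts \uXXXX escapes or raw UTF-8.
$posted = '["h\u00e9ll\u00f6","w\u00f6rld"]';
$captions = decode_captions($posted);
print_r($captions);
```

Validating the shape after decoding matters because the field content is user-controlled.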

Artefacto

that said, $ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}'; var_dump(unserialize($ser)); works fine with me. What do you mean by fail? The call to unserialize() fails? – Artefacto May 17 '10 at 23:01

score 0 · Answer 11 · answered Oct 03 '16 at 09:47

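We can break the string down into an array:

$finalArray = array();
$nodeArr = explode('&', $_POST['formData']);

foreach ($nodeArr as $value) {
    // Limit to 2 parts so values containing '=' stay intact; pad so a bare key gets an empty value.
    $childArr = array_pad(explode('=', $value, 2), 2, '');
    $finalArray[urldecode($childArr[0])] = urldecode($childArr[1]);
}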
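PHP's built-in parse_str() does the same split and URL-decoding in one call. A minimal sketch, with $formData standing in for $_POST['formData']:

```php
<?php
// Stand-in for $_POST['formData']: a URL-encoded query string.
$formData = 'caption=h%C3%A9ll%C3%B6&note=a%3Db';

// parse_str() splits on & and =, and URL-decodes keys and values.
parse_str($formData, $fields);
print_r($fields);
```

One caveat: parse_str() rewrites dots and spaces in field names to underscores, so the manual loop may still be preferable for unusual keys.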

Rondip

score 0 · Answer 12 · answered Oct 17 '16 at 11:16

Serialize:

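foreach ($income_data as $key => &$value)
{
    $value = urlencode($value);
}
unset($value); // break the reference left behind by the foreach
$data_str = serialize($income_data);

Unserialize:

$data = unserialize($data_str);
foreach ($data as $key => &$value)
{
    $value = urldecode($value);
}
unset($value); // break the reference left behind by the foreach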

sNICkerssss

score 0 · Answer 13 · answered Oct 12 '17 at 13:15

This one worked for me.

function mb_unserialize($string) {
    $string = mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
    $string = preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function ($match) {
            return "s:".strlen($match[2]).":\"".$match[2]."\";"; 
        },
        $string
    );
    return unserialize($string);
}

score 0 · Answer 14 · answered Apr 13 '18 at 07:26

In my case the problem was with line endings (likely some editor had changed my file from DOS to Unix).

I put together these adaptive wrappers:

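function unserialize_fetchError($original, &$unserialized, &$errorMsg) {
    error_clear_last();  // PHP 7+: discard any earlier error so only unserialize()'s own message is reported
    $unserialized = @unserialize($original);
    $lastError = error_get_last();  // null when unserialize() raised nothing
    $errorMsg = $lastError !== null ? $lastError['message'] : null;
    // serialize(false) is 'b:0;', a valid serialization even though unserialize() returns false for it
    return ( $unserialized !== false || $original == 'b:0;' );
}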

function unserialize_checkAllLineEndings($original, &$unserialized, &$errorMsg, &$lineEndings) {
    if ( unserialize_fetchError($original, $unserialized, $errorMsg) ) {
        $lineEndings = 'unchanged';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\n", "\n\r", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\n to \n\r';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\n\r", "\n", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\n\r to \n';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\r\n", "\n", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\r\n to \n';
        return true;
    } //else
    return false;
}
