
The list of backwards-incompatible changes for PHP 7.4 contains the following note:

Serialization

The o serialization format has been removed. As it is never produced by PHP, this may only break unserialization of manually crafted strings.

(Note that this is referring to a little-o, not the big-O format which is used for object serialization.)

It seems this was never generated by PHP's serialize() function, but the fact that this note exists implies that it was recognised by the unserialize() function.

I've done a little test fiddle (3v4l.org) which shows this was not simply a synonym for big-O, which would be one obvious possibility.

The fiddle exposes the change through the error message that is output. In PHP >= 7.4 the error is reported at position 0 (where the o is encountered), whereas prior to 7.4 it was reported at position 5 (where the data is located). This implies that o was recognised but the data was in the wrong format, which ties in with what I deduced above.
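A test along these lines can be sketched as follows. This is not the original fiddle, only a minimal illustration that assumes nothing beyond the built-in serialize() and unserialize() functions; the array keys are just labels. The @ operator suppresses the notice or warning so that only the return value is inspected; remove it to see the reported error offset for the little-o input on a given PHP version.

```php
<?php

// Sketch only: compare how unserialize() treats little-o and big-O input.
$inputs = [
    'little-o' => 'o:0:{}',
    'big-O'    => 'O:8:"stdClass":0:{}',
];

foreach ($inputs as $label => $data) {
    // @ hides the notice/warning; drop it to see the error offset.
    $result = @unserialize($data);

    // Print the class name for objects, otherwise the raw value (e.g. false).
    echo $label, ': ',
        is_object($result) ? get_class($result) : var_export($result, true),
        "\n";
}

// serialize() itself only ever emits the big-O form for objects.
echo serialize(new stdClass()), "\n";
```

The big-O string round-trips to a stdClass instance, while the result for the little-o string depends on the PHP version being run.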

So, what was the o serialization format, what did it deserialize to, and why did PHP support such a feature if it never generated it itself?

HappyDog
  • Related? http://www.phpinternalsbook.com/php5/classes_objects/serialization.html – Caramiriel Dec 14 '20 at 13:26
  • @Caramiriel No. I am familiar with that page - it is a great resource about the internals of PHP serialization - but it doesn't mention the little-`o` notation at all. – HappyDog Dec 14 '20 at 13:36

1 Answer


Originally, PHP 3 used o:<num_fields>:{<fields>} to serialize objects.

The following program works in PHP 4.0.0, which can be downloaded from php.net/releases/index.php (the Windows binary still works on Windows 10!):

<?php

var_dump(unserialize('o:0:{}'));

Output:

X-Powered-By: PHP/4.0.0
Content-type: text/html

object(stdClass)(0) {
}

I was able to trace the original implementation of the object serialization format to this commit in 1999. See php3api_var_serialize.

Later that year, in preparation for PHP 4, the object serialization format was changed to include the class name of the object being serialized. This commit changed the serialization format to o:<classname_length>:"<class_name>":<num_fields>:{<fields>}.

This made the output of PHP 3 and PHP 4 incompatible: PHP 4 would not have been able to unserialize objects serialized with PHP 3. Therefore, another commit changed o to O (lowercase o to uppercase O). unserialize() still accepted o so that objects serialized with PHP 3 could be read, but serialize() no longer produced it.
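To make the three shapes easier to compare, here is a rough sketch that classifies a serialized string by its prefix. The function name classifyObjectFormat and its return labels are made up for illustration, and the regular expressions only approximate the formats described above; this is not how PHP's own parser works.

```php
<?php

// Hypothetical helper: guess which object serialization format a string uses.
// The patterns approximate the formats described in this answer and only
// inspect the start of the string.
function classifyObjectFormat(string $data): string
{
    // PHP 3: o:<num_fields>:{<fields>}
    if (preg_match('/^o:\d+:\{/', $data) === 1) {
        return 'php3';
    }

    // Short-lived intermediate form:
    // o:<classname_length>:"<class_name>":<num_fields>:{<fields>}
    if (preg_match('/^o:\d+:"[^"]*":\d+:\{/', $data) === 1) {
        return 'transitional';
    }

    // PHP 4 and later: same layout, but with an uppercase O
    if (preg_match('/^O:\d+:"[^"]*":\d+:\{/', $data) === 1) {
        return 'current';
    }

    return 'unknown';
}

echo classifyObjectFormat('o:0:{}'), "\n";
echo classifyObjectFormat('o:8:"stdClass":0:{}'), "\n";
echo classifyObjectFormat('O:8:"stdClass":0:{}'), "\n";
```

The only thing separating the intermediate form from the current one is the case of the first character, which is why the switch to an uppercase O was enough to tell PHP 3 data apart from PHP 4 data.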

In 2000, the serialization/unserialization code was refactored, resulting in the file we see today.

What probably happened is that the compatibility layer broke somewhere along the way, and no one cared enough about PHP 3 compatibility to fix it. The code at the beginning of this answer no longer works with any PHP version released in the last 15 years.

Pieter van den Ham
  • This is a really useful answer, that points to the details about what the little-o notation was used for. It doesn't answer the 'why' part of the question, but perhaps that is lost in the mists of time. I will leave this open for a little while, but if no-one is able to provide any further information I will probably accept this as the answer. – HappyDog Jan 19 '21 at 12:24
  • @HappyDog I updated the answer after diving into the old PHP codebase. Still not sure if I answered your question, but it should be close now :) – Pieter van den Ham Jan 22 '21 at 16:49
  • Wow - with your update, this is an amazing answer! So much detail, and exactly the kind of historical information I was looking for. In fact, it's so good, I'm going to award a bounty for it, even though your answer is already posted! – HappyDog Jan 24 '21 at 15:50
  • Bounty awarded. I also added a note to the PHP migration notes, referencing your answer: https://www.php.net/manual/en/migration74.incompatible.php#125717 – HappyDog Jan 25 '21 at 22:50