Welcome! Log In Create A New Profile

Advanced

[PHP-DEV] A replacement for the Serializable interface

Posted by Nikita Popov 
Nikita Popov
[PHP-DEV] A replacement for the Serializable interface
April 21, 2017 11:50AM
Hi internals,

As you are surely aware, serialization in PHP is a big mess. Said mess is
caused by some fundamental issues in the serialization format, and
exacerbated by the existence of the Serializable interface. Fixing the
serialization format is likely not possible at this point, but we can
replace Serializable with a better alternative and I'd like to start a
discussion on that.

The problem is essentially that Serializable::serialize() is expected to
return a string, which is generally obtained by recursively calling
serialize() in the Serializable::serialize() implementation. This
serialize() call shares state information with the outer serialize(), to
ensure that two references to the same object (or the same reference) will
continue referring to a single object/reference after serialization.

This causes two big issues:

First, the implementation is highly order-dependent. If
Serializable::serialize() contains multiple calls to serialize(), then
calls to unserialize() have to be repeated **in the same order** in
Serializable::unserialize(), otherwise unserialization may fail or be
corrupted. In particular this means that using parent::serialize() and
parent::unserialize() is unsafe. (See also
https://bugs.php.net/bug.php?id=66052 and linked bugs.)

Second, the existence of Serializable introduces security issues that we
cannot fix. Allowing the execution of PHP code during unserialization is
unsafe, and even innocuous looking code is easily exploited. We have
recently mitigated __wakeup() based attacks by delaying __wakeup() calls
until the end of the unserialization. We cannot do the same for
Serializable::unserialize() calls, as their design strictly requires the
unserialization context to still be active during the call. Similarly,
Serializable prevents an up-front validation pass of the serialized string,
as the format used for Serializable objects is user-defined.

The delayed __wakeup() mitigation mentioned in the previous point also
interacts badly with Serializable, because we have to delay __wakeup()
calls to the end of the unserialization, which in particular also implies
that Serializable::unserialize() sees objects prior to wakeup. (See also
https://bugs.php.net/bug.php?id=74436.)

In the end, everything comes down to the fact that Serializable requires
nested serialization calls with context sharing.

The alternative mechanism (__sleep + __wakeup) does not have these issues
(anymore), but it is not sufficiently flexible for general use: Notably,
__sleep() allows you to limit which properties are serialized, but the
properties still have to actually exist on the object.

I'd like to propose the addition of a new mechanism which essentially works
the same way as Serializable, but uses arrays instead of strings and does
not share context. I'm not sure about the naming (RealSerializable,
anyone?), so I'll just go with magic methods __serialize() and
__unserialize() for now:

public function __serialize() : array;
public function __unserialize(array $data) : void;

From a userland perspective the implementation should be the same as for
Serializable methods, but with interior serialize()/unserialize() calls
stripped out. Right now Serializable implementations already usually work
by doing something like "return serialize([ ... ])", this would change it
to just "return [ ... ]" and move the serialize()/unserialize() call into
the engine, where we can perform it safely and robustly.

The new methods should reuse the "O" serialization format, rather than
introducing a new one. This allows a measure of interoperability with
previous PHP versions, which can still decode serialized strings from newer
versions using __wakeup().

If an object has both __wakeup() and __unserialize(), then __unserialize()
should be called. If an object implements both Serializable::unserialize()
and __unserialize(), then we should invoke one or the other based on
whether "C" or "O" serialization is used.

Thoughts?

Nikita
Michał Brzuchalski
Re: [PHP-DEV] A replacement for the Serializable interface
April 21, 2017 02:50PM
I know my voice is doesn't mean anything but IMHO interface with magic
methods could bring more inconsistency.

I know PHP is consistently inconsistent but I would prefer if it is
posdible to fix an issue with present method naming.

Cheers

21.04.2017 11:39 "Nikita Popov" <[email protected]> napisał(a):

> Hi internals,
>
> As you are surely aware, serialization in PHP is a big mess. Said mess is
> caused by some fundamental issues in the serialization format, and
> exacerbated by the existence of the Serializable interface. Fixing the
> serialization format is likely not possible at this point, but we can
> replace Serializable with a better alternative and I'd like to start a
> discussion on that.
>
> The problem is essentially that Serializable::serialize() is expected to
> return a string, which is generally obtained by recursively calling
> serialize() in the Serializable::serialize() implementation. This
> serialize() call shares state information with the outer serialize(), to
> ensure that two references to the same object (or the same reference) will
> continue referring to a single object/reference after serialization.
>
> This causes two big issues:
>
> First, the implementation is highly order-dependent. If
> Serializable::serialize() contains multiple calls to serialize(), then
> calls to unserialize() have to be repeated **in the same order** in
> Serializable::unserialize(), otherwise unserialization may fail or be
> corrupted. In particular this means that using parent::serialize() and
> parent::unserialize() is unsafe. (See also
> https://bugs.php.net/bug.php?id=66052 and linked bugs.)
>
> Second, the existence of Serializable introduces security issues that we
> cannot fix. Allowing the execution of PHP code during unserialization is
> unsafe, and even innocuous looking code is easily exploited. We have
> recently mitigated __wakeup() based attacks by delaying __wakeup() calls
> until the end of the unserialization. We cannot do the same for
> Serializable::unserialize() calls, as their design strictly requires the
> unserialization context to still be active during the call. Similarly,
> Serializable prevents an up-front validation pass of the serialized string,
> as the format used for Serializable objects is user-defined.
>
> The delayed __wakeup() mitigation mentioned in the previous point also
> interacts badly with Serializable, because we have to delay __wakeup()
> calls to the end of the unserialization, which in particular also implies
> that Serializable::unserialize() sees objects prior to wakeup. (See also
> https://bugs.php.net/bug.php?id=74436.)
>
> In the end, everything comes down to the fact that Serializable requires
> nested serialization calls with context sharing.
>
> The alternative mechanism (__sleep + __wakeup) does not have these issues
> (anymore), but it is not sufficiently flexible for general use: Notably,
> __sleep() allows you to limit which properties are serialized, but the
> properties still have to actually exist on the object.
>
> I'd like to propose the addition of a new mechanism which essentially works
> the same way as Serializable, but uses arrays instead of strings and does
> not share context. I'm not sure about the naming (RealSerializable,
> anyone?), so I'll just go with magic methods __serialize() and
> __unserialize() for now:
>
> public function __serialize() : array;
> public function __unserialize(array $data) : void;
>
> From a userland perspective the implementation should be the same as for
> Serializable methods, but with interior serialize()/unserialize() calls
> stripped out. Right now Serializable implementations already usually work
> by doing something like "return serialize([ ... ])", this would change it
> to just "return [ ... ]" and move the serialize()/unserialize() call into
> the engine, where we can perform it safely and robustly.
>
> The new methods should reuse the "O" serialization format, rather than
> introducing a new one. This allows a measure of interoperability with
> previous PHP versions, which can still decode serialized strings from newer
> versions using __wakeup().
>
> If an object has both __wakeup() and __unserialize(), then __unserialize()
> should be called. If an object implements both Serializable::unserialize()
> and __unserialize(), then we should invoke one or the other based on
> whether "C" or "O" serialization is used.
>
> Thoughts?
>
> Nikita
>
On Fri, Apr 21, 2017 at 2:47 PM, Michał Brzuchalski <
[email protected]> wrote:

> I know my voice is doesn't mean anything but IMHO interface with magic
> methods could bring more inconsistency.
>
> I know PHP is consistently inconsistent but I would prefer if it is
> posdible to fix an issue with present method naming.
>
> Cheers
>
Magic methods have a distinct backwards compatibility advantage. They allow
you to add __serialize/__unserialize to an existing class that currently
uses Serializable. Older PHP versions will then use the Serializable
implementation, while never versions will use __serialize/__unserialize. An
interface makes this a lot more complicated, because you either have to
bump your PHP version requirement (unlikely), or you have to provide a shim
interface for older PHP versions. This shim interface would then be part of
any library currently implementing Serializable, which seems sub-optimal to
me. That's why I think magic methods are better for this case (though I
don't strongly care).

Also, to answer an OTR question: We cannot simply reuse the Serializable
interface by allowing an array return value from
Serializable::unserialize(). The array return value is only a means to an
end: the important part is that the new serialization mechanism does not
share serialization state -- using arrays instead of strings is just a
convenient way to achieve this. However, Serializable::unserialize()
currently shares the state and we cannot change this without breaking BC --
so we cannot reuse this interface.

Nikita



> 21.04.2017 11:39 "Nikita Popov" <[email protected]> napisał(a):
>
>> Hi internals,
>>
>> As you are surely aware, serialization in PHP is a big mess. Said mess is
>> caused by some fundamental issues in the serialization format, and
>> exacerbated by the existence of the Serializable interface. Fixing the
>> serialization format is likely not possible at this point, but we can
>> replace Serializable with a better alternative and I'd like to start a
>> discussion on that.
>>
>> The problem is essentially that Serializable::serialize() is expected to
>> return a string, which is generally obtained by recursively calling
>> serialize() in the Serializable::serialize() implementation. This
>> serialize() call shares state information with the outer serialize(), to
>> ensure that two references to the same object (or the same reference) will
>> continue referring to a single object/reference after serialization.
>>
>> This causes two big issues:
>>
>> First, the implementation is highly order-dependent. If
>> Serializable::serialize() contains multiple calls to serialize(), then
>> calls to unserialize() have to be repeated **in the same order** in
>> Serializable::unserialize(), otherwise unserialization may fail or be
>> corrupted. In particular this means that using parent::serialize() and
>> parent::unserialize() is unsafe. (See also
>> https://bugs.php.net/bug.php?id=66052 and linked bugs.)
>>
>> Second, the existence of Serializable introduces security issues that we
>> cannot fix. Allowing the execution of PHP code during unserialization is
>> unsafe, and even innocuous looking code is easily exploited. We have
>> recently mitigated __wakeup() based attacks by delaying __wakeup() calls
>> until the end of the unserialization. We cannot do the same for
>> Serializable::unserialize() calls, as their design strictly requires the
>> unserialization context to still be active during the call. Similarly,
>> Serializable prevents an up-front validation pass of the serialized
>> string,
>> as the format used for Serializable objects is user-defined.
>>
>> The delayed __wakeup() mitigation mentioned in the previous point also
>> interacts badly with Serializable, because we have to delay __wakeup()
>> calls to the end of the unserialization, which in particular also implies
>> that Serializable::unserialize() sees objects prior to wakeup. (See also
>> https://bugs.php.net/bug.php?id=74436.)
>>
>> In the end, everything comes down to the fact that Serializable requires
>> nested serialization calls with context sharing.
>>
>> The alternative mechanism (__sleep + __wakeup) does not have these issues
>> (anymore), but it is not sufficiently flexible for general use: Notably,
>> __sleep() allows you to limit which properties are serialized, but the
>> properties still have to actually exist on the object.
>>
>> I'd like to propose the addition of a new mechanism which essentially
>> works
>> the same way as Serializable, but uses arrays instead of strings and does
>> not share context. I'm not sure about the naming (RealSerializable,
>> anyone?), so I'll just go with magic methods __serialize() and
>> __unserialize() for now:
>>
>> public function __serialize() : array;
>> public function __unserialize(array $data) : void;
>>
>> From a userland perspective the implementation should be the same as for
>> Serializable methods, but with interior serialize()/unserialize() calls
>> stripped out. Right now Serializable implementations already usually work
>> by doing something like "return serialize([ ... ])", this would change it
>> to just "return [ ... ]" and move the serialize()/unserialize() call into
>> the engine, where we can perform it safely and robustly.
>>
>> The new methods should reuse the "O" serialization format, rather than
>> introducing a new one. This allows a measure of interoperability with
>> previous PHP versions, which can still decode serialized strings from
>> newer
>> versions using __wakeup().
>>
>> If an object has both __wakeup() and __unserialize(), then __unserialize()
>> should be called. If an object implements both Serializable::unserialize()
>> and __unserialize(), then we should invoke one or the other based on
>> whether "C" or "O" serialization is used.
>>
>> Thoughts?
>>
>> Nikita
>>
>
Michał Brzuchalski
Re: [PHP-DEV] A replacement for the Serializable interface
April 21, 2017 07:50PM
Thanks Nikita,

Thank you for explanation no I get it.
I only don't know what means serialization "O" and "C" but don't bother.
I'll try to google it.

Cheers
--
Michał

21.04.2017 15:51 "Nikita Popov" <[email protected]> napisał(a):

On Fri, Apr 21, 2017 at 2:47 PM, Michał Brzuchalski <
[email protected]> wrote:

> I know my voice is doesn't mean anything but IMHO interface with magic
> methods could bring more inconsistency.
>
> I know PHP is consistently inconsistent but I would prefer if it is
> posdible to fix an issue with present method naming.
>
> Cheers
>
Magic methods have a distinct backwards compatibility advantage. They allow
you to add __serialize/__unserialize to an existing class that currently
uses Serializable. Older PHP versions will then use the Serializable
implementation, while never versions will use __serialize/__unserialize. An
interface makes this a lot more complicated, because you either have to
bump your PHP version requirement (unlikely), or you have to provide a shim
interface for older PHP versions. This shim interface would then be part of
any library currently implementing Serializable, which seems sub-optimal to
me. That's why I think magic methods are better for this case (though I
don't strongly care).

Also, to answer an OTR question: We cannot simply reuse the Serializable
interface by allowing an array return value from
Serializable::unserialize(). The array return value is only a means to an
end: the important part is that the new serialization mechanism does not
share serialization state -- using arrays instead of strings is just a
convenient way to achieve this. However, Serializable::unserialize()
currently shares the state and we cannot change this without breaking BC --
so we cannot reuse this interface.

Nikita



> 21.04.2017 11:39 "Nikita Popov" <[email protected]> napisał(a):
>
>> Hi internals,
>>
>> As you are surely aware, serialization in PHP is a big mess. Said mess is
>> caused by some fundamental issues in the serialization format, and
>> exacerbated by the existence of the Serializable interface. Fixing the
>> serialization format is likely not possible at this point, but we can
>> replace Serializable with a better alternative and I'd like to start a
>> discussion on that.
>>
>> The problem is essentially that Serializable::serialize() is expected to
>> return a string, which is generally obtained by recursively calling
>> serialize() in the Serializable::serialize() implementation. This
>> serialize() call shares state information with the outer serialize(), to
>> ensure that two references to the same object (or the same reference) will
>> continue referring to a single object/reference after serialization.
>>
>> This causes two big issues:
>>
>> First, the implementation is highly order-dependent. If
>> Serializable::serialize() contains multiple calls to serialize(), then
>> calls to unserialize() have to be repeated **in the same order** in
>> Serializable::unserialize(), otherwise unserialization may fail or be
>> corrupted. In particular this means that using parent::serialize() and
>> parent::unserialize() is unsafe. (See also
>> https://bugs.php.net/bug.php?id=66052 and linked bugs.)
>>
>> Second, the existence of Serializable introduces security issues that we
>> cannot fix. Allowing the execution of PHP code during unserialization is
>> unsafe, and even innocuous looking code is easily exploited. We have
>> recently mitigated __wakeup() based attacks by delaying __wakeup() calls
>> until the end of the unserialization. We cannot do the same for
>> Serializable::unserialize() calls, as their design strictly requires the
>> unserialization context to still be active during the call. Similarly,
>> Serializable prevents an up-front validation pass of the serialized
>> string,
>> as the format used for Serializable objects is user-defined.
>>
>> The delayed __wakeup() mitigation mentioned in the previous point also
>> interacts badly with Serializable, because we have to delay __wakeup()
>> calls to the end of the unserialization, which in particular also implies
>> that Serializable::unserialize() sees objects prior to wakeup. (See also
>> https://bugs.php.net/bug.php?id=74436.)
>>
>> In the end, everything comes down to the fact that Serializable requires
>> nested serialization calls with context sharing.
>>
>> The alternative mechanism (__sleep + __wakeup) does not have these issues
>> (anymore), but it is not sufficiently flexible for general use: Notably,
>> __sleep() allows you to limit which properties are serialized, but the
>> properties still have to actually exist on the object.
>>
>> I'd like to propose the addition of a new mechanism which essentially
>> works
>> the same way as Serializable, but uses arrays instead of strings and does
>> not share context. I'm not sure about the naming (RealSerializable,
>> anyone?), so I'll just go with magic methods __serialize() and
>> __unserialize() for now:
>>
>> public function __serialize() : array;
>> public function __unserialize(array $data) : void;
>>
>> From a userland perspective the implementation should be the same as for
>> Serializable methods, but with interior serialize()/unserialize() calls
>> stripped out. Right now Serializable implementations already usually work
>> by doing something like "return serialize([ ... ])", this would change it
>> to just "return [ ... ]" and move the serialize()/unserialize() call into
>> the engine, where we can perform it safely and robustly.
>>
>> The new methods should reuse the "O" serialization format, rather than
>> introducing a new one. This allows a measure of interoperability with
>> previous PHP versions, which can still decode serialized strings from
>> newer
>> versions using __wakeup().
>>
>> If an object has both __wakeup() and __unserialize(), then __unserialize()
>> should be called. If an object implements both Serializable::unserialize()
>> and __unserialize(), then we should invoke one or the other based on
>> whether "C" or "O" serialization is used.
>>
>> Thoughts?
>>
>> Nikita
>>
>
Sorry, only registered users may post in this forum.

Click here to login