[PHP-DEV] unpack() offset and consumed data measurement

Chris Wright
[PHP-DEV] unpack() offset and consumed data measurement
January 28, 2018 01:20PM
Morning all

Since PHP 7.1 the unpack() function has a (still undocumented) optional 3rd
argument that allows the caller to specify the offset in the input data
where parsing should start. While this is a useful feature, it is currently
impossible to know how many bytes of the input were consumed for some
format specifiers, such as Z*, f, d and anything else that does not consume
a universally constant amount of data.

It is typically possible to determine this externally, but not without some
clumsy measurements either of the returned value or (in the case of
system-dependent numeric types) inspecting the length of the string
returned by pack() for those specifiers. It can also get complicated when
using things like x and X, which adjust the offset without producing data
in the returned value.
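
As a rough illustration of the external measurements described above, the sketch below walks two NUL-terminated strings using the offset argument. The buffer contents and the key name `str` are invented for the example and are not taken from any real format.

```php
<?php
// Hypothetical buffer: two NUL-terminated strings back to back.
$data = "ab\0cdef\0";

// Third argument (PHP >= 7.1): byte offset at which parsing starts.
$first = unpack('Z*str', $data, 0);

// unpack() does not report where the field ended, so the next offset is
// derived from the returned value: string length plus the NUL terminator.
$next = strlen($first['str']) + 1;

$second = unpack('Z*str', $data, $next);

// Width of a system-dependent type, measured by packing a dummy value.
$floatWidth = strlen(pack('f', 0.0));
```

Both the `+ 1` and the pack() measurement live outside the format string, which is the bookkeeping the proposal aims to remove.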

Additionally, computing the new position in the input buffer separately
from the format string risks the two diverging if one is modified and the
other is either not updated, or updated incorrectly.

Many binary data formats are sufficiently complex that unpacking a large
structure requires multiple calls to unpack(), as often there are nuances
that cannot be directly expressed with the current specifier format, such
as strings prefixed with a length indicator.
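
A minimal sketch of the length-prefixed case, assuming a made-up record layout (16-bit big-endian length, the string bytes, then a 32-bit big-endian value). The key names `length`, `text` and `value` are hypothetical.

```php
<?php
// Hypothetical record: uint16 length, string payload, uint32 trailer.
$data = pack('n', 5) . 'hello' . pack('N', 42);
$offset = 0;

// First call: read the fixed-width length prefix.
$header = unpack('nlength', $data, $offset);
$offset += 2; // 'n' is always 2 bytes

// Second call: the format string depends on the value just read.
$body = unpack('a' . $header['length'] . 'text', $data, $offset);
$offset += $header['length'];

// Third call: continue after the string payload.
$trailer = unpack('Nvalue', $data, $offset);
$offset += 4; // 'N' is always 4 bytes
```

Each call needs a hand-written offset adjustment that has to be kept in step with its format string.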

Here is some code that demonstrates the problem:

/* This is the only way to know for certain how big float is on the
local system */
define('FLOAT_WIDTH', strlen(pack('f', 0.0)));

/* an exaggerated example using two variable width codes and a code that
does not produce output but modifies the input buffer offset */
$pieces = unpack('f/X/Z*', $data, $offset);

/* we now have to modify the offset before we can continue to unpack
data */
$offset += FLOAT_WIDTH          // f
         - 1                    // X
         + strlen($pieces[3]);  // Z*

I would like to look at adding a 4th optional argument, taken by-ref, which
will be populated with the number of buffer bytes consumed by the unpack()
operation. This would enable the above code to be rewritten like so:

$pieces = unpack('f/X/Z*', $data, $offset, $consumed);
$offset += $consumed;

Not only is this code much simpler and less susceptible to breakage, it is
(IMHO) clearer to read as well.

Does anyone have any objections to/thoughts about this? If not I will work
up a patch in the coming week.

Thanks, Chris
Christoph M. Becker
[PHP-DEV] Re: unpack() offset and consumed data measurement
January 28, 2018 01:50PM
On 28.01.2018 at 13:12, Chris Wright wrote:

> Since PHP 7.1 the unpack() function has a (still undocumented) optional 3rd
> argument […]

JFTR: documented with
http://svn.php.net/viewvc?view=revision&revision=344003.

--
Christoph M. Becker

Chris Wright
[PHP-DEV] Re: unpack() offset and consumed data measurement
January 28, 2018 02:00PM
On 28 January 2018 at 12:42, Christoph M. Becker wrote:

> On 28.01.2018 at 13:12, Chris Wright wrote:
>
> > Since PHP 7.1 the unpack() function has a (still undocumented) optional 3rd
> > argument […]
>
> JFTR: documented with
> http://svn.php.net/viewvc?view=revision&revision=344003.
>
>
Thanks!
Chris Wright
[PHP-DEV] Re: unpack() offset and consumed data measurement
January 30, 2018 01:00PM
On 28 January 2018 at 12:12, Chris Wright wrote:
>
> Here is some code that demonstrates the problem:
>
> /* This is the only way to know for certain how big float is on the
> local system */
> define('FLOAT_WIDTH', strlen(pack('f', 0.0)));
>
> /* an exaggerated example using two variable width codes and a code that
> does not produce output but modifies the input buffer offset */
> $pieces = unpack('f/X/Z*', $data, $offset);
>
> /* we now have to modify the offset before we can continue to unpack
> data */
> $offset += FLOAT_WIDTH          // f
>          - 1                    // X
>          + strlen($pieces[3]);  // Z*
>

Re-reading this mail, I noticed a small mistake in the code sample: I forgot
to include the terminating null byte for the Z* data.

This (unintentionally) demonstrates the exact reason I would like to add
this, as it's very easy to accidentally write subtle bugs.
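
For reference, a version of the manual calculation that counts the terminator might look like the sketch below. The key names `num` and `text` and the test buffer are assumptions added for the example. Named keys are used because unnamed format codes each number their results from 1 and would overwrite one another.

```php
<?php
define('FLOAT_WIDTH', strlen(pack('f', 0.0)));

// Hypothetical buffer: FLOAT_WIDTH bytes, a NUL-terminated string, more data.
$data = str_repeat("\x01", FLOAT_WIDTH) . "abc\0tail";
$offset = 0;

$pieces = unpack('fnum/X/Z*text', $data, $offset);

$offset += FLOAT_WIDTH              // f
         - 1                        // X steps back one byte
         + strlen($pieces['text'])  // Z* payload
         + 1;                       // NUL terminator (assumes one is present)

// Whatever follows the string in the buffer.
$rest = substr($data, $offset);
```

The extra `+ 1` is only right when the string really is NUL-terminated, which is one more detail the caller has to track by hand.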