Welcome! Log In Create A New Profile

Advanced

[PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug

Posted by Solar Designer 
Solar Designer
[PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug
August 16, 2017 09:20PM
Hi,

It is well-known that it is impossible to map e.g. a 32-bit random
number with a uniform distribution over its full range of values onto a
range with fewer different values while maintaining a uniform
distribution, except when the target range contains a whole power of 2
number of different values.

Prior to 7.1.0, mt_rand()'s mapping onto the target range was done with
some floating-point math, so there was no modulo division exposed in
code, yet the problem above is fundamental and indeed it applied, e.g.:

https://3v4l.org/ihCT8

<?php
// PHP pre-7.1.0 modulo bias demo
mt_srand(1234567890);
$total = 100000;
$max = 0x66666666;
$halves[0] = $halves[1] = 0;
for ($i = 0; $i < $total; $i++) {
$halves[(mt_rand(0, $max - 1) >> 1) & 1]++;
}
printf("%.1f%% vs. %.1f%%\n", 100. * $halves[0] / $total, 100. * $halves[1] / $total);
?>

Output for 7.1.0 - 7.2.0beta2
49.9% vs. 50.1%
Output for 5.2.1 - 5.6.30, hhvm-3.10.1 - 3.21.0, 7.0.0 - 7.0.20
40.0% vs. 60.0%
Output for 4.3.0 - 5.2.0
39.8% vs. 60.2%

PHP 7.1.0 moved to explicit integer modulo division yet tried to avoid
the modulo bias by generating multiple original 32-bit or 64-bit random
numbers in cases where the modulo bias would occur. Unfortunately,
because of an implementation bug it failed, e.g.:

https://3v4l.org/kBpqh

<?php
// PHP 7.1.0 to 7.2.0beta2 modulo bias bug found during work on http://www.openwall.com/php_mt_seed/
mt_srand(1234567890);
$total = 100000;
$max = 0x66666666;
$halves[0] = $halves[1] = 0;
for ($i = 0; $i < $total; $i++) {
$halves[mt_rand(0, $max - 1) / ($max / 2)]++;
}
printf("%.1f%% vs. %.1f%%\n", 100. * $halves[0] / $total, 100. * $halves[1] / $total);
?>

Output for 7.1.0 - 7.2.0beta2
60.0% vs. 40.0%
Output for 4.3.0 - 5.6.30, hhvm-3.10.1 - 3.21.0, 7.0.0 - 7.0.20
50.0% vs. 50.0%

(Even higher bias can be seen in both cases by changing the 0x66666666
to 0xaaaaaaaa.)

I first found the bug through code review while working on adding PHP
7.1.0+ support to php_mt_seed. The above 3v4l was only to confirm it.

As I said in the Twitter thread (certainly not a proper place):

https://twitter.com/solardiz/status/897617315008839686

"The bug is unconditional use of 64-bit ZEND_ULONG_MAX in the bias
avoidance, even when the intermediate result is 32-bit (up to UINT32_MAX)."

Leigh confirmed this understanding, tweeting:

"Yep, happens when (max - min) is less than UINT32_MAX. A check on
(umax > UINT32_MAX) and setting the limit fixes it."

This sounds like almost the correct fix to me. The code is:

PHPAPI zend_long php_mt_rand_range(zend_long min, zend_long max)
{
zend_ulong umax = max - min;
zend_ulong limit;
zend_ulong result;

result = php_mt_rand();
#if ZEND_ULONG_MAX > UINT32_MAX
if (umax > UINT32_MAX) {
result = (result << 32) | php_mt_rand();
}
#endif

/* Special case where no modulus is required */
if (UNEXPECTED(umax == ZEND_ULONG_MAX)) {
return (zend_long)result;
}

/* Increment the max so the range is inclusive of max */
umax++;

/* Powers of two are not biased */
if (EXPECTED((umax & (umax - 1)) != 0)) {
/* Ceiling under which ZEND_LONG_MAX % max == 0 */
limit = ZEND_ULONG_MAX - (ZEND_ULONG_MAX % umax) - 1;

/* Discard numbers over the limit to avoid modulo bias */
while (UNEXPECTED(result > limit)) {
#if ZEND_ULONG_MAX > UINT32_MAX
if (umax > UINT32_MAX) {
result = (result << 32) | php_mt_rand();
}
else {
result = php_mt_rand();
}
#else
result = php_mt_rand();
#endif
}
}

return (zend_long)((result % umax) + min);
}

and what Leigh means is something like:

/* Ceiling under which *_MAX % max == 0 */
if (umax > UINT32_MAX)
limit = ZEND_ULONG_MAX - (ZEND_ULONG_MAX % umax) - 1;
else
limit = UINT32_MAX - (UINT32_MAX % umax) - 1;

Strictly speaking, the check should be the same as the one earlier in
the function, which decided how "result" was obtained with one or two
calls to php_mt_rand(). The above proposed check isn't exactly the same
because of the "umax++" inbetween. The special case of "umax" being
exactly UINT32_MAX prior to the increment needs more thought. Probably
the check in the first instance of:

if (umax > UINT32_MAX) {
result = (result << 32) | php_mt_rand();
}

is buggy and should actually be:

if (umax >= UINT32_MAX) {
result = (result << 32) | php_mt_rand();
}

whereas further code should continue with "umax > UINT32_MAX" because of
the increment.

Also, why even bother to support ranges beyond 32-bit? Sounds like a
misfeature to me, considering it won't(?) be universally available on
all PHP builds anyway (not on 32-bit ones, right?) and thus shouldn't(?)
be relied upon by applications (although it might become reasonable for
application developers not to care about 32-bit soon). I also see few
use cases for it, even if it were universally available.

The main bug reported in this message (wrongly using ZEND_ULONG_MAX for
computing "limit" when "result" is 32-bit) also means that PHP 7.1.0 to
7.2.0beta2 probably produce partially different sequences of mt_rand()
outputs between 32- and 64-bit builds for the same seed, for most ranges
that fit within 32 bits. Specifically, 32-bit builds probably correctly
avoid the modulo bias. (However, I didn't confirm this with a test.
Unfortunately, 3v4l.org appears to be 64-bit only.) Possibility of bugs
like this is yet another reason not to have introduced the little-needed
64-bit build specific functionality. OTOH, this was also a (missed)
opportunity to detect this bug with more testing (requiring same
behavior of 32- and 64-bit builds for 32-bit inputs).

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
On Wed, 16 Aug 2017 at 20:13 Solar Designer <[email protected]> wrote:

> Also, why even bother to support ranges beyond 32-bit? Sounds like a
> misfeature to me, considering it won't(?) be universally available on
> all PHP builds anyway (not on 32-bit ones, right?) and thus shouldn't(?)
> be relied upon by applications (although it might become reasonable for
> application developers not to care about 32-bit soon). I also see few
> use cases for it, even if it were universally available.
>

It was possible (on 64 bit builds) to specify min and max such that the
size of the output required from mt_rand was the full 64 bit range.

Prior to 7.1 this full output was created by stretching a single 32 bit
output up to the required range using floating point arithmetic, which
caused other biases in the output.

Unfortunately when fixing this bias, a new bias was introduced. I took
known working code from the CSPRNG and didn't account for the variable
length of the sample.

My proposed fix would be to add a "limit_max" variable, initialise it to
UINT32_MAX, and in the first range check where we decide to add an extra
output or not, set it to ZEND_ULONG_MAX. Then the statement creating the
ceiling value can use limit_max instead of the constant value.
On Wed, Aug 16, 2017 at 9:47 PM, Leigh <[email protected]> wrote:

> On Wed, 16 Aug 2017 at 20:13 Solar Designer <[email protected]> wrote:
>
> > Also, why even bother to support ranges beyond 32-bit? Sounds like a
> > misfeature to me, considering it won't(?) be universally available on
> > all PHP builds anyway (not on 32-bit ones, right?) and thus shouldn't(?)
> > be relied upon by applications (although it might become reasonable for
> > application developers not to care about 32-bit soon). I also see few
> > use cases for it, even if it were universally available.
> >
>
> It was possible (on 64 bit builds) to specify min and max such that the
> size of the output required from mt_rand was the full 64 bit range.
>
> Prior to 7.1 this full output was created by stretching a single 32 bit
> output up to the required range using floating point arithmetic, which
> caused other biases in the output.
>
> Unfortunately when fixing this bias, a new bias was introduced. I took
> known working code from the CSPRNG and didn't account for the variable
> length of the sample.
>
> My proposed fix would be to add a "limit_max" variable, initialise it to
> UINT32_MAX, and in the first range check where we decide to add an extra
> output or not, set it to ZEND_ULONG_MAX. Then the statement creating the
> ceiling value can use limit_max instead of the constant value.
>

I'd suggest to split the 32-bit and 64-bit code codepaths entirely, as the
interleaved #ifs are somewhat hard to follow. Something like
https://gist.github.com/nikic/64e7ec58ebb6121d350fb80927a65082 (not
thoroughly tested).

Nikita
On Wed, Aug 16, 2017 at 10:06:02PM +0200, Nikita Popov wrote:
> I'd suggest to split the 32-bit and 64-bit code codepaths entirely, as the
> interleaved #ifs are somewhat hard to follow. Something like
> https://gist.github.com/nikic/64e7ec58ebb6121d350fb80927a65082 (not
> thoroughly tested).

This looks good to me - especially how you reduced the nesting of if's
by special-casing the "Powers of two are not biased" return. With this
change, you can as well drop the "Special case where no modulus is
required", as it'd happen to be handled the same by your new return.
OTOH, that optimization might require an extra comment on its own.

Here's what this might look like (totally untested):

https://gist.github.com/solardiz/5e3d313bbee2c1ce6e200e433b750bef

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
On Wed, Aug 16, 2017 at 11:41:55PM +0200, Solar Designer wrote:
> On Wed, Aug 16, 2017 at 10:06:02PM +0200, Nikita Popov wrote:
> > I'd suggest to split the 32-bit and 64-bit code codepaths entirely, as the
> > interleaved #ifs are somewhat hard to follow. Something like
> > https://gist.github.com/nikic/64e7ec58ebb6121d350fb80927a65082 (not
> > thoroughly tested).
>
> This looks good to me - especially how you reduced the nesting of if's
> by special-casing the "Powers of two are not biased" return. With this
> change, you can as well drop the "Special case where no modulus is
> required", as it'd happen to be handled the same by your new return.
> OTOH, that optimization might require an extra comment on its own.
>
> Here's what this might look like (totally untested):
>
> https://gist.github.com/solardiz/5e3d313bbee2c1ce6e200e433b750bef

One difference I didn't notice at first: the currently committed code
does only one php_mt_rand() call per loop iteration when it's skipping
"numbers over the limit", even in the 64-bit case (thus, it replaces 32
bits of the value at a time). Your 64-bit version (and my revision of
it) does two calls per iteration (replacing the whole number).

Arguably, the currently committed code is smarter and more efficient in
that respect, whereas yours is cleaner. Regardless, it's an extra
change that will affect the generated random numbers in some cases where
we could avoid making that change (it's not fixing any bug).

I think it's preferable not to unnecessarily change what numbers are
generated.

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer <[email protected]> wrote:

> On Wed, Aug 16, 2017 at 11:41:55PM +0200, Solar Designer wrote:
> > On Wed, Aug 16, 2017 at 10:06:02PM +0200, Nikita Popov wrote:
> > > I'd suggest to split the 32-bit and 64-bit code codepaths entirely, as
> the
> > > interleaved #ifs are somewhat hard to follow. Something like
> > > https://gist.github.com/nikic/64e7ec58ebb6121d350fb80927a65082 (not
> > > thoroughly tested).
> >
> > This looks good to me - especially how you reduced the nesting of if's
> > by special-casing the "Powers of two are not biased" return. With this
> > change, you can as well drop the "Special case where no modulus is
> > required", as it'd happen to be handled the same by your new return.
> > OTOH, that optimization might require an extra comment on its own.
> >
> > Here's what this might look like (totally untested):
> >
> > https://gist.github.com/solardiz/5e3d313bbee2c1ce6e200e433b750bef
>
> One difference I didn't notice at first: the currently committed code
> does only one php_mt_rand() call per loop iteration when it's skipping
> "numbers over the limit", even in the 64-bit case (thus, it replaces 32
> bits of the value at a time). Your 64-bit version (and my revision of
> it) does two calls per iteration (replacing the whole number).
>
> Arguably, the currently committed code is smarter and more efficient in
> that respect, whereas yours is cleaner. Regardless, it's an extra
> change that will affect the generated random numbers in some cases where
> we could avoid making that change (it's not fixing any bug).
>
> I think it's preferable not to unnecessarily change what numbers are
> generated.
>
> Alexander
>

Good point. However, I think that this optimization is not actually
correct. For example, let's take umax = 0x1_0000_0001, which is the
smallest value for which this codepath is taken. In this case limit would
be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly
0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this case,
and we only generate one new 32-bit value, the top word of the result will
be fixed to 0xffff_ffff. (A very small bias in this case, but there's
probably more significant cases.)

Nikita
On Thu, Aug 17, 2017 at 12:57:56AM +0200, Nikita Popov wrote:
> On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer <[email protected]> wrote:
> > One difference I didn't notice at first: the currently committed code
> > does only one php_mt_rand() call per loop iteration when it's skipping
> > "numbers over the limit", even in the 64-bit case (thus, it replaces 32
> > bits of the value at a time). Your 64-bit version (and my revision of
> > it) does two calls per iteration (replacing the whole number).
> >
> > Arguably, the currently committed code is smarter and more efficient in
> > that respect, whereas yours is cleaner. Regardless, it's an extra
> > change that will affect the generated random numbers in some cases where
> > we could avoid making that change (it's not fixing any bug).
> >
> > I think it's preferable not to unnecessarily change what numbers are
> > generated.

> Good point. However, I think that this optimization is not actually
> correct. For example, let's take umax = 0x1_0000_0001, which is the
> smallest value for which this codepath is taken. In this case limit would
> be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly
> 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this case,
> and we only generate one new 32-bit value, the top word of the result will
> be fixed to 0xffff_ffff. (A very small bias in this case, but there's
> probably more significant cases.)

Great point. More generally, we can't reuse the same random number for
decision-making (to skip it) and as part of the next (not so) random
number, without introducing bias. So we shouldn't, and we should in
fact move away from the old "smarter" behavior as a bug fix. Thank you!

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
On Thu, Aug 17, 2017 at 03:18:30PM +0200, Solar Designer wrote:
> On Thu, Aug 17, 2017 at 12:57:56AM +0200, Nikita Popov wrote:
> > On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer <[email protected]> wrote:
> > > One difference I didn't notice at first: the currently committed code
> > > does only one php_mt_rand() call per loop iteration when it's skipping
> > > "numbers over the limit", even in the 64-bit case (thus, it replaces 32
> > > bits of the value at a time). Your 64-bit version (and my revision of
> > > it) does two calls per iteration (replacing the whole number).
> > >
> > > Arguably, the currently committed code is smarter and more efficient in
> > > that respect, whereas yours is cleaner. Regardless, it's an extra
> > > change that will affect the generated random numbers in some cases where
> > > we could avoid making that change (it's not fixing any bug).
> > >
> > > I think it's preferable not to unnecessarily change what numbers are
> > > generated.
>
> > Good point. However, I think that this optimization is not actually
> > correct. For example, let's take umax = 0x1_0000_0001, which is the
> > smallest value for which this codepath is taken. In this case limit would
> > be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly
> > 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this case,
> > and we only generate one new 32-bit value, the top word of the result will
> > be fixed to 0xffff_ffff. (A very small bias in this case, but there's
> > probably more significant cases.)
>
> Great point. More generally, we can't reuse the same random number for
> decision-making (to skip it) and as part of the next (not so) random
> number, without introducing bias. So we shouldn't, and we should in
> fact move away from the old "smarter" behavior as a bug fix. Thank you!

So I guess both the bug I reported and this one Nikita found are going
to get fixed soon? For 7.2.0 maybe?

Meanwhile, I released php_mt_seed 4.0 yesterday with support for latest
PHP's mt_rand(), as well as with support for PHP 5.2.0 and below (as it
happens, all the way to 3.0.7, although that's overkill). Near the end
of its documentation, I included a lengthy section entitled "PHP version
curiosities", which I include below in this e-mail in case any of this
is useful for PHP's own documentation. It starts from PHP 3.0.6, but
then actually spends half of the text on PHP 7.1.0+. Here we go:

---
While php_mt_seed supports 3 major revisions of PHP's mt_rand()
algorithm and that sort of covers PHP 3.0.7 through 7.1.0+ (up to the
latest as of this writing and probably beyond), the reality is somewhat
trickier than that. From older versions to newer:

As a mere historical curiosity, php_mt_seed is in fact able to crack
seeds of PHP 3.0.6, which is the very first version that introduced
mt_rand(), but only as long as no range was passed into mt_rand(). That
version had broken support for ranges, and indeed there's no point in
supporting that short-lived breakage in php_mt_seed now. With this
detail, php_mt_seed has some support for all mt_rand() capable versions
of PHP released so far.

Then there's PHP 3.0.7 through 5.2.0, where Mersenne Twister's state
initialization is with multiples of 69069. This enables our stateless
implementation to quickly jump to the state array element needed to
compute the first mt_rand() output by using a precomputed value for
69069 raised to the power 396 (mod 2**32), which is MT's M-1. Another
curiosity of those versions, which we take advantage of too, is that
they treat adjacent even and odd seeds the same, so the effective seed
space is 31-bit.

PHP 3.0.6 to 4.1.2 used a default seed of 4357 (and thus also 4356) if
mt_srand() was not called. PHP 4.2.0 changed that to automatic seeding
using system time and PHP process ID (still predictable and now also
leaky, but no longer a constant), but there was "Bug #25007 rand &
mt_rand seed RNG every call" until 4.3.3, which presumably affected how
cracked seeds could (not) be used.

PHP 5.2.1 changed MT state initialization to MT authors' new recommended
algorithm, which is no longer linear so we have to compute the first 397
state elements (out of 624) even though in the simplest case we only
need (and only store) the first and last one of those (or we could use a
time-memory trade-off, which we currently don't).

PHP 5.2.1 also introduced a bug into its implementation of MT (use of a
wrong variable, whereas pre-5.2.1 code was correct in that respect).
This bug lets us skip a few operations for every other seed, which we
do, although this optimization is so minor that we could as well not
bother. PHP 7.1.0 fixed this bug (reverting to pre-5.2.1 code in that
respect, so we use the same logic for pre-5.2.1 and 7.1.0+ there).

In PHP versions from 3.0.7 to 7.0.x, if mt_rand() was called with its
optional output range specified, a 31-bit (0 to 2147483647) MT PRNG
output was scaled to that range using floating-point math. This meant
that if a range wider than 31-bit was requested on a 64-bit build of
PHP, some values would never occur. This also meant that even for most
ranges smaller than 31-bit a bias was introduced (some output values
became more likely than others), as compared to MT's raw output (which
was relatively unbiased).

PHP 7.1.0 tried to fix those biases by dropping the floating-point math
and instead mapping the raw 32-bit MT PRNG outputs to the target range
using integer modulo division. To avoid inherent bias when the target
range isn't a whole power of 2 of possible integer values, a loop was
introduced to skip raw 32-bit PRNG outputs (until a suitable one is
seen) that would result in such bias. A bug in that code was found and
reported due to work on php_mt_seed. As it turned out, the loop only
works right in 32-bit builds of PHP, and is ineffective on 64-bit
(except with 64-bit ranges, see below). Luckily, this actually makes
things simpler for php_mt_seed, and currently php_mt_seed fully supports
the behavior of 64-bit builds only (for ranges up to 0 to 2147483646).

There's currently no intent to add to php_mt_seed the complication of
bias-avoidance of 32-bit builds of PHP 7.1.0+, as well as of 64-bit
builds of future versions where the bug will presumably get fixed. What
this means in practice is that for 32-bit builds of PHP and future
versions of PHP, php_mt_seed may occasionally find wrong and miss
correct seeds for mt_rand() invoked with a range, but the probability of
this happening is very low except for very wide ranges that are not a
whole power of 2 of possible integer values. For example, mt_rand(0,
61) or mt_rand(111, 222) are very unlikely to trigger the problem,
mt_rand(0, 255) can't trigger the problem, whereas mt_rand(1000000000,
2000000000) is somewhat likely to trigger it. Such likely problematic
ranges are probably rarely used and are of little relevance to uses of
php_mt_seed. Also, supporting this buggy vs. correct behavior would
require treating 32- and 64-bit builds of PHP separately and reporting
on them differently.

PHP 7.1.0 also tried to introduce proper support for 64-bit ranges in
64-bit builds. It generates two raw 32-bit PRNG outputs to derive one
mt_rand() output when the target range spans more than a 32-bit space.
Unfortunately, the implementation is buggy in a way where it'd introduce
biases into such mt_rand() outputs. The bug will presumably get fixed
as well, but regardless there's currently no intent to support wider
than 31-bit ranges in php_mt_seed. This is obscure functionality
(arguably, originally an accidental misfeature, which the PHP developers
didn't really have to make official) that is only available on 64-bit
builds of PHP. Currently, php_mt_seed does not allow specifying larger
than 31-bit integers on its command line (it will report an error when a
larger value is specified).

Prior to PHP 7.1.0, mt_rand(0, 2147483647) was equivalent to mt_rand()
without a range, and php_mt_seed still assumes so. This assumption is
no longer valid for PHP 7.1.0+, which means that when searching for
seeds for PHP 7.1.0+ for mt_rand() called with a range specified, you
can specify at most a range one smaller than that, thus "0 2147483646"
being the maximum that php_mt_seed supports for those versions. This
minor limitation shouldn't matter in practice, except that you might
need to be aware you can continue to specify a range of "0 2147483647"
to indicate that no range was passed into mt_rand().

PHP 7.1.0 also aliased rand() to mt_rand() and srand() to mt_srand().
This means that on one hand you can use php_mt_seed to crack rand()
seeds for PHP 7.1.0+ (since those are also mt_rand() seeds), but on the
other hand this cross-seeding and cross-consumption of random numbers
can affect which attacks work or don't work, and exactly how, against
specific applications that make use of both sets of PHP functions.

PHP 7.1.0 also introduced MT_RAND_PHP as optional second parameter to
mt_srand(). When specified, it correctly enables behavior identical to
that of PHP versions 5.2.1 to 7.0.x. Thus, seeds that php_mt_seed
reports as valid for 5.2.1 to 7.0.x are always also valid for 7.1.0+
with MT_RAND_PHP, and conversely seeds that php_mt_seed reports as valid
for 7.1.0+ are often invalid for 7.1.0+ with MT_RAND_PHP (except when
the same seeds are also valid for 5.2.1 to 7.0.x, which is common).
---

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Nikita Popov
Re: [PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug
September 07, 2017 08:30PM
On Wed, Aug 23, 2017 at 10:05 PM, Solar Designer <[email protected]> wrote:

> On Thu, Aug 17, 2017 at 03:18:30PM +0200, Solar Designer wrote:
> > On Thu, Aug 17, 2017 at 12:57:56AM +0200, Nikita Popov wrote:
> > > On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer <[email protected]>
> wrote:
> > > > One difference I didn't notice at first: the currently committed code
> > > > does only one php_mt_rand() call per loop iteration when it's
> skipping
> > > > "numbers over the limit", even in the 64-bit case (thus, it replaces
> 32
> > > > bits of the value at a time). Your 64-bit version (and my revision
> of
> > > > it) does two calls per iteration (replacing the whole number).
> > > >
> > > > Arguably, the currently committed code is smarter and more efficient
> in
> > > > that respect, whereas yours is cleaner. Regardless, it's an extra
> > > > change that will affect the generated random numbers in some cases
> where
> > > > we could avoid making that change (it's not fixing any bug).
> > > >
> > > > I think it's preferable not to unnecessarily change what numbers are
> > > > generated.
> >
> > > Good point. However, I think that this optimization is not actually
> > > correct. For example, let's take umax = 0x1_0000_0001, which is the
> > > smallest value for which this codepath is taken. In this case limit
> would
> > > be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly
> > > 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this
> case,
> > > and we only generate one new 32-bit value, the top word of the result
> will
> > > be fixed to 0xffff_ffff. (A very small bias in this case, but there's
> > > probably more significant cases.)
> >
> > Great point. More generally, we can't reuse the same random number for
> > decision-making (to skip it) and as part of the next (not so) random
> > number, without introducing bias. So we shouldn't, and we should in
> > fact move away from the old "smarter" behavior as a bug fix. Thank you!
>
> So I guess both the bug I reported and this one Nikita found are going
> to get fixed soon? For 7.2.0 maybe?
>
> Meanwhile, I released php_mt_seed 4.0 yesterday with support for latest
> PHP's mt_rand(), as well as with support for PHP 5.2.0 and below (as it
> happens, all the way to 3.0.7, although that's overkill). Near the end
> of its documentation, I included a lengthy section entitled "PHP version
> curiosities", which I include below in this e-mail in case any of this
> is useful for PHP's own documentation. It starts from PHP 3.0.6, but
> then actually spends half of the text on PHP 7.1.0+. Here we go:
>
> ---
> While php_mt_seed supports 3 major revisions of PHP's mt_rand()
> algorithm and that sort of covers PHP 3.0.7 through 7.1.0+ (up to the
> latest as of this writing and probably beyond), the reality is somewhat
> trickier than that. From older versions to newer:
>
> As a mere historical curiosity, php_mt_seed is in fact able to crack
> seeds of PHP 3.0.6, which is the very first version that introduced
> mt_rand(), but only as long as no range was passed into mt_rand(). That
> version had broken support for ranges, and indeed there's no point in
> supporting that short-lived breakage in php_mt_seed now. With this
> detail, php_mt_seed has some support for all mt_rand() capable versions
> of PHP released so far.
>
> Then there's PHP 3.0.7 through 5.2.0, where Mersenne Twister's state
> initialization is with multiples of 69069. This enables our stateless
> implementation to quickly jump to the state array element needed to
> compute the first mt_rand() output by using a precomputed value for
> 69069 raised to the power 396 (mod 2**32), which is MT's M-1. Another
> curiosity of those versions, which we take advantage of too, is that
> they treat adjacent even and odd seeds the same, so the effective seed
> space is 31-bit.
>
> PHP 3.0.6 to 4.1.2 used a default seed of 4357 (and thus also 4356) if
> mt_srand() was not called. PHP 4.2.0 changed that to automatic seeding
> using system time and PHP process ID (still predictable and now also
> leaky, but no longer a constant), but there was "Bug #25007 rand &
> mt_rand seed RNG every call" until 4.3.3, which presumably affected how
> cracked seeds could (not) be used.
>
> PHP 5.2.1 changed MT state initialization to MT authors' new recommended
> algorithm, which is no longer linear so we have to compute the first 397
> state elements (out of 624) even though in the simplest case we only
> need (and only store) the first and last one of those (or we could use a
> time-memory trade-off, which we currently don't).
>
> PHP 5.2.1 also introduced a bug into its implementation of MT (use of a
> wrong variable, whereas pre-5.2.1 code was correct in that respect).
> This bug lets us skip a few operations for every other seed, which we
> do, although this optimization is so minor that we could as well not
> bother. PHP 7.1.0 fixed this bug (reverting to pre-5.2.1 code in that
> respect, so we use the same logic for pre-5.2.1 and 7.1.0+ there).
>
> In PHP versions from 3.0.7 to 7.0.x, if mt_rand() was called with its
> optional output range specified, a 31-bit (0 to 2147483647
> <%28214%29%20748-3647>) MT PRNG
> output was scaled to that range using floating-point math. This meant
> that if a range wider than 31-bit was requested on a 64-bit build of
> PHP, some values would never occur. This also meant that even for most
> ranges smaller than 31-bit a bias was introduced (some output values
> became more likely than others), as compared to MT's raw output (which
> was relatively unbiased).
>
> PHP 7.1.0 tried to fix those biases by dropping the floating-point math
> and instead mapping the raw 32-bit MT PRNG outputs to the target range
> using integer modulo division. To avoid inherent bias when the target
> range isn't a whole power of 2 of possible integer values, a loop was
> introduced to skip raw 32-bit PRNG outputs (until a suitable one is
> seen) that would result in such bias. A bug in that code was found and
> reported due to work on php_mt_seed. As it turned out, the loop only
> works right in 32-bit builds of PHP, and is ineffective on 64-bit
> (except with 64-bit ranges, see below). Luckily, this actually makes
> things simpler for php_mt_seed, and currently php_mt_seed fully supports
> the behavior of 64-bit builds only (for ranges up to 0 to 2147483646
> <%28214%29%20748-3646>).
>
> There's currently no intent to add to php_mt_seed the complication of
> bias-avoidance of 32-bit builds of PHP 7.1.0+, as well as of 64-bit
> builds of future versions where the bug will presumably get fixed. What
> this means in practice is that for 32-bit builds of PHP and future
> versions of PHP, php_mt_seed may occasionally find wrong and miss
> correct seeds for mt_rand() invoked with a range, but the probability of
> this happening is very low except for very wide ranges that are not a
> whole power of 2 of possible integer values. For example, mt_rand(0,
> 61) or mt_rand(111, 222) are very unlikely to trigger the problem,
> mt_rand(0, 255) can't trigger the problem, whereas mt_rand(1000000000,
> 2000000000) is somewhat likely to trigger it. Such likely problematic
> ranges are probably rarely used and are of little relevance to uses of
> php_mt_seed. Also, supporting this buggy vs. correct behavior would
> require treating 32- and 64-bit builds of PHP separately and reporting
> on them differently.
>
> PHP 7.1.0 also tried to introduce proper support for 64-bit ranges in
> 64-bit builds. It generates two raw 32-bit PRNG outputs to derive one
> mt_rand() output when the target range spans more than a 32-bit space.
> Unfortunately, the implementation is buggy in a way where it'd introduce
> biases into such mt_rand() outputs. The bug will presumably get fixed
> as well, but regardless there's currently no intent to support wider
> than 31-bit ranges in php_mt_seed. This is obscure functionality
> (arguably, originally an accidental misfeature, which the PHP developers
> didn't really have to make official) that is only available on 64-bit
> builds of PHP. Currently, php_mt_seed does not allow specifying larger
> than 31-bit integers on its command line (it will report an error when a
> larger value is specified).
>
> Prior to PHP 7.1.0, mt_rand(0, 2147483647 <%28214%29%20748-3647>) was
> equivalent to mt_rand()
> without a range, and php_mt_seed still assumes so. This assumption is
> no longer valid for PHP 7.1.0+, which means that when searching for
> seeds for PHP 7.1.0+ for mt_rand() called with a range specified, you
> can specify at most a range one smaller than that, thus "0 2147483646
> <%28214%29%20748-3646>"
> being the maximum that php_mt_seed supports for those versions. This
> minor limitation shouldn't matter in practice, except that you might
> need to be aware you can continue to specify a range of "0 2147483647
> <%28214%29%20748-3647>"
> to indicate that no range was passed into mt_rand().
>
> PHP 7.1.0 also aliased rand() to mt_rand() and srand() to mt_srand().
> This means that on one hand you can use php_mt_seed to crack rand()
> seeds for PHP 7.1.0+ (since those are also mt_rand() seeds), but on the
> other hand this cross-seeding and cross-consumption of random numbers
> can affect which attacks work or don't work, and exactly how, against
> specific applications that make use of both sets of PHP functions.
>
> PHP 7.1.0 also introduced MT_RAND_PHP as optional second parameter to
> mt_srand(). When specified, it correctly enables behavior identical to
> that of PHP versions 5.2.1 to 7.0.x. Thus, seeds that php_mt_seed
> reports as valid for 5.2.1 to 7.0.x are always also valid for 7.1.0+
> with MT_RAND_PHP, and conversely seeds that php_mt_seed reports as valid
> for 7.1.0+ are often invalid for 7.1.0+ with MT_RAND_PHP (except when
> the same seeds are also valid for 5.2.1 to 7.0.x, which is common).
> ---
>
> Alexander
>

Sorry for the long delay. I've just applied
https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147fd39d1e9e61
to the 7.2 branch.

Davey, Joe, do we want to take action here for 7.1? It's a pretty severe
bias, but fixing it is going to change seed sequences. I think at this
point we're too far in the 7.1 cycle to apply this kind of change.

Nikita
Solar Designer
Re: [PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug
September 07, 2017 09:50PM
On Thu, Sep 07, 2017 at 08:23:22PM +0200, Nikita Popov wrote:
> Sorry for the long delay. I've just applied
> https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147fd39d1e9e61
> to the 7.2 branch.

Thank you!

Maybe you'd add similar tests for 64-bit ranges? Right now,
rand_range64()'s bias avoidance is left untested. Need to come up with
numbers that would demonstrate the bias if the bias-avoiding loop
failed.

Also, the comment (by me, in the test) that says "7.1.0 to 7.2.0beta2"
should now say "7.1.0 to 7.2.0beta3" since beta3 was released with the
bug still intact.

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
From: Nikita Popov <[email protected]>
>
> Sorry for the long delay. I've just applied
> https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147fd39d1e9e61
> to the 7.2 branch.
>
> Davey, Joe, do we want to take action here for 7.1? It's a pretty
> severe
> bias, but fixing it is going to change seed sequences. I think at this
> point we're too far in the 7.1 cycle to apply this kind of change.

I think it is very unlikely that anyone has PHP software that relies on
predictable output given a 64-bit seed. And, yes, the bias is bad so I
would not worry about fixing it asap.

Tom

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Solar Designer
Re: [PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug
September 08, 2017 02:40PM
On Fri, Sep 08, 2017 at 07:56:23AM -0400, Tom Worster wrote:
> From: Nikita Popov <[email protected]>
> >
> >Sorry for the long delay. I've just applied
> >https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147fd39d1e9e61
> >to the 7.2 branch.
> >
> >Davey, Joe, do we want to take action here for 7.1? It's a pretty
> >severe
> >bias, but fixing it is going to change seed sequences. I think at this
> >point we're too far in the 7.1 cycle to apply this kind of change.
>
> I think it is very unlikely that anyone has PHP software that relies on
> predictable output given a 64-bit seed. And, yes, the bias is bad so I
> would not worry about fixing it asap.

This sounds confused. There's no 64-bit seed - PHP's mt_srand() only
supports 32-bit seeds. Then you say "the bias is bad" and at the same
time "would not worry about fixing it asap", which look inconsistent.

The original problem I reported applies to 64-bit builds of PHP - which
is probably most builds these days - when mt_rand() is invoked with a
range that fits in 32 bits - which again is the typical case for the use
of ranges. However, the bias can be large only for large ranges (yet
not exceeding 32 bits). For typically used small ranges, the bias is
small. Also, fixing the bug doesn't fully change the sequence of
generated random numbers - for typically used small ranges, the
probability that the fix changes a random number to another (for the
same seed) is small. So the sequences will change, but not fully. I'm
not sure if this is good or bad, as sometimes complete failure of
something that worked for someone before is preferable; I merely point
out what will actually happen.

Later in the discussion, Nikita pointed out an extra problem (also
causing biases) that affected the rarely-used 64-bit ranges. Similarly,
fixing it doesn't fully change the sequence of generated random numbers -
again, for typically used small ranges (this time relative to the
64-bit space), the probability that the fix changes a random number to
another (for the same seed) is small.

Another detail is that these fixes make 32- and 64-bit builds of PHP
consistent, which isn't the case for 7.1.x now. So retaining the bugs
in 7.1.x for consistent behavior doesn't exactly achieve that - it does
for consistency within 7.1.x series, but not across 32- vs. 64-bit
builds. Fixing the bugs would achieve the latter, but break the former.

I have no strong preference here. I merely point out the confusion and
try to correct it.

Alexander

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Feels too late for 7.1

Cheers
Joe

On Fri, Sep 8, 2017 at 1:31 PM, Solar Designer <[email protected]> wrote:

> On Fri, Sep 08, 2017 at 07:56:23AM -0400, Tom Worster wrote:
> > From: Nikita Popov <[email protected]>
> > >
> > >Sorry for the long delay. I've just applied
> > >https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147f
> d39d1e9e61
> > >to the 7.2 branch.
> > >
> > >Davey, Joe, do we want to take action here for 7.1? It's a pretty
> > >severe
> > >bias, but fixing it is going to change seed sequences. I think at this
> > >point we're too far in the 7.1 cycle to apply this kind of change.
> >
> > I think it is very unlikely that anyone has PHP software that relies on
> > predictable output given a 64-bit seed. And, yes, the bias is bad so I
> > would not worry about fixing it asap.
>
> This sounds confused. There's no 64-bit seed - PHP's mt_srand() only
> supports 32-bit seeds. Then you say "the bias is bad" and at the same
> time "would not worry about fixing it asap", which look inconsistent.
>
> The original problem I reported applies to 64-bit builds of PHP - which
> is probably most builds these days - when mt_rand() is invoked with a
> range that fits in 32 bits - which again is the typical case for the use
> of ranges. However, the bias can be large only for large ranges (yet
> not exceeding 32 bits). For typically used small ranges, the bias is
> small. Also, fixing the bug doesn't fully change the sequence of
> generated random numbers - for typically used small ranges, the
> probability that the fix changes a random number to another (for the
> same seed) is small. So the sequences will change, but not fully. I'm
> not sure if this is good or bad, as sometimes complete failure of
> something that worked for someone before is preferable; I merely point
> out what will actually happen.
>
> Later in the discussion, Nikita pointed out an extra problem (also
> causing biases) that affected the rarely-used 64-bit ranges. Similarly,
> fixing it doesn't fully change the sequence of generated random numbers -
> again, for typically used small ranges (this time relative to the
> 64-bit space), the probability that the fix changes a random number to
> another (for the same seed) is small.
>
> Another detail is that these fixes make 32- and 64-bit builds of PHP
> consistent, which isn't the case for 7.1.x now. So retaining the bugs
> in 7.1.x for consistent behavior doesn't exactly achieve that - it does
> for consistency within 7.1.x series, but not across 32- vs. 64-bit
> builds. Fixing the bugs would achieve the latter, but break the former.
>
> I have no strong preference here. I merely point out the confusion and
> try to correct it.
>
> Alexander
>
On 8 Sep 2017, at 8:31, Solar Designer wrote:

> On Fri, Sep 08, 2017 at 07:56:23AM -0400, Tom Worster wrote:
>> From: Nikita Popov <[email protected]>
>>>
>>> Sorry for the long delay. I've just applied
>>> https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147fd39d1e9e61
>>> to the 7.2 branch.
>>>
>>> Davey, Joe, do we want to take action here for 7.1? It's a pretty
>>> severe
>>> bias, but fixing it is going to change seed sequences. I think at
>>> this
>>> point we're too far in the 7.1 cycle to apply this kind of change.
>>
>> I think it is very unlikely that anyone has PHP software that relies
>> on
>> predictable output given a 64-bit seed. And, yes, the bias is bad so
>> I
>> would not worry about fixing it asap.
>
> This sounds confused. There's no 64-bit seed - PHP's mt_srand() only
> supports 32-bit seeds. Then you say "the bias is bad" and at the same
> time "would not worry about fixing it asap", which look inconsistent.
>
> The original problem I reported applies to 64-bit builds of PHP -
> which
> is probably most builds these days - when mt_rand() is invoked with a
> range that fits in 32 bits - which again is the typical case for the
> use
> of ranges. However, the bias can be large only for large ranges (yet
> not exceeding 32 bits). For typically used small ranges, the bias is
> small. Also, fixing the bug doesn't fully change the sequence of
> generated random numbers - for typically used small ranges, the
> probability that the fix changes a random number to another (for the
> same seed) is small. So the sequences will change, but not fully.
> I'm
> not sure if this is good or bad, as sometimes complete failure of
> something that worked for someone before is preferable; I merely point
> out what will actually happen.
>
> Later in the discussion, Nikita pointed out an extra problem (also
> causing biases) that affected the rarely-used 64-bit ranges.
> Similarly,
> fixing it doesn't fully change the sequence of generated random
> numbers -
> again, for typically used small ranges (this time relative to the
> 64-bit space), the probability that the fix changes a random number to
> another (for the same seed) is small.
>
> Another detail is that these fixes make 32- and 64-bit builds of PHP
> consistent, which isn't the case for 7.1.x now. So retaining the bugs
> in 7.1.x for consistent behavior doesn't exactly achieve that - it
> does
> for consistency within 7.1.x series, but not across 32- vs. 64-bit
> builds. Fixing the bugs would achieve the latter, but break the
> former.
>
> I have no strong preference here. I merely point out the confusion
> and
> try to correct it.

Yes, I was confused. I meant to talk about large ranges but even so your
summary is an education so thank you.

My input is to offer an opinion on the relative importance of
considerations.

Fixing the bias would be an urgent priority because I think I a lot of
programs are written assuming a uniform distribution.

While I broadly agree with what you describe as "typical", it might be
hard for a user to know how big a problem the bias is in their
situation. Fixing the bias eliminates that doubt and the handwaving
about what is typical.

Programs that exploit the predictable property are specialized
(comparing different monte carlo experiments based on the same
pseudorandom input is the only example I know) and I think much less
common in PHP. (Note: these programs are also likely to need unbiased
stats.) I doubt that something will fail (i.e. break as in BC break) due
to inconsistency within 7.1.x but the change might cause some extra work
or faulty experimental conclusions. If I were an experimenter dealing
with this change I'd rather rerun the cases I ran on the buggy RNG than
continue with a known bad RNG. And I'd rather do this sooner than later.

I think we serve this specialized community better (if it exists at
all!?) fixing it in 7.1, which also helps make these users aware of the
bug. Everyone else is probably either unaffected by the fix or their
programs will behave better.

Tom

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Sorry, only registered users may post in this forum.

Click here to login