[PHP-DEV] Fixing bug #18556 (was: Complete case-sensitivity in PHP)

Posted by Galen Wright-Watson 
April 24, 2012 01:10AM
On Mon, Apr 23, 2012 at 3:22 AM, C.Koy <[email protected]> wrote:

> On 4/22/2012 11:32 PM, Galen Wright-Watson wrote:
>
>> 2012/4/22 C.Koy<[email protected]>
>>
>> On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:
>>>
>>
>> But, I did not start this thread to discuss such bug fix, because:
>>>
>>> 1. It does not take a genius to figure it out, and should take minutes to
>>> implement for someone experienced in the internals. Given the 10 year
>>> span
>>> and dozens of comments/complaints on the bug's entry, it's hard to say
>>> this
>>> issue went unnoticed. So I had to conclude that such fix has quietly been
>>> overruled for performance and/or other undisclosed reasons.
>>>
>>>
>> Why does it matter if a solution is simple?
>>
>
> It doesn't matter, you've misunderstood.
>

You've misunderstood me. While you may have set out with the goal of
discussing making PHP completely case-sensitive, that doesn't preclude
others from suggesting fixes for the specific bug you mention. Indeed, some
of the first e-mails were around the bug, and not just in the context of
case-sensitive PHP.

I didn't introduce the custom case conversion solution as a
counter-argument to case-sensitive PHP, and I wasn't asking for feedback on
that solution in the context of case-sensitive PHP; I was asking for
reasons why it wouldn't be a suitable solution for the bug. The only place
case-sensitive PHP enters into it was your statement that:

> As the recent comments on that page indicate, there's not a deterministic
> way to resolve this issue, apart from eliminating tolower() calls for
> function/class names during lookup. Hence totally case-sensitive PHP.


My proposition shows this isn't entirely true, and branches off from the
original discussion at that point. I'm focusing on fixing the bug, which is
a smaller issue than case-sensitivity. Discussion of case-sensitivity can
continue without regard to the custom conversion solution. As such, I've
changed the subject of this e-mail.

Furthermore, going back to your original e-mail, you explicitly stated it
was about the bug, making case sensitivity subordinate to it.

> This post is about bug #18556
> (https://bugs.php.net/bug.php?id=18556)
> which is a decade old.


I hope you can see why others might take the bug to be the context for
case-sensitivity, rather than the other way around.

> And that's what makes me curious and confused about why this bug still
> exists. See, I'm drawing a conclusion with what little information I have,
> and stating the reasonings it's based on (first two statements).
> Overall, that and the item following it were an explanation of "why I'm
> suggesting a major feature change in solution to a specific bug", although
> no one directly asked me to.
>

In other words, you jumped to a conclusion. I wasn't asking about possible
reasons why custom conversion hasn't been accepted as the solution to this
bug. Neither was I asking why you didn't suggest it. I was (and still am)
asking for explicit, justifiable reasons as to whether or not it's a
suitable solution to the bug.


>
>> If it's already been rejected privately, it's time to bring the reasons
>> into the open (which is why I asked). If not, it should be considered
>> publicly.
>>
>
> A comment dated 2002-09-26 on bug's page states the bug is fixed. The next
> comment dated 2006-02-17 states it reappeared.
> I don't know who did what 10, 6 years ago but it's been revoked. Why?
> That was the main reason I deemed this bug not fixable, hence suggest
> other ways to resolve.
>

I don't know either, but I'm not about to disregard potential fixes if
they haven't been publicly discussed. The regression could just as easily
have been a mistake. From looking at the original fix (revision 97040,
http://svn.php.net/viewvc?view=revision&revision=97040, authored by iliaa)
and the bug comments, something along the lines of what I'm suggesting has
been suggested and even implemented before, but there's no real discussion
of it. The original fix (zend_str_tolower_nlc) assumed ASCII, which isn't
entirely suitable as there are uppercase characters that it doesn't
convert, which suggests yet another reason for the regression, namely that
using zend_str_tolower would convert the characters that
zend_str_tolower_nlc missed.
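
For illustration, a locale-independent, ASCII-only conversion of the kind described above might look like the sketch below. The name ascii_str_tolower is hypothetical, and this is not the code from revision 97040; it only shows the idea of lowercasing A-Z by arithmetic so that the result cannot change when setlocale() is called.

```c
#include <stddef.h>

/* Hypothetical helper (not the actual Zend source): lowercases only the
 * ASCII letters A-Z in place and leaves every other byte untouched, so
 * the result does not depend on the current locale. */
static void ascii_str_tolower(char *str, size_t length)
{
    size_t i;

    for (i = 0; i < length; i++) {
        if (str[i] >= 'A' && str[i] <= 'Z') {
            str[i] = (char)(str[i] - 'A' + 'a');
        }
    }
}
```

As noted above, the trade-off is that uppercase characters outside ASCII are left as they are.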

As for the real reason why the bug reappeared, we can continue on in our
historical examination. Revision 99001 (
http://svn.php.net/viewvc?view=revision&revision=99001, also authored
by iliaa) replaced zend_str_tolower with zend_str_tolower_nlc, making all
internal Zend case conversion use ASCII. iliaa had this to say about the
change (http://news.php.net/php.zend-engine.cvs/478):

> It appears that there no reason to keep both zend_str_tolower_nlc and
> zend_str_tolower. zend_str_tolower_nlc can be safely renamed to
> zend_str_tolower. The places it is used in, do not appear to depend on
> locale. For people who do need it there is an alternative php function
> php_strtolower, which they can use, which does respect the locale. So, if
> there are no objections I'll prepare a patch that will change
> zend_str_tolower_nlc to zend_str_tolower.


Revision 128057 (http://svn.php.net/viewvc?view=revision&revision=128057,
authored by sterling) adds zend_str_tolower for use in
fast_call_user_function, which makes use of tolower rather than a custom
conversion. Revision 128060 (
http://svn.php.net/viewvc?view=revision&revision=128060, same author) then
changes zend_str_tolower to use tolower instead of its custom ASCII-based
conversion. The commit message is: "make this faster and sexier". Within
these revisions, zend_lookup_class is case sensitive. This change, in
combination with 99001, masks the reason for the custom conversion.

zend_tolower and the use of tolower_l were introduced by
revision 224372 (http://svn.php.net/viewvc?view=revision&revision=224372,
authored by stas (hi, Stas!)). The commit message is: "Improve
tolower()-related functions on Windows and VC2005 by caching locale and
using tolower_l function."

There are plenty of other edits to Zend functions affecting case handling
(look over the commit messages listed in
http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_operators.c?view=log&pathrev=225000)
that make similar tweaks involving case conversion and the character
encoding. What are we to conclude from all this? That the fact that the
custom conversion was a bug fix was forgotten as the file was edited and
different people worked on it. In other words, the fix was not lost due
to a conscious decision
made by anyone, but rather the typical reason for regression (in the
original sense of the word): there's too much for anyone to keep all of it
in mind at once, so someone can easily re-introduce a bug without being
aware of it.

I trust this demonstrates that "there must be an undisclosed reason" isn't
a justifiable reason not to implement my proposed solution.

>> The abstract property that makes a locale problematic is obvious. I
>> was looking for specific locales, as they need to be identified for a
>> complete solution.
>
>
> I'm not locale expert. Given the public complaints/bugs we can, in
> practice, assume this affects Turkish and Azerbaijani only. (I don't know
> about Kurdish)
>

Kurdish is mentioned by Mike and Tokul in the comments for the bug. I
could easily have come to the same conclusion, but I want an answer from
someone who knows without needing to make any assumptions. Are there any
locale experts (or someone willing to put in the leg-work) reading this
with a conclusive answer to my question about problematic locales?
On Tue, Apr 24, 2012 at 1:06 AM, Galen Wright-Watson <[email protected]> wrote:

> [...]

thanks for digging this out.

ps: you had a few extra > at the end of the first lines of your sentences,
I experienced similar problems with gmail, the solution for me was to
always put an extra new line after the quoted text.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
>
>
> ps: you had a few extra > at the end of the first lines of your sentences,
> I experienced similar problems with gmail, the solution for me was to
> always put an extra new line after the quoted text.
>
>
what I meant is the beginning of the first line, not the end.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
On 04/24/2012 01:06 AM, Galen Wright-Watson wrote:

> http://svn.php.net/viewvc?view=revision&revision=128060, same author) then
> changes zend_str_tolower to use tolower instead of its custom ASCII-based
> conversion. The commit message is: "make this faster and sexier". Within
> these revisions, zend_lookup_class is case sensitive. This change, in
> combination with 99001, mask the reason for the custom conversion.

Argh .... STERLING!!!111

ok, part of the story seems to be that i can't find the regression test
tests/lang/035.phpt that i mentioned in bug #18556 anywhere. In the 5.x
code base this is a test for some Exception related stuff, and in the
latest 4.x branch the highest test number in tests/lang is 034.phpt

So it seems as if i somehow never really committed my test case and
so Sterling, not being aware of the "turkish" history, unfixed things
during micro optimization without anything in place to warn him about
the regression he introduced :(

(AFAIR it was me back then who first stumbled upon "i"!=tolower("I")
in tr_TR after noticing that most of our "Image functions don't work
even though the gd extension is active" reports came from Turkey ...)
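
A small C check along these lines can show whether a given system reproduces the "i" != tolower("I") behaviour. This is only a sketch: the function name is made up for illustration, and the locale names installed on a system (tr_TR, tr_TR.UTF-8, ...) vary, so the function reports an unavailable locale instead of guessing.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if tolower('I') != 'i' under the named
 * LC_CTYPE locale, 0 if it equals 'i', and -1 if the locale cannot be
 * set. The previously active locale is restored before returning. */
static int upper_i_lowers_differently(const char *locale_name)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    int result = -1;

    if (current == NULL) {
        return -1;
    }
    /* Copy the name: a later setlocale() call may invalidate the pointer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL) {
        return -1;
    }
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        result = (tolower('I') != 'i') ? 1 : 0;
    }
    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

Calling upper_i_lowers_differently("tr_TR") on an affected system would be expected to return 1, but the result depends on the C library and on which locales are installed.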

--
hartmut

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Hi,
As of 5.3.0 this bug does not exist for function names. Only classes and
interfaces.

Could this be a clue for how to fix it for those as well?





On Thu, Apr 26, 2012 at 3:45 AM, C.Koy <[email protected]> wrote:

> As of 5.3.0 this bug does not exist for function names. Only classes and
> interfaces.
>
>
Turns out, if you cause a function to be called dynamically by (e.g.) using
a variable function, the bug will surface.

<?php
setlocale(LC_CTYPE, 'tr_TR');
function IJK() {}
# succeeds
IJK();
$f = 'IJK';
# causes Fatal error: Call to undefined function IJK()
$f();

In contrast, if you set the locale for LC_CTYPE on the command line, the
bug doesn't arise at all because the compilation and execution phases both
use the same locale.
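
The mismatch can be modelled outside PHP. The C sketch below is a simulation, not Zend code: it builds a case-insensitive lookup key twice, once with the conversion in effect when the name is defined and once with the conversion in effect when it is called. The Turkish conversion is simulated with the ISO-8859-9 byte for dotless i (0xFD); a real tolower() in a Turkish locale may return something else, and all names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char (*lower_fn)(unsigned char);

/* Plain ASCII lowercasing, standing in for the "C"/en_US behaviour. */
static unsigned char lower_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Simulated single-byte Turkish lowercasing: 'I' maps to dotless i,
 * which is byte 0xFD in ISO-8859-9. Illustrative assumption only. */
static unsigned char lower_turkish_sim(unsigned char c)
{
    return (c == 'I') ? (unsigned char)0xFD : lower_ascii(c);
}

/* Builds the lookup key for an identifier, as a case-insensitive
 * symbol table would. key_size must be at least 1. */
static void make_key(const char *name, lower_fn lower, char *key, size_t key_size)
{
    size_t i;

    for (i = 0; name[i] != '\0' && i + 1 < key_size; i++) {
        key[i] = (char)lower((unsigned char)name[i]);
    }
    key[i] = '\0';
}

/* Returns 1 if a name stored with one conversion is found again when the
 * key is rebuilt with another conversion, 0 otherwise. */
static int lookup_succeeds(const char *name, lower_fn at_definition, lower_fn at_call)
{
    char stored[64];
    char wanted[64];

    make_key(name, at_definition, stored, sizeof stored);
    make_key(name, at_call, wanted, sizeof wanted);
    return strcmp(stored, wanted) == 0;
}
```

With the same conversion on both sides the lookup succeeds whichever conversion it is; it fails only when the two differ and the name contains an affected letter.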



> Could this be a clue for how to fix it for those as well?


Function names are generally resolved at compile time (dynamic function
names are resolved at run time, which is why the bug surfaces for them),
before the call to setlocale in the script has been executed. Class name
resolution is put off until execution time for autoloading and possibly
other purposes. Converting class names to lowercase at compile time may
work. A quick glance at the source shows that class_name,
fully_qualified_class_name and class_name_reference all depend on
namespace_name, which is the rule that is responsible for the parsing of
the class name.

namespace_name:
        T_STRING { $$ = $1; }
    |   namespace_name T_NS_SEPARATOR T_STRING
            { zend_do_build_namespace_name(&$$, &$1, &$3 TSRMLS_CC); }
;

However, static_scalar is also dependent on namespace_name, and I don't
believe that symbol should be made case-insensitive. Creating an additional
symbol for case-independence would allow a more targeted approach. The
various class symbols would then rely on this new symbol, rather than
namespace_name.

lc_namespace_name:
        T_STRING { zend_str_tolower($1); $$ = $1; }
    |   lc_namespace_name T_NS_SEPARATOR T_STRING
            { zend_str_tolower($3); zend_do_build_namespace_name(&$$, &$1, &$3 TSRMLS_CC); }
;

Converting class names to lower case early may have additional
consequences. It may affect class names in error messages, for example (I
didn't dig deep enough to determine this). __CLASS__ should be unaffected
(when defining a class, the class name is parsed as a T_STRING; the value
for __CLASS__ comes from this symbol). It also won't resolve the bug for
dynamic names. I suspect that altering variable_class_name and
dynamic_class_name_reference in a manner described previously (use a custom
lowercase conversion or temporarily switch locale) to convert the name
would resolve the bug in the dynamic case for class names. Changing a
number of the production rules for function_call in a similar manner should
resolve the bug for dynamic function calls. Again, there will likely be
unintended consequences. Alternatively, updating
zend_do_begin_dynamic_function_call() and zend_do_fetch_class() to use
custom conversion should resolve the bug in the dynamic case.

I like the idea of using the system default locale for name conversion
(making name resolution independent of the current locale), but am
concerned that it will make name lookup slow. Instead, a second set of
locale-independent, Unicode-aware conversion functions (basically, iliaa's
original solution, but Unicode compatible) to be used for identifiers would
make name resolution independent of the current locale. Any time an
identifier needs to be converted, it would use one of these functions. As
a run-time optimization, non-dynamic class names could use the system
locale conversion, but that would be a separate thing from resolving this
bug.
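
As a rough sketch of the "temporarily switch locale" variant mentioned earlier, the C function below converts an identifier under the "C" locale and then restores whatever LC_CTYPE locale was active. The function name is hypothetical and this is not a patch against the Zend sources; it only illustrates the mechanism and why it may be slow.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercases an identifier in place under the "C"
 * locale, then restores the LC_CTYPE locale that was active before.
 * Returns 0 on success, -1 if the current locale could not be saved. */
static int identifier_tolower_fixed_locale(char *str, size_t length)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    size_t i;

    if (current == NULL) {
        return -1;
    }
    /* Copy the name: a later setlocale() call may invalidate the pointer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL) {
        return -1;
    }
    strcpy(saved, current);

    setlocale(LC_CTYPE, "C");
    for (i = 0; i < length; i++) {
        str[i] = (char)tolower((unsigned char)str[i]);
    }
    setlocale(LC_CTYPE, saved);
    free(saved);
    return 0;
}
```

Two setlocale() calls and an allocation per conversion are presumably the kind of overhead that would make name lookup slow, and setlocale() changes the locale for the whole process, which is a further argument for a dedicated conversion function.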
On Tue, May 1, 2012 at 11:11 AM, Galen Wright-Watson <[email protected]> wrote:

>
> [...] Instead, a second set of locale-independent, unicode-aware
> conversion functions (basically, iliaa's original solution, but Unicode
> compatible) to be used for identifiers would make name resolution
> independent of the current locale. [...]
>

I believe all these functions would need to do is use tolower, rather than
tolower_l. So, perhaps the new functions should get the old names, and the
old functions should get "_l" appended to their names.
On 5/1/2012 9:11 PM, Galen Wright-Watson wrote:
> On Thu, Apr 26, 2012 at 3:45 AM, C.Koy<[email protected]> wrote:
>
>> As of 5.3.0 this bug does not exist for function names. Only classes and
>> interfaces.
>>
>>
> Turns out, if you cause a function to be called dynamically by (e.g.) using
> a variable function, the bug will surface.
>
> <?php
> setlocale(LC_CTYPE, 'tr_TR');
> function IJK() {}
> # succeeds
> IJK();

If a literal function call precedes the function definition, that would
fail too in 5.2.17, but not in 5.3.0.
What has changed in this regard from 5.2 to 5.3?


> $f = 'IJK';
> # causes Fatal error: Call to undefined function IJK()
> $f();
>
> In contrast, if you set the locale for LC_CTYPE on the command line, the
> bug doesn't arise at all because the compilation and execution phases both
> use the same locale.
>

So, the bug also arises if a script started in 'tr_TR' env locale sets
its locale to 'en_US' at runtime.

[...]

>
> I like the idea of using the system default locale for name conversion
> (making name resolution independent of the current locale), but am

As I stated above, the locale the script was started in may not always
be 'en_US' or 'C'. (assuming that's what you mean by "system default
locale")

By the way, I noticed a setlocale(LC_CTYPE, "") call in
php_module_startup()/main.c, but can't figure if it has any relevance to
this bug.

regards,





On Wed, May 2, 2012 at 5:23 AM, C.Koy <[email protected]> wrote:

> On 5/1/2012 9:11 PM, Galen Wright-Watson wrote:
>
>> On Thu, Apr 26, 2012 at 3:45 AM, C.Koy<[email protected]> wrote:
>>
>> As of 5.3.0 this bug does not exist for function names. Only classes and
>>> interfaces.
>>>
>>>
>>> Turns out, if you cause a function to be called dynamically by (e.g.)
>> using
>> a variable function, the bug will surface.
>>
>> <?php
>> setlocale(LC_CTYPE, 'tr_TR');
>> function IJK() {}
>> # succeeds
>> IJK();
>>
>
> If literal function call precedes the function definition, that would fail
> too in 5.2.17, but not in 5.3.0.
> What has changed in this regard 5.2->5.3 ?
>
>
Do you mean something like the following?

<?php
setlocale(LC_CTYPE, 'tr_TR');
IJK();
setlocale(LC_CTYPE, 'en_US');
function IJK() {echo __FUNCTION__, "\n";}

I couldn't get it to generate an error under PHP 5.2.17. What am I missing?


>
>> In contrast, if you set the locale for LC_CTYPE on the command line, the
>> bug doesn't arise at all because the compilation and execution phases both
>> use the same locale.
>>
>>
> So, the bug also arises if a script started in 'tr_TR' env locale sets its
> locale to 'en_US' at runtime.
>
>
Yup.

$ LC_CTYPE=tr_TR php
<?php
setlocale(LC_CTYPE, 'en_US');
class I {}
$i = new I;
^D
Fatal error: Class 'I' not found in - on line 4

Call Stack:
0.3740 630760 1. {main}() -:0

I should say that the Vulcan Logic Disassembler has been very helpful to me
in exploring this bug. Thank you, Derick Rethans and the rest of the VLD
team. If you haven't tried it, check it out.


> [...]
>
>
>
>> I like the idea of using the system default locale for name conversion
>> (making name resolution independent of the current locale), but am
>>
>
> As I stated above, the locale the script was started in may not always be
> 'en_US' or 'C'. (assuming that's what you mean by "system default locale")
>
>
That's indeed what I meant; basically, the locales specified in the
LC_CTYPE &c. environment variables.

It shouldn't matter that the default locale isn't "en_US" or "C", as long
as PHP always uses the same locale for identifiers both during compilation
and at run-time. Of course, it also makes a certain amount of sense to
explicitly decide that PHP will use a specific locale for identifiers. I
avoided suggesting that route to avoid any issues about what locales will
be universally available.


> By the way, I noticed a setlocale(LC_CTYPE, "") call in
> php_module_startup()/main.c, but can't figure if it has any relevance to
> this bug.
>
>
That would set the locale to whatever the platform uses natively. Without
the call, the locale would be "POSIX"/"C", according to the POSIX doc (
http://pubs.opengroup.org/onlinepubs/009604499/functions/setlocale.html).
It doesn't seem terribly relevant to bug 18556, since all that matters
regarding the initial locale is that its lowercase conversion is different
from the locale that's used at run-time. If I had to guess why the locale
is set to the platform native, it's so that numeric, currency and date
formatting will be consistent with the rest of the system.
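
The startup behaviour can be observed with a few lines of C. This is a minimal sketch with made-up helper names: a C program starts in the "C" locale, and only a call such as setlocale(LC_CTYPE, "") makes it adopt the locale named by the environment.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: copies the name of the current LC_CTYPE locale
 * into buf. Returns 0 on success, -1 if the name does not fit. */
static int current_ctype_locale(char *buf, size_t buf_size)
{
    const char *name = setlocale(LC_CTYPE, NULL);

    if (name == NULL || strlen(name) + 1 > buf_size) {
        return -1;
    }
    strcpy(buf, name);
    return 0;
}

/* Adopts the LC_CTYPE locale named by the environment (LC_ALL, LC_CTYPE,
 * LANG). Returns 1 if the C library accepted it, 0 otherwise. */
static int adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

Calling current_ctype_locale() before and after adopt_environment_ctype() shows the switch from "C" to whatever the environment specifies; the second name depends entirely on the system the program runs on.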
On 5/2/2012 10:03 PM, Galen Wright-Watson wrote:
> On Wed, May 2, 2012 at 5:23 AM, C.Koy<[email protected]> wrote:
>
>> On 5/1/2012 9:11 PM, Galen Wright-Watson wrote:
>>
>>> On Thu, Apr 26, 2012 at 3:45 AM, C.Koy<[email protected]> wrote:
>>>
>>> As of 5.3.0 this bug does not exist for function names. Only classes and
>>>> interfaces.
>>>>
>>>>
>>>> Turns out, if you cause a function to be called dynamically by (e.g.)
>>> using
>>> a variable function, the bug will surface.
>>>
>>> <?php
>>> setlocale(LC_CTYPE, 'tr_TR');
>>> function IJK() {}
>>> # succeeds
>>> IJK();
>>>
>>
>> If literal function call precedes the function definition, that would fail
>> too in 5.2.17, but not in 5.3.0.
>> What has changed in this regard 5.2->5.3 ?
>>
>>
> Do you mean something like the following?
>
> <?php
> setlocale(LC_CTYPE, 'tr_TR');
> IJK();
> setlocale(LC_CTYPE, 'en_US');
> function IJK() {echo __FUNCTION__, "\n";}
>
> I couldn't get it to generate an error under PHP 5.2.17. What am I missing?
>

Try this with 5.2.17:

<?php
setlocale(LC_CTYPE, 'tr_TR');
IJK();
function IJK() {}




On Fri, May 4, 2012 at 7:01 AM, C.Koy <[email protected]> wrote:

> On 5/2/2012 10:03 PM, Galen Wright-Watson wrote:
>
>> On Wed, May 2, 2012 at 5:23 AM, C.Koy<[email protected]> wrote:
>>
>> On 5/1/2012 9:11 PM, Galen Wright-Watson wrote:
>>>
>>> On Thu, Apr 26, 2012 at 3:45 AM, C.Koy<[email protected]> wrote:
>>>>
>>>> As of 5.3.0 this bug does not exist for function names. Only classes
>>>> and
>>>>
>>>>> interfaces.
>>>>>
>>>>>
>>>>> Turns out, if you cause a function to be called dynamically by (e.g.)
>>>>>
>>>> using
>>>> a variable function, the bug will surface.
>>>>
>>>> <?php
>>>> setlocale(LC_CTYPE, 'tr_TR');
>>>> function IJK() {}
>>>> # succeeds
>>>> IJK();
>>>>
>>>>
>>> If literal function call precedes the function definition, that would
>>> fail
>>> too in 5.2.17, but not in 5.3.0.
>>> What has changed in this regard 5.2->5.3 ?
>>>
>>>
>>> Do you mean something like the following?
>>
>> <?php
>> setlocale(LC_CTYPE, 'tr_TR');
>> IJK();
>> setlocale(LC_CTYPE, 'en_US');
>> function IJK() {echo __FUNCTION__, "\n";}
>>
>> I couldn't get it to generate an error under PHP 5.2.17. What am I
>> missing?
>>
>>
> Try this with 5.2.17:
>
>
> <?php
> setlocale(LC_CTYPE, 'tr_TR');
> IJK();
> function IJK() {}
>
>
That also ran without error for me. I'm not sure how to account for the
different behavior. Here are the details of the system that I'm using:

$ uname -a
Linux n10 3.2.6mtv10 #1 SMP Wed Mar 14 06:22:06 PDT 2012 x86_64 GNU/Linux
$ php -v
PHP 5.2.17 with Suhosin-Patch 0.9.7 (cli) (built: May 3 2012 12:16:32)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
    with Zend Optimizer v3.3.9, Copyright (c) 1998-2009, by Zend Technologies
    with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH
On 5/5/2012 12:22 AM, Galen Wright-Watson wrote:
> That also ran without error for me. I'm not sure how to account for the
> different behavior. Here are the details of the system that I'm using:
>
> $ uname -a
>> Linux n10 3.2.6mtv10 #1 SMP Wed Mar 14 06:22:06 PDT 2012 x86_64 GNU/Linux
>> $ php -v
>> PHP 5.2.17 with Suhosin-Patch 0.9.7 (cli) (built: May 3 2012 12:16:32)
>> Copyright (c) 1997-2009 The PHP Group
>> Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
>> with Zend Optimizer v3.3.9, Copyright (c) 1998-2009, by Zend
>> Technologies
>> with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH
>

I've been experimenting with bare-bones PHP I've built from pristine
sources so far. Don't you think you should do the same, in dealing with
such a bug?

Here's the top portion of my 'php -i' output:

~/proj$ php-5.2.17/sapi/cli/php -i|head -28
phpinfo()
PHP Version => 5.2.17

System => Linux trvuntu 2.6.32-41-generic #88-Ubuntu SMP Thu Mar 29 13:08:43 UTC 2012 i686
Build Date => May 4 2012 20:03:30
Configure Command => './configure' '--disable-all' '--enable-cli' '--enable-vld'
Server API => Command Line Interface
Virtual Directory Support => disabled
Configuration File (php.ini) Path => /usr/local/lib
Loaded Configuration File => (none)
Scan this dir for additional .ini files => (none)
additional .ini files parsed => (none)
PHP API => 20041225
PHP Extension => 20060613
Zend Extension => 220060519
Debug Build => no
Thread Safety => disabled
Zend Memory Manager => enabled
IPv6 Support => enabled
Registered PHP Streams => php, file, data, http, ftp
Registered Stream Socket Transports => tcp, udp, unix, udg
Registered Stream Filters => string.rot13, string.toupper, string.tolower, string.strip_tags, convert.*, consumed


This program makes use of the Zend Scripting Language Engine:
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies


On 05/04/2012 11:22 PM, Galen Wright-Watson wrote:
> On Fri, May 4, 2012 at 7:01 AM, C.Koy<[email protected]> wrote:
>
>> On 5/2/2012 10:03 PM, Galen Wright-Watson wrote:
>>
>>> On Wed, May 2, 2012 at 5:23 AM, C.Koy<[email protected]> wrote:
>>>
>>> On 5/1/2012 9:11 PM, Galen Wright-Watson wrote:
>>>> On Thu, Apr 26, 2012 at 3:45 AM, C.Koy<[email protected]> wrote:
>>>>> As of 5.3.0 this bug does not exist for function names. Only classes
>>>>> and
>>>>>
>>>>>> interfaces.
>>>>>>
>>>>>>
>>>>>> Turns out, if you cause a function to be called dynamically by (e.g.)
>>>>>>
>>>>> using
>>>>> a variable function, the bug will surface.
>>>>>
>>>>> <?php
>>>>> setlocale(LC_CTYPE, 'tr_TR');
>>>>> function IJK() {}
>>>>> # succeeds
>>>>> IJK();
>>>>>
>>>>>
>>>> If literal function call precedes the function definition, that would
>>>> fail
>>>> too in 5.2.17, but not in 5.3.0.
>>>> What has changed in this regard 5.2->5.3 ?
>>>>
>>>>
>>>> Do you mean something like the following?
>>> <?php
>>> setlocale(LC_CTYPE, 'tr_TR');
>>> IJK();
>>> setlocale(LC_CTYPE, 'en_US');
>>> function IJK() {echo __FUNCTION__, "\n";}
>>>
>>> I couldn't get it to generate an error under PHP 5.2.17. What am I
>>> missing?
>>>
>>>
>> Try this with 5.2.17:
>>
>>
>> <?php
>> setlocale(LC_CTYPE, 'tr_TR');
>> IJK();
>> function IJK() {}
>>
>>
> That also ran without error for me. I'm not sure how to account for the
> different behavior. Here are the details of the system that I'm using:
>
> $ uname -a
>> Linux n10 3.2.6mtv10 #1 SMP Wed Mar 14 06:22:06 PDT 2012 x86_64 GNU/Linux
>> $ php -v
>> PHP 5.2.17 with Suhosin-Patch 0.9.7 (cli) (built: May 3 2012 12:16:32)
>> Copyright (c) 1997-2009 The PHP Group
>> Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
>> with Zend Optimizer v3.3.9, Copyright (c) 1998-2009, by Zend
>> Technologies
>> with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH
Try to var_dump the result of setlocale() and see whether it returns the
specified locale or just 'false'. If it returns false, try the following:

setlocale(LC_ALL, 'tr_TR.UTF-8');

I had the same issue.
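
A minimal sketch of that check. The list of candidate names is an assumption (only 'tr_TR' and 'tr_TR.UTF-8' appear in this thread); setlocale() accepts an array and returns the first locale it managed to set, or false if none is installed:

```php
<?php
// Candidate spellings of the Turkish locale; which of them exist
// depends on the system (the third entry is an illustrative guess).
$candidates = array('tr_TR.UTF-8', 'tr_TR', 'tr_TR.ISO-8859-9');

// Returns the name of the first locale that could be set, or false.
$chosen = setlocale(LC_CTYPE, $candidates);

// Prints either string(...) with the accepted name, or bool(false).
var_dump($chosen);
```

If this prints bool(false), none of the candidates is available and any later case-conversion test runs under the previous locale.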

On 5/5/2012 7:01 PM, Wim Wisselink wrote:
> Try to var_dump the result of setlocale() and see whether it returns the
> specified locale or just 'false'.

I thought he was well past that check. Anyway, a simple test should
suffice:

setlocale(LC_CTYPE, 'tr_TR') or exit("setlocale failed\n");







On Sat, May 5, 2012 at 5:31 AM, C.Koy <[email protected]> wrote:

>
> I've been experimenting with bare-bones PHP I've built from pristine
> sources so far. Don't you think you should do the same, in dealing with
> such a bug?
>

My personal system is a BSD derivative; the Turkish locales on these use
Latin rather than Turkish case conversion (and installing a proper Turkish
locale is a mess), so I've been testing on another system. I've been
hesitant to use its resources too heavily for professional reasons. Running
a small PHP script is one thing; although the time and space required for a
PHP build aren't large on modern systems, I can't justify one since it's
not directly related to site operations.

On Sat, May 5, 2012 at 8:59 AM, Wim Wisselink <[email protected]> wrote:

> Try to var_dump the result of setlocale() and see whether it returns the
> specified locale or just 'false'. If it returns false, try the following:
>
> setlocale(LC_ALL, 'tr_TR.UTF-8');
>

I had previously tested the locale by using "strtolower('I')", as it tests
both that the locale exists and that it uses Turkish-language case
conversion. The systems where I tested C.Koy's script passed the
"strtolower" test. It turned out that the Zend Optimizer was preventing the
error. With it not loaded, the example script failed with a "Fatal error:
Call to undefined function IJK()" error message.
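
A rough sketch of that kind of check follows. The helper name classify_locale_lowercase() is hypothetical, and the result is only meaningful on a PHP build where strtolower() honours LC_CTYPE, as the 5.x builds discussed here do:

```php
<?php
// Hypothetical helper: report how a locale lowercases the ASCII letter 'I'.
//   'unavailable' - setlocale() rejected the locale name
//   'latin'       - 'I' lowercases to 'i'
//   'other'       - 'I' lowercases to something else (e.g. a dotless i)
function classify_locale_lowercase($locale)
{
    // Passing '0' only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    if (setlocale(LC_CTYPE, $locale) === false) {
        return 'unavailable';
    }

    $lower = strtolower('I');

    // Restore the locale that was active before the check.
    setlocale(LC_CTYPE, $previous);

    return $lower === 'i' ? 'latin' : 'other';
}

echo classify_locale_lowercase('tr_TR'), "\n";
```

Only an 'other' result indicates a locale that can trigger the bug; 'latin' means the locale exists but lowercases 'I' the same way the C locale does.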

Here's a breakdown:

In both PHP 5.2 and 5.3, calling a function before defining it results in a
dynamic call (INIT_FCALL_BY_NAME+DO_FCALL_BY_NAME). Here's the PHP 5.2 dump
of C.Koy's example:

line     # *  op                     fetch  ext  return  operands
-----------------------------------------------------------------
   2     0 >  FETCH_CONSTANT                     ~0      'LC_CTYPE'
         1    SEND_VAL                                   ~0
         2    SEND_VAL                                   'tr_TR'
         3    DO_FCALL                      2            'setlocale'
   3     4    INIT_FCALL_BY_NAME                         'IJK'
         5    DO_FCALL_BY_NAME              0
   4     6    NOP
   5     7 >  RETURN                                     1
         8* > ZEND_HANDLE_EXCEPTION

Here's the 5.3 dump:
line     # *  op                     fetch  ext  return  operands
-----------------------------------------------------------------
   2     0 >  EXT_STMT
         1    EXT_FCALL_BEGIN
         2    SEND_VAL                                   2
         3    SEND_VAL                                   'tr_TR'
         4    DO_FCALL                      2            'setlocale'
         5    EXT_FCALL_END
   3     6    EXT_STMT
         7    INIT_FCALL_BY_NAME                         'ijk', 'IJK'
         8    EXT_FCALL_BEGIN
         9    DO_FCALL_BY_NAME              0
        10    EXT_FCALL_END
   4    11    EXT_STMT
        12    NOP
   5    13 >  RETURN                                     1

From opcode #7 in the 5.3 dump, we see that 5.3 converts the function name
to lowercase during compilation, but 5.2 doesn't. Examining the source
confirms this: you can see the lowercase conversion in 5.3's
zend_do_begin_dynamic_function_call on lines 1659 (for namespaced calls)
and 1683 (for non-namespaced calls) of zend_compile.c (
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3_10/Zend/zend_compile.c?revision=323023&view=markup#l1683),
while there's no such conversion in the same function in 5.2 (
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_2/Zend/zend_compile.c?view=markup&pathrev=302150#l1450
).

5.3 only performs case conversion if the function name is a CONST
expression, which is why defining the function after calling it works but
calling a function with a variable name breaks. Correspondingly,
ZEND_INIT_FCALL_BY_NAME_SPEC_CONST_HANDLER (in zend_vm_execute.h) uses the
first operand (which is already lowercased), while the other
INIT_FCALL_BY_NAME opcode handlers (ZEND_INIT_FCALL_BY_NAME_SPEC_*_HANDLER)
use the second, non-lowercased operand.

The 5.2 INIT_FCALL_BY_NAME opcode handlers only ever use the second,
un-lowercased operand.
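
The difference can be modelled in a few lines of userland PHP. This is only a simulation, not engine code: ascii_lower() and turkish_lower() are hypothetical stand-ins for tolower() under the C locale and under a single-byte Turkish locale (assumed here to be ISO-8859-9, where 'I' lowercases to byte 0xFD), and the array stands in for the engine's function table:

```php
<?php
// Stand-in for tolower() in the C locale: plain ASCII lowercasing.
function ascii_lower($name)
{
    return strtr($name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                        'abcdefghijklmnopqrstuvwxyz');
}

// Stand-in for tolower() in a Turkish single-byte locale (assumption:
// ISO-8859-9), where 'I' maps to the dotless i, byte 0xFD.
function turkish_lower($name)
{
    return strtr($name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                        "abcdefgh\xFDjklmnopqrstuvwxyz");
}

// The function is registered at compile time, before setlocale() runs,
// so its key is lowercased with the default conversion.
$functionTable = array(ascii_lower('IJK') => 'IJK');

// 5.2-style call: the name is lowercased at run time, under tr_TR.
$runtimeKey = turkish_lower('IJK');

// 5.3-style call with a CONST name: the key was lowercased at compile time.
$compiledKey = ascii_lower('IJK');

var_dump(isset($functionTable[$runtimeKey]));  // bool(false): lookup misses
var_dump(isset($functionTable[$compiledKey])); // bool(true): lookup succeeds
```

The miss in the first lookup corresponds to the "Call to undefined function IJK()" error; the second lookup succeeds because its key never passes through the Turkish conversion.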

So, what does this mean for fixing the bug? Not so much when the function
or class name is stored in a variable, since such names can't be converted
to lowercase at compile time without converting all variables, which is too
wasteful of both time and space (as both the unconverted and converted
strings would need to be stored). For object instantiation,
zend_do_begin_new_object gets the class name ultimately from the
namespace_name rule. zend_do_begin_new_object could then take the resulting
znode and create a second, lowercased copy, storing it as the second
operand. ZEND_NEW_SPEC_HANDLER would then be altered to use the second
operand (if not UNUSED) to instantiate the object. This certainly seems a
valid alternative to a lowercasing version of the namespace_name rule; it's
not as far-reaching, which may be good (in that it has less impact) and bad
(in that there may be other instances of this bug that it won't fix).

However, neither the dual-operand solution nor lc_namespace_name will fix
the bug when the identifier is stored in a variable. That requires fixing
the run-time portion of PHP, in particular zend_fetch_class (or
zend_do_begin_class_member_function_call, zend_do_begin_new_object and
likely others) and the INIT_FCALL_BY_NAME handlers.

I get the feeling that there are still other cases yet to be discovered
where this bug surfaces.