<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Re: [PHP-DEV] default charset confusion</title>
        <description>Hi!

&gt; What we really need is what we added in PHP 6. A runtime encoding ini
&gt; setting that is distinct from the output charset which we can use here.
&gt; That would allow people to fix all their legacy code to a specific
&gt; runtime encoding with a single ini setting instead of changing thousands
&gt; of lines of code. I propose that we add such a directive to 5.4.1 to
&gt; ease migration.

One more charset INI setting? I'm not sure I like this. We have tons of
INIs already, and adding a new one each time we change something makes
both writing applications and configuring servers harder.
But as the manual says, ISO-8859-1 and UTF-8 are the same for
htmlspecialchars() - is it wrong? If yes, what exactly is the difference
between the old and new behavior? I tried to read #61354 but could make
little sense of it; it lacks an expected result and I have a hard time
understanding what the problem is there. Could you explain?

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php</description>
        <link>http://www.serverphorums.com/read.php?7,460261,460261#msg-460261</link>
        <lastBuildDate>Wed, 22 May 2013 08:52:07 +0200</lastBuildDate>
        <generator>Phorum 5.2.18</generator>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,472338#msg-472338</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,472338#msg-472338</link>
            <description><![CDATA[ Hi Folks:<br />
<br />
This topic appears to have been quietly tabled.  I didn't notice a<br />
decision here or a commit.<br />
<br />
<br />
On Mon, Mar 12, 2012 at 01:12:03PM -0700, Rasmus Lerdorf wrote:<br />
&gt;<br />
&gt; So maybe a way to tackle this is to use the<br />
&gt; mbstring internal encoding when it is set as the htmlspecialchars<br />
&gt; default when it is called without an encoding arg.<br />
<br />
This seems like the clearest indicator of the programmer's intent.<br />
<br />
Thanks,<br />
<br />
--Dan<br />
<br />
-- <br />
 T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y<br />
            data intensive web and database programming<br />
                <a href="http://www.AnalysisAndSolutions.com/" target="_blank"  rel="nofollow">http://www.AnalysisAndSolutions.com/</a><br />
 4015 7th Ave #4, Brooklyn NY 11232  v: 718-854-0335 f: 718-854-0409<br />
<br />
]]></description>
            <dc:creator>Daniel Convissor</dc:creator>
            <category>php-internals</category>
            <pubDate>Sun, 01 Apr 2012 18:10:03 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462491#msg-462491</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462491#msg-462491</link>
            <description><![CDATA[ On 13/03/12 00:25, Stas Malyshev wrote:<br />
&gt; Hi!<br />
&gt;<br />
&gt;&gt; Still, that API is likely wrong: a library function written by someone<br />
&gt;&gt; completely unrelated to the main application shouldn't be echoing<br />
&gt;&gt; anything through the output. And if it's not generating the html, the<br />
&gt;&gt; htmlspecialchars is better done from the return at the calling<br />
&gt;&gt; application (probably after converting the internal charset).<br />
&gt;<br />
&gt; Again, you are making a huge number of assumptions about how ALL the<br />
&gt; applications must work, which means you are wrong in 99.(9)% of cases,<br />
&gt; because there's infinitely many applications which don't work exactly<br />
&gt; like yours does, and we have no idea how they work.<br />
No. I'm saying how I consider they should work, saying that an API doing<br />
otherwise is likely* wrong (aka. has a bad design), very much as I'd<br />
consider insane a company policy stating &quot;PHP function arguments shall<br />
be named $a, $b, $c...&quot;.<br />
That's obviously my opinion, but I think most applications will conform<br />
to that, just as most apps will use more descriptive argument names than<br />
&quot;$c&quot;**.<br />
<br />
<br />
* There might be some very, very special application where it turns out<br />
to be an appropriate design, but that would be the exception.<br />
** Even though there are 26!/(26-n)! ways to name the arguments of an<br />
n-ary function that badly.<br />
<br />
<br />
&gt; The main point is that having global state (and yet worse, changeable<br />
&gt; global state) significantly influence how basic functions are working<br />
&gt; is dangerous. It's like keeping everything in globals and instead of<br />
&gt; passing parameters between functions just change some globals and<br />
&gt; expect functions to pick it up.<br />
I agree with you in the general case. Yet I consider the HTML charset<br />
to be global state, and passing global variables as parameters on<br />
each function call would be nearly as bad as passing parameters as globals.<br />
I took the opposite position on parse_str(), while being fully<br />
aware of that.<br />
<br />
<br />
&gt;&gt; Such interfaces may be well served by switching the setting many times.<br />
&gt; That's exactly what I am trying to avoid, and you are just<br />
&gt; illustrating why this proposal is dangerous - because that's exactly<br />
&gt; what is going to happen in the code, instead of passing proper<br />
&gt; arguments to htmlspecialchars people will start changing INI settings<br />
&gt; left and right, and then nobody would know what htmlspecialchars()<br />
&gt; call actually does without tracking all the INI changes along the way.<br />
That's assuming people would need to use different output charsets,<br />
which I don't consider to be the case. How many people are using the<br />
third htmlspecialchars() parameter now?<br />
What makes you think that they would need to change the default global<br />
*several times per request*?<br />
<br />
<br />
]]></description>
            <dc:creator>Ángel González</dc:creator>
            <category>php-internals</category>
            <pubDate>Thu, 15 Mar 2012 00:40:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462235#msg-462235</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462235#msg-462235</link>
            <description><![CDATA[ On Wed, Mar 14, 2012 at 3:37 PM, Gustavo Lopes &lt;glopes@nebm.ist.utl.pt&gt;wrote:<br />
<br />
&gt; On Wed, 14 Mar 2012 14:55:17 +0100, jpauli &lt;jpauli@php.net&gt; wrote:<br />
&gt;<br />
&gt;  I would then propose to make mbstring compile time mandatory.<br />
&gt;&gt;<br />
&gt;&gt;<br />
&gt; I'm completely against these kind of lazy solutions. Yes, let's add strong<br />
&gt; coupling (already starting to smell) to one of the largest extensions and<br />
&gt; make it compile time mandatory because it simplifies the implementation of<br />
&gt; a dubiously useful feature like Zend multibyte. Remember PHP is sometimes<br />
&gt; used in environments with limited memory/disk space.<br />
&gt;<br />
&gt; Also mbstring takes a long time to build (relatively speaking). Just that<br />
&gt; would be a strong argument against making it mandatory, at least for people<br />
&gt; like me that compile PHP with --disable-all very frequently.<br />
&gt;<br />
&gt;<br />
&gt;  I'm against yet another global ini setting, I find the actual ini<br />
&gt;&gt; settings confusing enough to add one more that would moreover reflect<br />
&gt;&gt; mbstring one's (and add more and more confusion).<br />
&gt;&gt; Why not turn ext/mbstring mandatory at compile time, for all future PHP<br />
&gt;&gt; versions, like preg or spl are ?<br />
&gt;&gt;<br />
&gt;&gt; We do need multibyte handling either. ZendEngine takes advantage of<br />
&gt;&gt; mbstring for internal encoding as well, so I probably missed something as<br />
&gt;&gt; why it is still possible to --disable-mbstring (or not add<br />
&gt;&gt; --enable-mbstring) when compiling ? Has it a huge performance impact ?<br />
&gt;&gt;<br />
&gt;&gt;<br />
&gt; mbstring hooks to basically all phases of PHP process/request<br />
&gt; startup/shutdown. Some efforts were made to mitigate the impact of this in<br />
&gt; 5.4 (see e.g. r301068), but at least some impact is inevitable. Of course,<br />
&gt; if you start enabling certain features of mbstring (zend multibyte hooks,<br />
&gt; translation of input variables, function overload) then it starts to be<br />
&gt; significant. However, there are other more compelling reasons not to make<br />
&gt; it required (see above).<br />
&gt;<br />
&gt; --<br />
&gt; Gustavo Lopes<br />
&gt;<br />
<br />
That makes sense to me :-)<br />
<br />
But we should think about complexity in the final choice.<br />
Adding something like &quot;internal_encoding&quot; to php.ini will confuse<br />
people, at least if we don't clearly explain what the setting is for.<br />
The name is nearly the same as mbstring's.<br />
<br />
I recently opened a doc bug about multibyte handling in 5.4 (#61373), as<br />
the documentation is really light on that point.<br />
<br />
Julien.P]]></description>
            <dc:creator>jpauli</dc:creator>
            <category>php-internals</category>
            <pubDate>Wed, 14 Mar 2012 16:00:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462224#msg-462224</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462224#msg-462224</link>
            <description><![CDATA[ On Wed, 14 Mar 2012 14:55:17 +0100, jpauli &lt;jpauli@php.net&gt; wrote:<br />
<br />
&gt; I would then propose to make mbstring compile time mandatory.<br />
&gt;<br />
<br />
I'm completely against this kind of lazy solution. Yes, let's add strong<br />
coupling (already starting to smell) to one of the largest extensions and  <br />
make it compile time mandatory because it simplifies the implementation of  <br />
a dubiously useful feature like Zend multibyte. Remember PHP is sometimes  <br />
used in environments with limited memory/disk space.<br />
<br />
Also mbstring takes a long time to build (relatively speaking). Just that  <br />
would be a strong argument against making it mandatory, at least for  <br />
people like me that compile PHP with --disable-all very frequently.<br />
<br />
&gt; I'm against yet another global ini setting, I find the actual ini  <br />
&gt; settings confusing enough to add one more that would moreover reflect  <br />
&gt; mbstring one's (and add more and more confusion).<br />
&gt; Why not turn ext/mbstring mandatory at compile time, for all future PHP<br />
&gt; versions, like preg or spl are ?<br />
&gt;<br />
&gt; We do need multibyte handling either. ZendEngine takes advantage of<br />
&gt; mbstring for internal encoding as well, so I probably missed something as<br />
&gt; why it is still possible to --disable-mbstring (or not add<br />
&gt; --enable-mbstring) when compiling ? Has it a huge performance impact ?<br />
&gt;<br />
<br />
mbstring hooks to basically all phases of PHP process/request  <br />
startup/shutdown. Some efforts were made to mitigate the impact of this in  <br />
5.4 (see e.g. r301068), but at least some impact is inevitable. Of course,  <br />
if you start enabling certain features of mbstring (zend multibyte hooks,  <br />
translation of input variables, function overload) then it starts to be  <br />
significant. However, there are other more compelling reasons not to make  <br />
it required (see above).<br />
<br />
-- <br />
Gustavo Lopes<br />
<br />
]]></description>
            <dc:creator>Gustavo Lopes</dc:creator>
            <category>php-internals</category>
            <pubDate>Wed, 14 Mar 2012 15:40:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462223#msg-462223</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462223#msg-462223</link>
            <description><![CDATA[ On Wed, Mar 14, 2012 at 3:29 PM, Michael Stowe &lt;me@mikestowe.com&gt; wrote:<br />
<br />
&gt; Correct me if I'm wrong, but I believe Zend Multibyte is now enabled by<br />
&gt; default in PHP 5.4.<br />
&gt;<br />
&gt; - Mike<br />
&gt;<br />
&gt;<br />
<a href="http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91" target="_blank"  rel="nofollow">http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91</a><br />
<a href="http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend.c#108" target="_blank"  rel="nofollow">http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend.c#108</a><br />
<a href="http://lxr.php.net/opengrok/xref/PHP_5_4/php.ini-development#358" target="_blank"  rel="nofollow">http://lxr.php.net/opengrok/xref/PHP_5_4/php.ini-development#358</a><br />
<a href="http://lxr.php.net/opengrok/xref/PHP_5_4/php.ini-production#358" target="_blank"  rel="nofollow">http://lxr.php.net/opengrok/xref/PHP_5_4/php.ini-production#358</a><br />
<br />
AFAIK we just moved the switch from compile time to runtime: the code is<br />
there, and if you want to enable it you don't have to recompile PHP, you<br />
only have to change an ini setting. It isn't turned on by default.<br />
-- <br />
Ferenc Kovács<br />
@Tyr43l - <a href="http://tyrael.hu" target="_blank"  rel="nofollow">http://tyrael.hu</a>]]></description>
            <dc:creator>Ferenc Kovacs</dc:creator>
            <category>php-internals</category>
            <pubDate>Wed, 14 Mar 2012 15:40:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462222#msg-462222</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462222#msg-462222</link>
            <description><![CDATA[ Correct me if I'm wrong, but I believe Zend Multibyte is now enabled by<br />
default in PHP 5.4.<br />
<br />
- Mike<br />
<br />
<br />
<br />
<br />
On Wed, Mar 14, 2012 at 9:24 AM, Ferenc Kovacs &lt;tyra3l@gmail.com&gt; wrote:<br />
<br />
&gt; &gt;<br />
&gt; &gt;<br />
&gt; &gt; I would then propose to make mbstring compile time mandatory.<br />
&gt; &gt;<br />
&gt; &gt; I'm against yet another global ini setting, I find the actual ini<br />
&gt; settings<br />
&gt; &gt; confusing enough to add one more that would moreover reflect mbstring<br />
&gt; one's<br />
&gt; &gt; (and add more and more confusion).<br />
&gt; &gt; Why not turn ext/mbstring mandatory at compile time, for all future PHP<br />
&gt; &gt; versions, like preg or spl are ?<br />
&gt; &gt;<br />
&gt; &gt; We do need multibyte handling either. ZendEngine takes advantage of<br />
&gt; &gt; mbstring for internal encoding as well, so I probably missed something as<br />
&gt; &gt; why it is still possible to --disable-mbstring (or not add<br />
&gt; &gt; --enable-mbstring) when compiling ? Has it a huge performance impact ?<br />
&gt; &gt;<br />
&gt; &gt; Thank you :)<br />
&gt; &gt;<br />
&gt; &gt; Julien.P<br />
&gt; &gt;<br />
&gt;<br />
&gt; see<br />
&gt; <a href="http://www.mail-archive.com/internals@lists.php.net/msg48452.html" target="_blank"  rel="nofollow">http://www.mail-archive.com/internals@lists.php.net/msg48452.html</a><br />
&gt; <a href="http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91" target="_blank"  rel="nofollow">http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91</a><br />
&gt; and<br />
&gt; <a href="http://www.mail-archive.com/internals@lists.php.net/msg53863.html" target="_blank"  rel="nofollow">http://www.mail-archive.com/internals@lists.php.net/msg53863.html</a><br />
&gt;<br />
&gt; basically the mbstring code in the ZE is only used if you<br />
&gt; enable zend.multibyte, which is disabled by default, so it isn't mandatory<br />
&gt; to have ext/mbstring for the default build/setup.<br />
&gt; as you can see from the last link, I would support having ext/mbstring<br />
&gt; builtin and always enabled, but I would like to hear from more people about<br />
&gt; the pros and cons.<br />
&gt;<br />
&gt; --<br />
&gt; Ferenc Kovács<br />
&gt; @Tyr43l - <a href="http://tyrael.hu" target="_blank"  rel="nofollow">http://tyrael.hu</a><br />
&gt;<br />
<br />
<br />
<br />
-- <br />
-----------------------<br />
<br />
&quot;My command is this: Love each other as I<br />
have loved you.&quot;                         John 15:12<br />
<br />
-----------------------]]></description>
            <dc:creator>Michael Stowe</dc:creator>
            <category>php-internals</category>
            <pubDate>Wed, 14 Mar 2012 15:40:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462218#msg-462218</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462218#msg-462218</link>
            <description><![CDATA[ &gt;<br />
&gt;<br />
&gt; I would then propose to make mbstring compile time mandatory.<br />
&gt;<br />
&gt; I'm against yet another global ini setting, I find the actual ini settings<br />
&gt; confusing enough to add one more that would moreover reflect mbstring one's<br />
&gt; (and add more and more confusion).<br />
&gt; Why not turn ext/mbstring mandatory at compile time, for all future PHP<br />
&gt; versions, like preg or spl are ?<br />
&gt;<br />
&gt; We do need multibyte handling either. ZendEngine takes advantage of<br />
&gt; mbstring for internal encoding as well, so I probably missed something as<br />
&gt; why it is still possible to --disable-mbstring (or not add<br />
&gt; --enable-mbstring) when compiling ? Has it a huge performance impact ?<br />
&gt;<br />
&gt; Thank you :)<br />
&gt;<br />
&gt; Julien.P<br />
&gt;<br />
<br />
see<br />
<a href="http://www.mail-archive.com/internals@lists.php.net/msg48452.html" target="_blank"  rel="nofollow">http://www.mail-archive.com/internals@lists.php.net/msg48452.html</a><br />
<a href="http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91" target="_blank"  rel="nofollow">http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91</a><br />
and<br />
<a href="http://www.mail-archive.com/internals@lists.php.net/msg53863.html" target="_blank"  rel="nofollow">http://www.mail-archive.com/internals@lists.php.net/msg53863.html</a><br />
<br />
Basically, the mbstring code in the ZE is only used if you<br />
enable zend.multibyte, which is disabled by default, so ext/mbstring isn't<br />
mandatory for the default build/setup.<br />
As you can see from the last link, I would support having ext/mbstring<br />
built in and always enabled, but I would like to hear from more people about<br />
the pros and cons.<br />
<br />
-- <br />
Ferenc Kovács<br />
@Tyr43l - <a href="http://tyrael.hu" target="_blank"  rel="nofollow">http://tyrael.hu</a>]]></description>
            <dc:creator>Ferenc Kovacs</dc:creator>
            <category>php-internals</category>
            <pubDate>Wed, 14 Mar 2012 15:30:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,462199#msg-462199</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,462199#msg-462199</link>
            <description><![CDATA[ On Tue, Mar 13, 2012 at 1:52 AM, Yasuo Ohgaki &lt;yohgaki@ohgaki.net&gt; wrote:<br />
<br />
&gt; 2012/3/13 Rasmus Lerdorf &lt;rasmus@lerdorf.com&gt;:<br />
&gt; &gt; On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:<br />
&gt; &gt;&gt; I thought default_charset became UTF-8, so I was expecting<br />
&gt; &gt;&gt; following HTTP header.<br />
&gt; &gt;&gt;<br />
&gt; &gt;&gt; content-type  text/html; charset=UTF-8<br />
&gt; &gt;&gt;<br />
&gt; &gt;&gt; However, I got empty charset (missing 'charset=UTF-8').<br />
&gt; &gt;&gt; So I looked up to source and found the line in SAPI.h<br />
&gt; &gt;&gt;<br />
&gt; &gt;&gt; 293   #define SAPI_DEFAULT_CHARSET        &quot;&quot;<br />
&gt; &gt;&gt;<br />
&gt; &gt;&gt; Empty string should be &quot;UTF-8&quot;, isn't it?<br />
&gt; &gt;<br />
&gt; &gt; No, we can't force an output charset on people since it would end up<br />
&gt; &gt; breaking a lot of sites.<br />
&gt;<br />
&gt; Right, so may be for the next major release? 5.5.0?<br />
&gt;<br />
&gt; As the first XSS advisory in 2000 states, explicitly setting char coding<br />
&gt; will<br />
&gt; prevent certain XSS. Recent browsers have much better encoding handing,<br />
&gt; but setting encoding explicitly is better for security still.<br />
&gt;<br />
&gt; &gt; PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:<br />
&gt; &gt;<br />
&gt; &gt;    if (charset_hint == NULL)<br />
&gt; &gt;                return cs_8859_1;<br />
&gt; &gt;<br />
&gt; &gt; and in 5.4 we have:<br />
&gt; &gt;<br />
&gt; &gt;    if (charset_hint == NULL)<br />
&gt; &gt;                return cs_utf_8;<br />
&gt; &gt;<br />
&gt; &gt; So there is no difference in their guessing when there is no hint, the<br />
&gt; &gt; only difference is that in 5.4 we choose utf8 and in 5.3 we choose<br />
&gt; &gt; 8859-1 in that case.<br />
&gt;<br />
&gt; I got this with 5.3<br />
&gt; &lt;?php<br />
&gt; echo htmlentities('&lt;日本語UTF-8&gt;',ENT_QUOTES);<br />
&gt; echo htmlentities('&lt;日本語UTF-8&gt;',ENT_QUOTES, 'UTF-8');<br />
&gt;<br />
&gt; &amp;lt;&amp;aelig;�&amp;yen;&amp;aelig;�&amp;not;&amp;egrave;&amp;ordf;�UTF8<br />
&gt; &amp;gt;&amp;lt;日本語UTF-8&amp;gt;<br />
&gt;<br />
&gt; So people migrating from 5.3 to 5.4 should not have problems.<br />
&gt; Migration older than 5.3 to 5.4 will be problematic.<br />
&gt;<br />
&gt; I always set all parameters for htmlentities/htmlspecialchars, therefore<br />
&gt; I haven't noticed this was changed from 5.3. They may be migrating from<br />
&gt; 5.2 or older. (RHEL5 uses 5.1)<br />
&gt;<br />
&gt; Since PHP does not have default multibyte module, it may be good for having<br />
&gt;<br />
&gt; input_encoding<br />
&gt; internal_encoding<br />
&gt; output_encoding<br />
&gt;<br />
&gt;<br />
I would then propose to make mbstring mandatory at compile time.<br />
<br />
I'm against yet another global ini setting. I find the current ini settings<br />
confusing enough without adding one more that would, moreover, mirror mbstring's<br />
(and add more and more confusion).<br />
Why not make ext/mbstring mandatory at compile time for all future PHP<br />
versions, like preg or spl are?<br />
<br />
We do need multibyte handling anyway. ZendEngine takes advantage of<br />
mbstring for internal encoding as well, so I have probably missed something about<br />
why it is still possible to --disable-mbstring (or not add<br />
--enable-mbstring) when compiling. Does it have a huge performance impact?<br />
<br />
Thank you :)<br />
<br />
Julien.P]]></description>
            <dc:creator>jpauli</dc:creator>
            <category>php-internals</category>
            <pubDate>Wed, 14 Mar 2012 15:00:03 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,461653#msg-461653</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,461653#msg-461653</link>
            <description><![CDATA[ 2012.03.13 16:38 Richard Lynch wrote:<br />
&gt; I'd have to agree with Stas that everybody should start passing in a<br />
&gt; variable there, that can be set somewhere in a config, or, perhaps,<br />
&gt; would DEFAULT to, errrr...<br />
<br />
You do realize that the suggestions on this thread, and the original bug<br />
reporter, failed to pick the correct values for migrating the original<br />
function call to PHP 5.4-compatible syntax?<br />
<br />
htmlspecialchars() without the optional arguments does not default to ENT_QUOTES or NULL.<br />
<br />
Failing to choose the proper value for the second argument will lead to a<br />
different exploit or to data corruption.<br />
<br />
&gt; You can't default to a function call.<br />
<br />
Changing the default in the function was a bad idea.<br />
<br />
Ignoring bug reports about f....ed up documentation and closing them with<br />
bogus explanations might not be a bad idea, but it really helps in<br />
alienating your developer base.<br />
<br />
-- <br />
Tomas<br />
<br />
<br />
]]></description>
            <dc:creator>Tomas Kuliavas</dc:creator>
            <category>php-internals</category>
            <pubDate>Tue, 13 Mar 2012 18:50:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,461541#msg-461541</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,461541#msg-461541</link>
            <description><![CDATA[ On Mon, March 12, 2012 2:44 pm, Rasmus Lerdorf wrote:<br />
&gt; But you can't necessarily hardcode the encoding if you are writing<br />
&gt; portable code. That's a bit like hardcoding a timezone. In order to<br />
&gt; write portable code you need to give people the ability to localize<br />
&gt; it.<br />
<br />
If you wanted it portable, wouldn't you need to have a variable there,<br />
so it can survive the ISO-8859-1 to UTF-8 change, and to allow people<br />
to change it despite whatever non-standard setting might happen to be<br />
in somebody else's php.ini?<br />
<br />
I mean, sure, it's nice if it &quot;just works&quot; for the folks who want to<br />
install it and have it localized for their own charset hard-coded in<br />
php.ini, but if it's a multi-national website, you have to pass in<br />
a variable there, which seems the more portable option to this naive<br />
reader.<br />
<br />
Having it default to whatever happens to be in php.ini only solves the<br />
use case of people who only want to serve up their content in their<br />
own charset.<br />
<br />
I'd have to agree with Stas that everybody should start passing in a<br />
variable there, that can be set somewhere in a config, or, perhaps,<br />
would DEFAULT to, errrr...<br />
<br />
You can't default to a function call.<br />
<br />
ANOTHER magic constant like __INI_CHARSET__ ???<br />
<br />
That's probably a bad idea...<br />
<br />
-- <br />
brain cancer update:<br />
<a href="http://richardlynch.blogspot.com/search/label/brain%20tumor" target="_blank"  rel="nofollow">http://richardlynch.blogspot.com/search/label/brain%20tumor</a><br />
Donate:<br />
<a href="https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&amp;hosted_button_id=FS9NLTNEEKWBE" target="_blank"  rel="nofollow">https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&amp;hosted_button_id=FS9NLTNEEKWBE</a><br />
<br />
<br />
<br />
]]></description>
            <dc:creator>Richard Lynch</dc:creator>
            <category>php-internals</category>
            <pubDate>Tue, 13 Mar 2012 15:40:04 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,461530#msg-461530</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,461530#msg-461530</link>
            <description><![CDATA[ On 13.03.2012 at 02:34, Rasmus Lerdorf &lt;rasmus@lerdorf.com&gt; wrote:<br />
&gt; On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote:<br />
&gt;&gt; I always set all parameters for htmlentities/htmlspecialchars, therefore<br />
&gt;&gt; I haven't noticed this was changed from 5.3. They may be migrating from<br />
&gt;&gt; 5.2 or older. (RHEL5 uses 5.1)<br />
&gt;<br />
&gt; No, like I showed, moving from 5.3 to 5.4 breaks because the new default<br />
&gt; UTF-8 encoding validates the input and 8859-1 in 5.3 does not. So for<br />
&gt; charsets that are actually safe for the low-ascii chars that are<br />
&gt; significant to html htmlspecialchars() now returns false in 5.4 because<br />
&gt; their chars fail the UTF8 validity check. For people who explicitly set<br />
&gt; all the parameters nothing has changed, of course.<br />
<br />
I second that. It causes us a big PITA because we're still using 8859-1<br />
(shame on us), and it is made even worse because the encoding parameter<br />
comes after the (optional) flags parameter, which now has to be given too.<br />
<br />
The sane version, from my naive point of view, would be to honor<br />
default_charset if nothing is given. That's what I expected when I read<br />
the migration guide.<br />
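<br />
A rough sketch of that expectation in userland, assuming a hypothetical<br />
helper named h() (not part of PHP) and an ISO-8859-1 fallback chosen only<br />
for illustration:<br />

```php
// Hypothetical helper: escape using default_charset when no charset is given.
function h($text, $charset = null)
{
    if ($charset === null) {
        // ini_get() returns the configured value, which may be an empty string.
        $configured = ini_get('default_charset');
        $charset = $configured ? $configured : 'ISO-8859-1'; // assumed fallback
    }
    // Flags are spelled out because the encoding argument comes after them.
    return htmlspecialchars($text, ENT_COMPAT, $charset);
}

// Simulate a php.ini that declares a Latin-1 site.
ini_set('default_charset', 'ISO-8859-1');

// The double quotes are escaped; the 0xE9 byte passes through untouched.
echo h("caf\xE9 \"au lait\""), "\n";
```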
<br />
- Chris<br />
<br />
]]></description>
            <dc:creator>Christian Schneider</dc:creator>
            <category>php-internals</category>
            <pubDate>Tue, 13 Mar 2012 15:30:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,461087#msg-461087</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,461087#msg-461087</link>
            <description><![CDATA[ On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote:<br />
&gt; I always set all parameters for htmlentities/htmlspecialchars, therefore<br />
&gt; I haven't noticed this was changed from 5.3. They may be migrating from<br />
&gt; 5.2 or older. (RHEL5 uses 5.1)<br />
<br />
No, like I showed, moving from 5.3 to 5.4 breaks because the new default<br />
UTF-8 encoding validates the input and 8859-1 in 5.3 does not. So for<br />
charsets that are actually safe for the low-ASCII chars that are<br />
significant to HTML, htmlspecialchars() now returns false in 5.4 because<br />
their chars fail the UTF-8 validity check. For people who explicitly set<br />
all the parameters nothing has changed, of course.<br />
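<br />
A minimal sketch of the difference described above, assuming input that is<br />
really ISO-8859-1. The flags are passed explicitly so the result does not<br />
depend on any version's defaults. Per the manual, the value returned for<br />
invalid input is an empty string rather than a literal false:<br />

```php
// "café" encoded as ISO-8859-1: the lone byte 0xE9 is not valid UTF-8.
$latin1 = "caf\xE9";

// Treated as ISO-8859-1 (the 5.3 default): bytes pass through unvalidated.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

// Treated as UTF-8 (the 5.4 default): the input fails validation.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

var_dump($asLatin1 === $latin1); // bool(true)
var_dump($asUtf8 === '');        // bool(true)
```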
<br />
-Rasmus<br />
<br />
]]></description>
            <dc:creator>Rasmus Lerdorf</dc:creator>
            <category>php-internals</category>
            <pubDate>Tue, 13 Mar 2012 02:40:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460993#msg-460993</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460993#msg-460993</link>
            <description><![CDATA[ 2012/3/13 Rasmus Lerdorf &lt;rasmus@lerdorf.com&gt;:<br />
&gt; On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:<br />
&gt;&gt; I thought default_charset became UTF-8, so I was expecting<br />
&gt;&gt; following HTTP header.<br />
&gt;&gt;<br />
&gt;&gt; content-type  text/html; charset=UTF-8<br />
&gt;&gt;<br />
&gt;&gt; However, I got empty charset (missing 'charset=UTF-8').<br />
&gt;&gt; So I looked up to source and found the line in SAPI.h<br />
&gt;&gt;<br />
&gt;&gt; 293   #define SAPI_DEFAULT_CHARSET        &quot;&quot;<br />
&gt;&gt;<br />
&gt;&gt; Empty string should be &quot;UTF-8&quot;, isn't it?<br />
&gt;<br />
&gt; No, we can't force an output charset on people since it would end up<br />
&gt; breaking a lot of sites.<br />
<br />
Right, so maybe for the next major release? 5.5.0?<br />
<br />
As the first XSS advisory in 2000 states, explicitly setting the character encoding will<br />
prevent certain XSS attacks. Recent browsers have much better encoding handling,<br />
but setting the encoding explicitly is still better for security.<br />
<br />
&gt; PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:<br />
&gt;<br />
&gt;    if (charset_hint == NULL)<br />
&gt;                return cs_8859_1;<br />
&gt;<br />
&gt; and in 5.4 we have:<br />
&gt;<br />
&gt;    if (charset_hint == NULL)<br />
&gt;                return cs_utf_8;<br />
&gt;<br />
&gt; So there is no difference in their guessing when there is no hint, the<br />
&gt; only difference is that in 5.4 we choose utf8 and in 5.3 we choose<br />
&gt; 8859-1 in that case.<br />
<br />
I got this with 5.3<br />
&lt;?php<br />
echo htmlentities('&lt;日本語UTF-8&gt;',ENT_QUOTES);<br />
echo htmlentities('&lt;日本語UTF-8&gt;',ENT_QUOTES, 'UTF-8');<br />
<br />
&amp;lt;&amp;aelig;�&amp;yen;&amp;aelig;�&amp;not;&amp;egrave;&amp;ordf;�UTF8<br />
&amp;gt;&amp;lt;日本語UTF-8&amp;gt;<br />
<br />
So people migrating from 5.3 to 5.4 should not have problems.<br />
Migrating from versions older than 5.3 to 5.4 will be problematic.<br />
<br />
I always set all parameters for htmlentities/htmlspecialchars, therefore<br />
I haven't noticed this was changed from 5.3. They may be migrating from<br />
5.2 or older. (RHEL5 uses 5.1)<br />
<br />
Since PHP does not have a default multibyte module, it may be good to have<br />
<br />
input_encoding<br />
internal_encoding<br />
output_encoding<br />
<br />
php.ini settings and make multibyte modules use them when they are set.<br />
Alternatively, just make mbstring a default module.<br />
<br />
This is a rather big change for a released version, but it is a simple, easy change.<br />
<br />
Regards,<br />
<br />
--<br />
Yasuo Ohgaki<br />
<br />
]]></description>
            <dc:creator>Yasuo Ohgaki</dc:creator>
            <category>php-internals</category>
            <pubDate>Tue, 13 Mar 2012 02:00:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460806#msg-460806</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460806#msg-460806</link>
            <description><![CDATA[ Hi!<br />
<br />
&gt; Still, that API is likely wrong: a library function written by someone<br />
&gt; completely unrelated to the main application shouldn't be echoing<br />
&gt; anything through the output. And if it's not generating the html, the<br />
&gt; htmlspecialchars is better done from the return at the calling<br />
&gt; application (probably after converting the internal charset).<br />
<br />
Again, you are making a huge number of assumptions about how ALL the <br />
applications must work, which means you are wrong in 99.(9)% of cases, <br />
because there's infinitely many applications which don't work exactly <br />
like yours does, and we have no idea how they work.<br />
<br />
The main point is that having global state (and worse yet, changeable <br />
global state) significantly influence how basic functions work is <br />
dangerous. It's like keeping everything in globals and, instead of <br />
passing parameters between functions, just changing some globals and expecting <br />
functions to pick it up.<br />
<br />
&gt; Such interfaces may be well served by switching the setting many times.<br />
<br />
That's exactly what I am trying to avoid, and you are just illustrating <br />
why this proposal is dangerous, because that's exactly what is going to <br />
happen in the code: instead of passing proper arguments to <br />
htmlspecialchars(), people will start changing INI settings left and right, <br />
and then nobody would know what an htmlspecialchars() call actually does <br />
without tracking all the INI changes along the way.<br />
-- <br />
Stanislav Malyshev, Software Architect<br />
SugarCRM: <a href="http://www.sugarcrm.com/" target="_blank"  rel="nofollow">http://www.sugarcrm.com/</a><br />
(408)454-6900 ext. 227<br />
<br />
]]></description>
            <dc:creator>Stas Malyshev</dc:creator>
            <category>php-internals</category>
            <pubDate>Tue, 13 Mar 2012 00:30:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460728#msg-460728</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460728#msg-460728</link>
            <description><![CDATA[ Hi!<br />
<br />
<br />
&gt; If you are a framework developer, and really want to shield against a<br />
&gt; bad php.ini setting, you could ini_set() to your preferred charset at the<br />
&gt; beginning of the request.<br />
<br />
That's assuming &quot;the request&quot; is completely processed by your framework <br />
and you never call any outside code and any outside code never calls you <br />
- otherwise your messing with INI settings may very well break that code <br />
or that code's messing with INI settings may very well break yours.<br />
-- <br />
Stanislav Malyshev, Software Architect<br />
SugarCRM: <a href="http://www.sugarcrm.com/" target="_blank"  rel="nofollow">http://www.sugarcrm.com/</a><br />
(408)454-6900 ext. 227<br />
<br />
]]></description>
            <dc:creator>Stas Malyshev</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 22:40:03 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460710#msg-460710</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460710#msg-460710</link>
            <description><![CDATA[ On 12/03/12 20:51, Stas Malyshev wrote:<br />
&gt; Hi!<br />
&gt;<br />
&gt;&gt; But you can't necessarily hardcode the encoding if you are writing<br />
&gt;&gt; portable code. That's a bit like hardcoding a timezone. In order to<br />
&gt;&gt; write portable code you need to give people the ability to localize it.<br />
&gt;<br />
&gt; No, it's not like timezone at all. I have to support all timezones in<br />
&gt; a global app, but I don't have to internally support every encoding on<br />
&gt; Earth - having everything internally in UTF-8 works quite well, and a<br />
&gt; lot of applications do exactly that - they have everything internally<br />
&gt; in UTF-8 and only may convert when importing or exporting the data. I<br />
&gt; don't see anything in using UTF-8 throughout the app/library that<br />
&gt; makes it non-portable. However, if we allow changing defaults in<br />
&gt; htmlspecialchars() etc. that essentially makes having defaults useless<br />
&gt; as I'd have to explicitly specify UTF-8 each time - otherwise it's a<br />
&gt; gamble what encoding I'd actually get.<br />
If you are a framework developer, and really want to shield against a<br />
bad php.ini setting, you could ini_set() to your preferred charset at the<br />
beginning of the request.<br />
<br />
<br />
]]></description>
            <dc:creator>Ángel González</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 22:30:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460684#msg-460684</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460684#msg-460684</link>
            <description><![CDATA[ hi Rasmus,<br />
<br />
On Mon, Mar 12, 2012 at 9:12 PM, Rasmus Lerdorf &lt;rasmus@lerdorf.com&gt; wrote:<br />
<br />
&gt; If everything was UTF-8 we wouldn't have any of these issues.<br />
&gt; Unfortunately that isn't the case. The question is what to do with apps<br />
&gt; that need to deal with non UTF-8 data. Are we going to provide any help<br />
&gt; to them beyond just telling them to convert everything to UTF-8?<br />
<br />
That's not really an acceptable solution, obviously.<br />
<br />
&gt; We took steps in 5.4 to improve htmlspecialchars to understand more<br />
&gt; encodings and we have the concept of script_encoding and<br />
&gt; internal_encoding that is used both in the engine and in mbstring.<br />
&gt;<br />
&gt; Currently internal_encoding isn't checked by htmlspecialchars. If you<br />
&gt; pass it '' it checks script_encoding and default_charset which is a bit<br />
&gt; odd since neither directly relates to the encoding of the internal data<br />
&gt; you are feeding to it. So maybe a way to tackle this is to use the<br />
&gt; mbstring internal encoding when it is set as the htmlspecialchars<br />
&gt; default when it is called without an encoding arg.<br />
<br />
That's why I would prefer to use an existing setting and clearly<br />
document it instead of creating a new ini setting with a totally<br />
different impact than the existing ones. Not sure which one would fit<br />
best tho'.<br />
<br />
Reading these last two paragraphs gave me a headache and I did not<br />
know anymore which encoding we were talking about ;-)<br />
<br />
Cheers,<br />
-- <br />
Pierre<br />
<br />
@pierrejoye | <a href="http://blog.thepimp.net" target="_blank"  rel="nofollow">http://blog.thepimp.net</a> | <a href="http://www.libgd.org" target="_blank"  rel="nofollow">http://www.libgd.org</a><br />
<br />
]]></description>
            <dc:creator>Pierre Joye</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 21:30:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460679#msg-460679</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460679#msg-460679</link>
            <description><![CDATA[ On 03/12/2012 12:51 PM, Stas Malyshev wrote:<br />
&gt; Hi!<br />
&gt; <br />
&gt;&gt; But you can't necessarily hardcode the encoding if you are writing<br />
&gt;&gt; portable code. That's a bit like hardcoding a timezone. In order to<br />
&gt;&gt; write portable code you need to give people the ability to localize it.<br />
&gt; <br />
&gt; No, it's not like timezone at all. I have to support all timezones in a<br />
&gt; global app, but I don't have to internally support every encoding on<br />
&gt; Earth - having everything internally in UTF-8 works quite well, and a<br />
&gt; lot of applications do exactly that - they have everything internally in<br />
&gt; UTF-8 and only may convert when importing or exporting the data. I don't<br />
&gt; see anything in using UTF-8 throughout the app/library that makes it<br />
&gt; non-portable. However, if we allow changing defaults in<br />
&gt; htmlspecialchars() etc. that essentially makes having defaults useless<br />
&gt; as I'd have to explicitly specify UTF-8 each time - otherwise it's a<br />
&gt; gamble what encoding I'd actually get.<br />
<br />
If everything was UTF-8 we wouldn't have any of these issues.<br />
Unfortunately that isn't the case. The question is what to do with apps<br />
that need to deal with non UTF-8 data. Are we going to provide any help<br />
to them beyond just telling them to convert everything to UTF-8?<br />
<br />
We took steps in 5.4 to improve htmlspecialchars to understand more<br />
encodings and we have the concept of script_encoding and<br />
internal_encoding that is used both in the engine and in mbstring.<br />
Currently internal_encoding isn't checked by htmlspecialchars. If you<br />
pass it '' it checks script_encoding and default_charset which is a bit<br />
odd since neither directly relates to the encoding of the internal data<br />
you are feeding to it. So maybe a way to tackle this is to use the<br />
mbstring internal encoding when it is set as the htmlspecialchars<br />
default when it is called without an encoding arg.<br />
<br />
-Rasmus<br />
<br />
]]></description>
            <dc:creator>Rasmus Lerdorf</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 21:20:03 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460663#msg-460663</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460663#msg-460663</link>
            <description><![CDATA[ Hi!<br />
<br />
&gt; But you can't necessarily hardcode the encoding if you are writing<br />
&gt; portable code. That's a bit like hardcoding a timezone. In order to<br />
&gt; write portable code you need to give people the ability to localize it.<br />
<br />
No, it's not like timezone at all. I have to support all timezones in a <br />
global app, but I don't have to internally support every encoding on <br />
Earth - having everything internally in UTF-8 works quite well, and a <br />
lot of applications do exactly that - they have everything internally in <br />
UTF-8 and only may convert when importing or exporting the data. I don't <br />
see anything in using UTF-8 throughout the app/library that makes it <br />
non-portable. However, if we allow changing defaults in <br />
htmlspecialchars() etc. that essentially makes having defaults useless <br />
as I'd have to explicitly specify UTF-8 each time - otherwise it's a <br />
gamble what encoding I'd actually get.<br />
-- <br />
Stanislav Malyshev, Software Architect<br />
SugarCRM: <a href="http://www.sugarcrm.com/" target="_blank"  rel="nofollow">http://www.sugarcrm.com/</a><br />
(408)454-6900 ext. 227<br />
<br />
]]></description>
            <dc:creator>Stas Malyshev</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 21:00:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460652#msg-460652</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460652#msg-460652</link>
            <description><![CDATA[ On 03/12/2012 12:40 PM, Stas Malyshev wrote:<br />
&gt; Hi!<br />
&gt; <br />
&gt;&gt; And yes, it may very well be dangerous to use the wrong charset and now<br />
&gt;&gt; that we have better support for GB2312 and other asian charsets in the<br />
&gt;&gt; entities functions in 5.4 it is even more prudent to choose the right<br />
&gt;&gt; one so we should provide some way to help people get it right short of<br />
&gt;&gt; changing every call.<br />
&gt; <br />
&gt; I'm not sure &quot;changing every call&quot; is such a big problem - it's one grep<br />
&gt; and one replace, can be done in one line of sed/awk/perl/php probably.<br />
&gt; But a bigger issue here is that people insist on using wrong charsets<br />
&gt; and expect the language to have some magical external defaults that work for<br />
&gt; exactly their use case, instead of doing what they should be doing all<br />
&gt; along - putting the charset right there in the argument.<br />
&gt; We need to get people off this mindset fast, since it is not a good one.<br />
&gt; Having tons of hidden defaults that modify behavior of functions called<br />
&gt; with the same arguments in hundreds of different ways is a coding and<br />
&gt; maintenance nightmare. Now if I write htmlspecialchars() I can never be<br />
&gt; sure if it works right and uses UTF-8 - what if somebody messed with the<br />
&gt; INI setting because of some other broken library that required that to<br />
&gt; work?<br />
<br />
But you can't necessarily hardcode the encoding if you are writing<br />
portable code. That's a bit like hardcoding a timezone. In order to<br />
write portable code you need to give people the ability to localize it.<br />
<br />
-Rasmus<br />
<br />
]]></description>
            <dc:creator>Rasmus Lerdorf</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 20:50:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460650#msg-460650</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460650#msg-460650</link>
            <description><![CDATA[ Hi!<br />
<br />
&gt; And yes, it may very well be dangerous to use the wrong charset and now<br />
&gt; that we have better support for GB2312 and other asian charsets in the<br />
&gt; entities functions in 5.4 it is even more prudent to choose the right<br />
&gt; one so we should provide some way to help people get it right short of<br />
&gt; changing every call.<br />
<br />
I'm not sure &quot;changing every call&quot; is such a big problem - it's one grep <br />
and one replace, can be done in one line of sed/awk/perl/php probably. <br />
But a bigger issue here is that people insist on using wrong charsets <br />
and expect the language to have some magical external defaults that work for <br />
exactly their use case, instead of doing what they should be doing all <br />
along - putting the charset right there in the argument.<br />
We need to get people off this mindset fast, since it is not a good one. <br />
Having tons of hidden defaults that modify behavior of functions called <br />
with the same arguments in hundreds of different ways is a coding and <br />
maintenance nightmare. Now if I write htmlspecialchars() I can never be <br />
sure if it works right and uses UTF-8 - what if somebody messed with the <br />
INI setting because of some other broken library that required that to work?<br />
-- <br />
Stanislav Malyshev, Software Architect<br />
SugarCRM: <a href="http://www.sugarcrm.com/" target="_blank"  rel="nofollow">http://www.sugarcrm.com/</a><br />
(408)454-6900 ext. 227<br />
<br />
]]></description>
            <dc:creator>Stas Malyshev</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 20:50:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460497#msg-460497</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460497#msg-460497</link>
            <description><![CDATA[ I think the ini directive, while adding another to the list, may be the<br />
most unobtrusive method to address this issue, at least for developers.<br />
<br />
I definitely agree with Rasmus that this could be one of the bigger<br />
headaches in transitioning to 5.4 (for non-UTF8 sites) and unless we can<br />
come up with a better solution, I say let's move forward with it for 5.4.1.<br />
<br />
- Mike<br />
<br />
<br />
<br />
<br />
<br />
On Mon, Mar 12, 2012 at 10:27 AM, Rasmus Lerdorf &lt;rasmus@lerdorf.com&gt; wrote:<br />
<br />
&gt; On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:<br />
&gt; &gt; Hi<br />
&gt; &gt;<br />
&gt; &gt; I think following PHP 5.4.0 NEWS entry is misleading.<br />
&gt; &gt;<br />
&gt; &gt;   . Changed default value of &quot;default_charset&quot; php.ini option from<br />
&gt; ISO-8859-1 to<br />
&gt; &gt;     UTF-8. (Rasmus)<br />
&gt;<br />
&gt; Yes, I have fixed that now.<br />
&gt;<br />
&gt; &gt; I thought default_charset became UTF-8, so I was expecting<br />
&gt; &gt; following HTTP header.<br />
&gt; &gt;<br />
&gt; &gt; content-type  text/html; charset=UTF-8<br />
&gt; &gt;<br />
&gt; &gt; However, I got empty charset (missing 'charset=UTF-8').<br />
&gt; &gt; So I looked up to source and found the line in SAPI.h<br />
&gt; &gt;<br />
&gt; &gt; 293   #define SAPI_DEFAULT_CHARSET        &quot;&quot;<br />
&gt; &gt;<br />
&gt; &gt; Empty string should be &quot;UTF-8&quot;, isn't it?<br />
&gt;<br />
&gt; No, we can't force an output charset on people since it would end up<br />
&gt; breaking a lot of sites.<br />
&gt;<br />
&gt; &gt;  - php.ini's default_charset should be UTF-8.<br />
&gt; &gt;  - determine_charset() should not blindly default to UTF-8 when there<br />
&gt; &gt; is no hint.<br />
&gt; &gt;<br />
&gt; &gt; Old htmlentities/htmlspecialchars actually determines charset from<br />
&gt; &gt; default_charset/mbstring.internal_encoding/etc. I think old behavior<br />
&gt; &gt; is better than now.<br />
&gt; &gt;<br />
&gt; &gt; How about make determine_charset() behaves like 5.3 and set the<br />
&gt; &gt; SAPI_DEFAULT_CHARSET to &quot;UTF-8&quot;?<br />
&gt;<br />
&gt; PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:<br />
&gt;<br />
&gt;    if (charset_hint == NULL)<br />
&gt;                return cs_8859_1;<br />
&gt;<br />
&gt; and in 5.4 we have:<br />
&gt;<br />
&gt;    if (charset_hint == NULL)<br />
&gt;                return cs_utf_8;<br />
&gt;<br />
&gt; So there is no difference in their guessing when there is no hint, the<br />
&gt; only difference is that in 5.4 we choose utf8 and in 5.3 we choose<br />
&gt; 8859-1 in that case.<br />
&gt;<br />
&gt; -Rasmus<br />
<br />
<br />
-- <br />
-----------------------<br />
<br />
&quot;My command is this: Love each other as I<br />
have loved you.&quot;                         John 15:12<br />
<br />
-----------------------]]></description>
            <dc:creator>Michael Stowe</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 16:50:25 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460491#msg-460491</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460491#msg-460491</link>
            <description><![CDATA[ On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:<br />
&gt; Hi<br />
&gt; <br />
&gt; I think following PHP 5.4.0 NEWS entry is misleading.<br />
&gt; <br />
&gt;   . Changed default value of &quot;default_charset&quot; php.ini option from ISO-8859-1 to<br />
&gt;     UTF-8. (Rasmus)<br />
<br />
Yes, I have fixed that now.<br />
<br />
&gt; I thought default_charset became UTF-8, so I was expecting<br />
&gt; following HTTP header.<br />
&gt; <br />
&gt; content-type	text/html; charset=UTF-8<br />
&gt; <br />
&gt; However, I got empty charset (missing 'charset=UTF-8').<br />
&gt; So I looked up to source and found the line in SAPI.h<br />
&gt; <br />
&gt; 293	#define SAPI_DEFAULT_CHARSET        &quot;&quot;<br />
&gt; <br />
&gt; Empty string should be &quot;UTF-8&quot;, isn't it?<br />
<br />
No, we can't force an output charset on people since it would end up<br />
breaking a lot of sites.<br />
<br />
&gt;  - php.ini's default_charset should be UTF-8.<br />
&gt;  - determine_charset() should not blindly default to UTF-8 when there<br />
&gt; is no hint.<br />
&gt; <br />
&gt; Old htmlentities/htmlspecialchars actually determines charset from<br />
&gt; default_charset/mbstring.internal_encoding/etc. I think old behavior<br />
&gt; is better than now.<br />
&gt; <br />
&gt; How about make determine_charset() behaves like 5.3 and set the<br />
&gt; SAPI_DEFAULT_CHARSET to &quot;UTF-8&quot;?<br />
<br />
PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:<br />
<br />
    if (charset_hint == NULL)<br />
	        return cs_8859_1;<br />
<br />
and in 5.4 we have:<br />
<br />
    if (charset_hint == NULL)<br />
	        return cs_utf_8;<br />
<br />
So there is no difference in their guessing when there is no hint, the<br />
only difference is that in 5.4 we choose utf8 and in 5.3 we choose<br />
8859-1 in that case.<br />
<br />
-Rasmus<br />
<br />
-- <br />
PHP Internals - PHP Runtime Development Mailing List<br />
To unsubscribe, visit: <a href="http://www.php.net/unsub.php" target="_blank"  rel="nofollow">http://www.php.net/unsub.php</a>]]></description>
            <dc:creator>Rasmus Lerdorf</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 16:30:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460382#msg-460382</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460382#msg-460382</link>
            <description><![CDATA[ On Mon, Mar 12, 2012 at 6:21 PM, Yasuo Ohgaki &lt;yohgaki@ohgaki.net&gt; wrote:<br />
&gt; Hi,<br />
&gt;<br />
&gt; I think motivation of<br />
&gt;<br />
&gt;       /* Default is now UTF-8 */<br />
&gt;       if (charset_hint == NULL)<br />
&gt;               return cs_utf_8;<br />
&gt;<br />
&gt; is for better performance and I think it's good for better performance.<br />
&gt; An alternative to my suggestion is to introduce a new php.ini entry, as Rasmus<br />
&gt; mentioned.<br />
&gt;<br />
&gt; The name may be &quot;default_html_escape_encoding&quot;?<br />
Hi:<br />
   For the sake of succinctness, I think run_time_encoding is better.<br />
<br />
   We should also separate determine_output_charset and<br />
determine_run_time_charset (there is only one determine_charset now).<br />
<br />
thanks<br />
&gt;<br />
&gt; We should document this behavior very well, since it affects all of<br />
&gt; non UTF-8 web sites.<br />
&gt;<br />
&gt; Regards,<br />
&gt;<br />
&gt; --<br />
&gt; Yasuo Ohgaki<br />
&gt; <a href="mailto:&#121;&#111;&#104;&#103;&#97;&#107;&#105;&#64;&#111;&#104;&#103;&#97;&#107;&#105;&#46;&#110;&#101;&#116;">&#121;&#111;&#104;&#103;&#97;&#107;&#105;&#64;&#111;&#104;&#103;&#97;&#107;&#105;&#46;&#110;&#101;&#116;</a><br />
<br />
<br />
<br />
-- <br />
Laruence  Xinchen Hui<br />
<a href="http://www.laruence.com/" target="_blank"  rel="nofollow">http://www.laruence.com/</a><br />
<br />
]]></description>
            <dc:creator>Laruence</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 13:50:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460334#msg-460334</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460334#msg-460334</link>
            <description><![CDATA[ Hi,<br />
<br />
I think motivation of<br />
<br />
       /* Default is now UTF-8 */<br />
       if (charset_hint == NULL)<br />
               return cs_utf_8;<br />
<br />
is better performance, and I think that is a good thing.<br />
An alternative to my suggestion is to introduce a new php.ini entry, as Rasmus<br />
mentioned.<br />
<br />
The name may be &quot;default_html_escape_encoding&quot;?<br />
<br />
We should document this behavior very well, since it affects all of<br />
non UTF-8 web sites.<br />
<br />
Regards,<br />
<br />
--<br />
Yasuo Ohgaki<br />
<a href="mailto:&#121;&#111;&#104;&#103;&#97;&#107;&#105;&#64;&#111;&#104;&#103;&#97;&#107;&#105;&#46;&#110;&#101;&#116;">&#121;&#111;&#104;&#103;&#97;&#107;&#105;&#64;&#111;&#104;&#103;&#97;&#107;&#105;&#46;&#110;&#101;&#116;</a><br />
<br />
]]></description>
            <dc:creator>Yasuo Ohgaki</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 11:30:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460324#msg-460324</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460324#msg-460324</link>
            <description><![CDATA[ Hi<br />
<br />
I think the following PHP 5.4.0 NEWS entry is misleading.<br />
<br />
  . Changed default value of &quot;default_charset&quot; php.ini option from ISO-8859-1 to<br />
    UTF-8. (Rasmus)<br />
<br />
I thought default_charset became UTF-8, so I was expecting<br />
the following HTTP header.<br />
<br />
content-type	text/html; charset=UTF-8<br />
<br />
However, I got an empty charset (missing 'charset=UTF-8').<br />
So I looked at the source and found this line in SAPI.h<br />
<br />
293	#define SAPI_DEFAULT_CHARSET        &quot;&quot;<br />
<br />
Empty string should be &quot;UTF-8&quot;, isn't it?<br />
<br />
BTW, an empty charset in the HTTP header does not mean the default will<br />
be ISO-8859-1; it lets the browser guess which encoding is used.<br />
Guessing the encoding may cause XSS under certain conditions.<br />
<br />
<br />
Anyway, I was curious, so I checked ext/standard/html.c and found:<br />
<br />
/* {{{ entity_charset determine_charset<br />
 * returns the charset identifier based on current locale or a hint.<br />
 * defaults to UTF-8 */<br />
static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC)<br />
{<br />
	int i;<br />
	enum entity_charset charset = cs_utf_8;<br />
	int len = 0;<br />
	const zend_encoding *zenc;<br />
<br />
	/* Default is now UTF-8 */<br />
	if (charset_hint == NULL)<br />
		return cs_utf_8;<br />
<br />
<br />
There are two problems.<br />
<br />
 - php.ini's default_charset should be UTF-8.<br />
 - determine_charset() should not blindly default to UTF-8 when there<br />
is no hint.<br />
<br />
The old htmlentities/htmlspecialchars actually determined the charset from<br />
default_charset/mbstring.internal_encoding/etc. I think the old behavior<br />
is better than the current one.<br />
<br />
How about making determine_charset() behave as it did in 5.3 and setting<br />
SAPI_DEFAULT_CHARSET to &quot;UTF-8&quot;?<br />
<br />
Then PHP will behave as NEWS says: the default encoding for<br />
htmlentities/htmlspecialchars becomes 'UTF-8', and users will have control<br />
over the default htmlentities/htmlspecialchars encoding.<br />
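<br />
A minimal sketch of that fallback order, written as standalone C rather than<br />
as a patch to html.c. Everything named sketch_* / SK_* is hypothetical, and the<br />
single configured_default parameter merely stands in for default_charset; the<br />
real 5.3 code consulted more sources than this.<br />

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; the real enum in ext/standard/html.c has more members. */
enum sketch_charset { SK_UTF_8, SK_8859_1, SK_GB2312, SK_UNKNOWN };

/* Map a charset name to an identifier. Exact match only, to keep the sketch short. */
static enum sketch_charset sketch_lookup(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "UTF-8") == 0)      return SK_UTF_8;
    if (strcmp(name, "ISO-8859-1") == 0) return SK_8859_1;
    if (strcmp(name, "GB2312") == 0)     return SK_GB2312;
    return SK_UNKNOWN;
}

/*
 * Resolve the charset for an escaping call:
 *   1. an explicit, non-empty hint wins;
 *   2. otherwise use the configured default (stand-in for default_charset);
 *   3. only if both are missing, fall back to UTF-8.
 */
static enum sketch_charset sketch_determine_charset(const char *hint,
                                                    const char *configured_default)
{
    if (hint != NULL && hint[0] != '\0')
        return sketch_lookup(hint);
    if (configured_default != NULL && configured_default[0] != '\0')
        return sketch_lookup(configured_default);
    return SK_UTF_8;
}
```

With this order, a site that sets its default to GB2312 and passes no hint gets<br />
SK_GB2312 instead of an unconditional UTF-8.<br />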
<br />
Regards,<br />
<br />
--<br />
Yasuo Ohgaki<br />
<a href="mailto:&#121;&#111;&#104;&#103;&#97;&#107;&#105;&#64;&#111;&#104;&#103;&#97;&#107;&#105;&#46;&#110;&#101;&#116;">&#121;&#111;&#104;&#103;&#97;&#107;&#105;&#64;&#111;&#104;&#103;&#97;&#107;&#105;&#46;&#110;&#101;&#116;</a><br />
<br />
]]></description>
            <dc:creator>Yasuo Ohgaki</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 11:10:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460282#msg-460282</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460282#msg-460282</link>
            <description><![CDATA[ On 03/12/2012 12:52 AM, Stas Malyshev wrote:<br />
&gt; Hi!<br />
&gt; <br />
&gt;&gt; Ignoring 5.4 for a second, if you in 5.3 do this:<br />
&gt;&gt;<br />
&gt;&gt; echo htmlspecialchars($string);<br />
&gt;&gt; echo htmlspecialchars($string, NULL, &quot;ISO-8859-1&quot;);<br />
&gt;&gt; echo htmlspecialchars($string, NULL, &quot;UTF-8&quot;);<br />
&gt;&gt;<br />
&gt;&gt; You will see that the first two output the escaped string with the<br />
&gt;&gt; GB2312 bytes intact within it and the UTF-8 calls returns false because<br />
&gt;&gt; it correctly recognizes that GB2312 is not UTF-8. We don't have any such<br />
&gt;&gt; check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for<br />
&gt;&gt; htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.<br />
&gt; <br />
&gt; So the difference is that ISO8859-1 does not validate but UTF-8 validates?<br />
&gt; I'm not sure what GB2312 encoding does but isn't it dangerous to do<br />
&gt; htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also<br />
&gt; produce wrong result when used with wrong encoding?<br />
<br />
Not sure you can validate 8859-1 since it isn't multibyte, can you? Is<br />
there any byte that is explicitly forbidden in 8859-1?<br />
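<br />
To make the asymmetry concrete, here is a small standalone C sketch (not the<br />
code PHP uses; the sketch_* names are hypothetical). It assumes the IANA view<br />
of ISO-8859-1, in which all 256 byte values are assigned, so there is nothing<br />
to reject, whereas UTF-8 has structural rules that GB2312 bytes usually break.<br />

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int sketch_is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }  /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;  /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < need) return 0;  /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return 0;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                 /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* UTF-16 surrogates */
        i += need + 1;
    }
    return 1;
}

/* Assumption: every byte value is a valid ISO-8859-1 character, so always accept. */
static int sketch_is_valid_8859_1(const unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 1;
}
```

For example, the GB2312 bytes D6 D0 start like a two-byte UTF-8 sequence, but<br />
D0 is not a continuation byte, so the UTF-8 check rejects them while the<br />
8859-1 check has no basis to object.<br />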
<br />
And yes, it may very well be dangerous to use the wrong charset. Now<br />
that we have better support for GB2312 and other Asian charsets in the<br />
entities functions in 5.4, it is even more prudent to choose the right<br />
one, so we should provide some way to help people get it right short of<br />
changing every call.<br />
<br />
Gustavo suggested we could use the multibyte encoding setting.<br />
Unfortunately only zend.script_encoding is available and I think<br />
internal_encoding is closer to what we need here, but that is only<br />
available as mbstring.internal_encoding.<br />
<br />
-Rasmus<br />
<br />
<br />
]]></description>
            <dc:creator>Rasmus Lerdorf</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 09:20:01 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460280#msg-460280</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460280#msg-460280</link>
            <description><![CDATA[ On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev &lt;smalyshev@sugarcrm.com&gt;wrote:<br />
<br />
&gt; Hi!<br />
&gt;<br />
&gt;<br />
&gt;  Ignoring 5.4 for a second, if you in 5.3 do this:<br />
&gt;&gt;<br />
&gt;&gt; echo htmlspecialchars($string);<br />
&gt;&gt; echo htmlspecialchars($string, NULL, &quot;ISO-8859-1&quot;);<br />
&gt;&gt; echo htmlspecialchars($string, NULL, &quot;UTF-8&quot;);<br />
&gt;&gt;<br />
&gt;&gt; You will see that the first two output the escaped string with the<br />
&gt;&gt; GB2312 bytes intact within it and the UTF-8 calls returns false because<br />
&gt;&gt; it correctly recognizes that GB2312 is not UTF-8. We don't have any such<br />
&gt;&gt; check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for<br />
&gt;&gt; htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.<br />
&gt;&gt;<br />
&gt;<br />
&gt; So the difference is that ISO8859-1 does not validate but UTF-8 validates?<br />
&gt; I'm not sure what GB2312 encoding does but isn't it dangerous to do<br />
&gt; htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also<br />
&gt; produce wrong result when used with wrong encoding?<br />
<br />
<br />
The EUC-CN encoding appears to ensure compatibility with ASCII by avoiding<br />
the ASCII range for each of its two bytes, so it seems that<br />
htmlspecialchars should work OK:<br />
<br />
<a href="http://en.wikipedia.org/wiki/GB_2312#EUC-CN" target="_blank"  rel="nofollow">http://en.wikipedia.org/wiki/GB_2312#EUC-CN</a><br />
<a href="http://php.net/manual/en/mbstring.supported-encodings.php" target="_blank"  rel="nofollow">http://php.net/manual/en/mbstring.supported-encodings.php</a><br />
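<br />
A rough standalone C sketch of why that holds (hypothetical sketch_* name, not<br />
PHP's implementation): an escaper that only looks at single bytes never<br />
touches an EUC-CN two-byte character, because both of its bytes are 0xA1 or<br />
above and so cannot equal any of the ASCII characters being replaced.<br />

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-wise escape of & < > " from in[0..len) into out (capacity out_cap,
 * including the terminating NUL). Returns the number of bytes written,
 * excluding the NUL, or (size_t)-1 if out is too small.
 * Bytes >= 0x80 are copied unchanged.
 */
static size_t sketch_escape_bytes(const unsigned char *in, size_t len,
                                  char *out, size_t out_cap)
{
    size_t o = 0;
    assert(in != NULL && out != NULL);
    if (out_cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < len; i++) {
        const char *rep = NULL;
        size_t n;
        switch (in[i]) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  break;
        }
        n = rep ? strlen(rep) : 1;
        if (o + n + 1 > out_cap)
            return (size_t)-1;  /* keep room for the NUL */
        if (rep)
            memcpy(out + o, rep, n);
        else
            out[o] = (char)in[i];
        o += n;
    }
    out[o] = '\0';
    return o;
}
```

The same shortcut would not be safe for encodings whose trailing bytes can<br />
fall inside the ASCII range.<br />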
<br />
Adam]]></description>
            <dc:creator>Adam Jon Richardson</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 09:11:03 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460275#msg-460275</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460275#msg-460275</link>
            <description><![CDATA[ Hi!<br />
<br />
&gt; Ignoring 5.4 for a second, if you in 5.3 do this:<br />
&gt;<br />
&gt; echo htmlspecialchars($string);<br />
&gt; echo htmlspecialchars($string, NULL, &quot;ISO-8859-1&quot;);<br />
&gt; echo htmlspecialchars($string, NULL, &quot;UTF-8&quot;);<br />
&gt;<br />
&gt; You will see that the first two output the escaped string with the<br />
&gt; GB2312 bytes intact within it and the UTF-8 calls returns false because<br />
&gt; it correctly recognizes that GB2312 is not UTF-8. We don't have any such<br />
&gt; check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for<br />
&gt; htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.<br />
<br />
So the difference is that ISO8859-1 does not validate but UTF-8 validates?<br />
I'm not sure what GB2312 encoding does but isn't it dangerous to do <br />
htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also <br />
produce wrong result when used with wrong encoding?<br />
<br />
-- <br />
Stanislav Malyshev, Software Architect<br />
SugarCRM: <a href="http://www.sugarcrm.com/" target="_blank"  rel="nofollow">http://www.sugarcrm.com/</a><br />
(408)454-6900 ext. 227<br />
<br />
]]></description>
            <dc:creator>Stas Malyshev</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 09:00:02 +0100</pubDate>
        </item>
        <item>
            <guid>http://www.serverphorums.com/read.php?7,460261,460271#msg-460271</guid>
            <title>Re: [PHP-DEV] default charset confusion</title>
            <link>http://www.serverphorums.com/read.php?7,460261,460271#msg-460271</link>
            <description><![CDATA[ On 03/12/2012 12:41 AM, Rasmus Lerdorf wrote:<br />
<br />
&gt; $string = $string = &quot;&lt;pre&gt;&lt;p&gt;$gb2312&lt;/p&gt;&lt;/pre&gt;&quot;;<br />
<br />
Sorry, typo there, obviously. Just one $string.<br />
<br />
-Rasmus<br />
<br />
<br />
]]></description>
            <dc:creator>Rasmus Lerdorf</dc:creator>
            <category>php-internals</category>
            <pubDate>Mon, 12 Mar 2012 08:50:03 +0100</pubDate>
        </item>
    </channel>
</rss>
