[PHP-DEV] Consider only ignoring newlines for final ?> in a file

Andrea Faulds
[PHP-DEV] Consider only ignoring newlines for final ?> in a file
September 07, 2017 03:50AM
Hi everyone,

This is the tiniest of issues, but it's bugged me for a long time and
makes the HTML produced by PHP code less readable than it ought to be.
Specifically, PHP ignores a newline immediately following a ?> tag. The
reason for this is, from what I recall, to prevent issues where
whitespace at the end of a PHP file is echoed before headers can be
sent. On UNIX in particular, all text files (should) end in a newline,
so this is a reasonable and necessary feature.

However, for ?> tags anywhere that aren't right at the end of the file,
this is just a nuisance that makes for messy output. For example, HTML
output that should look like:

<table>
    <tr>
       <td>foo</td>
       <td>bar</td>
    </tr>
</table>

May instead end up looking something like:

<table>    <tr>
       <td>foo</td>       <td>bar</td>
    </tr></table>

Of course, HTML doesn't matter so much, it'll render the same to the
end-user. However, for outputting e.g. plain text, newlines can be
significant, and so you have to insert an ugly and surprising extra
newline following a tag.
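
The plain-text case can be reproduced with a short script. This is a minimal sketch rather than code from the thread: render_template() is a hypothetical helper that writes a template to a temporary file and includes it under output buffering, and the $name and $role variables are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

$vars = ['name' => 'Ada', 'role' => 'admin'];

// One newline after each closing tag: PHP swallows both of them.
$joined = render_template(
    'Name: <?= $name ?>' . "\n" . 'Role: <?= $role ?>' . "\n",
    $vars
);

// Workaround described above: a second newline after each tag survives.
$separate = render_template(
    'Name: <?= $name ?>' . "\n\n" . 'Role: <?= $role ?>' . "\n\n",
    $vars
);

var_dump($joined);   // string(20) "Name: AdaRole: admin"
var_dump($separate); // two lines, each ending in a newline
```

The second template is the "ugly and surprising extra newline" version: the source needs a blank line after every tag just to get one line break in the output.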

Would anyone object to me changing how PHP handles this so that only the
final ?> tag consumes its following newline, and only at the end of the
file?

Thanks!
--
Andrea Faulds
https://ajf.me/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
> On Sep 6, 2017, at 18:45, Andrea Faulds wrote:
> Would anyone object to me changing how PHP handles this so that only the final ?> tag consumes its following newline, and only at the end of the file?
>

I object. It's a change in ancient behavior that has the potential to break existing code for superficial reasons.

We'd never design it that way today, but that die is long cast.

-1

-Sara
François Laupretre
Re: [PHP-DEV] Consider only ignoring newlines for final ?> in a file
September 07, 2017 12:20PM
Hi Andrea,


Le 07/09/2017 à 03:45, Andrea Faulds a écrit :
> Hi everyone,
>
> This is the tiniest of issues, but it's bugged me for a long time and
> makes the HTML produced by PHP code less readable than it ought to be.
> Specifically, PHP ignores a newline immediately following a ?> tag.
> The reason for this is, from what I recall, to prevent issues where
> whitespace at the end of a PHP file is echoed before headers can be
> sent. On UNIX in particular, all text files (should) end in a newline,
> so this is a reasonable and necessary feature.
>
> However, for ?> tags anywhere that aren't right at the end of the
> file, this is just a nuisance that makes for messy output. For
> example, HTML output that should look like:
>
> <table>
>     <tr>
>        <td>foo</td>
>        <td>bar</td>
>     </tr>
> </table>
>
> May instead end up looking something like:
>
> <table>    <tr>
>        <td>foo</td>       <td>bar</td>
>     </tr></table>
>
> Of course, HTML doesn't matter so much, it'll render the same to the
> end-user. However, for outputting e.g. plain text, newlines can be
> significant, and so you have to insert an ugly and surprising extra
> newline following a tag.
>
> Would anyone object to me changing how PHP handles this so that only
> the final ?> tag consumes its following newline, and only at the end
> of the file?
>
> Thanks!

+1 to creating a PHP 8 branch and changing the behavior there, not in PHP 7.

Once again, some may think it's too early but, IMO, we should create
such a branch and encourage RFCs and changes targeting the next major
version to be announced, discussed, implemented, and tested as soon as
possible. This is the only way to introduce BC breaks while minimizing
their impact. We saw this when talking about PHP 7 features: when
proposed too late, changes introducing BC breaks generally must be
rejected, whatever their value.

Regards

François




On Thu, Sep 7, 2017 at 3:45 AM, Andrea Faulds wrote:

> Hi everyone,
>
> This is the tiniest of issues, but it's bugged me for a long time and
> makes the HTML produced by PHP code less readable than it ought to be.
> Specifically, PHP ignores a newline immediately following a ?> tag. The
> reason for this is, from what I recall, to prevent issues where whitespace
> at the end of a PHP file is echoed before headers can be sent. On UNIX in
> particular, all text files (should) end in a newline, so this is a
> reasonable and necessary feature.
>
> However, for ?> tags anywhere that aren't right at the end of the file,
> this is just a nuisance that makes for messy output. For example, HTML
> output that should look like:
>
> <table>
>     <tr>
>        <td>foo</td>
>        <td>bar</td>
>     </tr>
> </table>
>
> May instead end up looking something like:
>
> <table>    <tr>
>        <td>foo</td>       <td>bar</td>
>     </tr></table>
>
> Of course, HTML doesn't matter so much, it'll render the same to the
> end-user. However, for outputting e.g. plain text, newlines can be
> significant, and so you have to insert an ugly and surprising extra newline
> following a tag.
>
> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?
>
> Thanks!
>

It also goes the other way. Whether you want to drop the newline after ?>
depends (roughly) on whether the code is control flow (drop) or trailing
output (don't drop). If the newline is not dropped anymore it doesn't mean
that the output will look nice, it's just going to be broken in a different
way.

Nikita
On Thu, Sep 7, 2017 at 12:11 PM, François Laupretre wrote:

> Hi Andrea,
>
>
> Le 07/09/2017 à 03:45, Andrea Faulds a écrit :
>
>> Hi everyone,
>>
>> This is the tiniest of issues, but it's bugged me for a long time and
>> makes the HTML produced by PHP code less readable than it ought to be.
>> Specifically, PHP ignores a newline immediately following a ?> tag. The
>> reason for this is, from what I recall, to prevent issues where whitespace
>> at the end of a PHP file is echoed before headers can be sent. On UNIX in
>> particular, all text files (should) end in a newline, so this is a
>> reasonable and necessary feature.
>>
>> However, for ?> tags anywhere that aren't right at the end of the file,
>> this is just a nuisance that makes for messy output. For example, HTML
>> output that should look like:
>>
>> <table>
>>     <tr>
>>        <td>foo</td>
>>        <td>bar</td>
>>     </tr>
>> </table>
>>
>> May instead end up looking something like:
>>
>> <table>    <tr>
>>        <td>foo</td>       <td>bar</td>
>>     </tr></table>
>>
>> Of course, HTML doesn't matter so much, it'll render the same to the
>> end-user. However, for outputting e.g. plain text, newlines can be
>> significant, and so you have to insert an ugly and surprising extra newline
>> following a tag.
>>
>> Would anyone object to me changing how PHP handles this so that only the
>> final ?> tag consumes its following newline, and only at the end of the
>> file?
>>
>> Thanks!
>>
>
> +1 to creating a PHP 8 branch and changing the behavior there, not in PHP 7.
>
> Once again, some may think it's too early but, IMO, we should create such
> a branch and encourage RFCs and changes targeting the next major version to
> be announced, discussed, implemented, and tested as soon as possible. This is
> the only way to introduce BC breaks while minimizing their impact. We saw
> this when talking about PHP 7 features: when proposed too late, changes
> introducing BC breaks generally must be rejected, whatever their value.
>
> Regards
>
> François
>

New branches cause a lot of additional overhead for core developers.
Changes have to be merged across all actively supported branches, commonly
with NEWS file adjustments. Depending on where we are in the release cycle
right now, we already have 3-4 active branches -- we don't need to add to
that.

I think it's fine to start targeting PHP 8 now with RFCs, but
implementation work should be done outside of php-src. It is more cost
effective for one person to rebase their code two years down the line than
it is for everybody to do extra work every time they commit something.
(Alternatively we would have to change our development model so that
branches are not synchronized at all times.)

Nikita
Hi Nikita,

Nikita Popov wrote:
>
> It also goes the other way. Whether you want to drop the newline after ?>
> depends (roughly) on whether the code is control flow (drop) or trailing
> output (don't drop). If the newline is not dropped anymore it doesn't mean
> that the output will look nice, it's just going to be broken in a different
> way.
>

I understand that it should be dropped for “control flow” code (maybe
not the best term, I misunderstood what you meant at first). That's why
I suggest ignoring the following newline only for the ?> at the end of
the file, because I can't think of another place where you would have a
?> and *not* intend output immediately after it.

So I'm not sure I understand your objection, from that standpoint. Did I
miss something?

Regards.
--
Andrea Faulds
https://ajf.me/

On Thu, Sep 7, 2017 at 2:43 PM, Andrea Faulds wrote:

> Hi Nikita,
>
> Nikita Popov wrote:
>
>>
>> It also goes the other way. Whether you want to drop the newline after ?>
>> depends (roughly) on whether the code is control flow (drop) or trailing
>> output (don't drop). If the newline is not dropped anymore it doesn't mean
>> that the output will look nice, it's just going to be broken in a
>> different way.
>>
>>
> I understand that it should be dropped for “control flow” code (maybe not
> the best term, I misunderstood what you meant at first). That's why I
> suggest ignoring the following newline only for the ?> at the end of the
> file, because I can't think of another place where you would have a ?> and
> *not* intend output immediately after it.
>
> So I'm not sure I understand your objection, from that standpoint. Did I
> miss something?
>
> Regards.
>

I'm referring to code like

<ul>
<?php foreach ($data as $value): ?>
<li><?= $value ?></li>
<?php endforeach; ?>
</ul>

Currently this would produce the output

<ul>
<li>Foo</li>
<li>Bar</li>
</ul>

Without the trailing newline elision it would produce

<ul>

<li>Foo</li>
<li>Bar</li>

</ul>

I always assumed that this is the reason why we do this in the first place.
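
The list example can be checked with a small script. This is a sketch under stated assumptions: render_template() is a hypothetical helper (not part of PHP or of this thread), and the "no elision" case is only emulated by doubling the newline after each closing tag, since current PHP offers no switch for it. In this emulation a blank line shows up before every item, not only the first, because the newline after the opening foreach tag is part of the loop body.

```php
<?php
// Hypothetical helper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

$template = <<<'TPL'
<ul>
<?php foreach ($data as $value): ?>
<li><?= $value ?></li>
<?php endforeach; ?>
</ul>
TPL;

$vars = ['data' => ['Foo', 'Bar']];

// Current behaviour: the newline after each control-flow tag is dropped.
$current = render_template($template, $vars);

// Emulation of "no elision": double the newline after every closing tag
// that ends a line, so that exactly one newline survives.
$emulated = render_template(str_replace("?>\n", "?>\n\n", $template), $vars);

echo $current, "\n---\n", $emulated, "\n";
```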

Nikita
Hi,

Nikita Popov wrote:
> On Thu, Sep 7, 2017 at 2:43 PM, Andrea Faulds wrote:
>
>> Hi Nikita,
>>
>> Nikita Popov wrote:
>>
>>>
>>> It also goes the other way. Whether you want to drop the newline after ?>
>>> depends (roughly) on whether the code is control flow (drop) or trailing
>>> output (don't drop). If the newline is not dropped anymore it doesn't mean
>>> that the output will look nice, it's just going to be broken in a
>>> different way.
>>>
>>>
>> I understand that it should be dropped for “control flow” code (maybe not
>> the best term, I misunderstood what you meant at first). That's why I
>> suggest ignoring the following newline only for the ?> at the end of the
>> file, because I can't think of another place where you would have a ?> and
>> *not* intend output immediately after it.
>>
>> So I'm not sure I understand your objection, from that standpoint. Did I
>> miss something?
>>
>> Regards.
>>
>
> I'm referring to code like
>
> <ul>
> <?php foreach ($data as $value): ?>
> <li><?= $value ?></li>
> <?php endforeach; ?>
> </ul>
>
> Currently this would produce the output
>
> <ul>
> <li>Foo</li>
> <li>Bar</li>
> </ul>
>
> Without the trailing newline elision it would produce
>
> <ul>
>
> <li>Foo</li>
> <li>Bar</li>
>
> </ul>
>
> I always assumed that this is the reason why we do this in the first place.

Ah. See, it's actually that kind of code that is my problem. A practical
example would be:

<table>
    <?php foreach($rows as $row): ?>
        <tr>
            <?php foreach ($row as $column): ?>
                <td><?=htmlspecialchars($column)?></td>
            <?php endforeach; ?>
        </tr>
    <?php endforeach; ?>
</table>

which currently produces:

<table>
            <tr>
                            <td>foo</td>
                            <td>bar</td>
                    </tr>
            <tr>
                            <td>baz</td>
                            <td>qux</td>
                    </tr>
    </table>

The doubled-up indentation from missing newlines makes it into a mess.
And this is even worse in practice when you have more nested control
flow. Extra newlines would be fine here, but missing newlines aren't.
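
The doubled-up indentation can be reproduced with a short script. This is a sketch, not code from the thread: render_template() is a hypothetical helper, the template uses the four-space indentation preserved in the quoted copies of this message, and the row data is made up.

```php
<?php
// Hypothetical helper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

$template = <<<'TPL'
<table>
    <?php foreach($rows as $row): ?>
        <tr>
            <?php foreach ($row as $column): ?>
                <td><?=htmlspecialchars($column)?></td>
            <?php endforeach; ?>
        </tr>
    <?php endforeach; ?>
</table>
TPL;

$html = render_template($template, ['rows' => [['foo', 'bar'], ['baz', 'qux']]]);

// The indentation in front of each control-flow tag is literal output, and
// the newline after the tag is dropped, so that indentation is glued onto
// the front of the following line.
echo $html, "\n";
```

Each output line ends up indented by its own indentation plus that of the control-flow line before it, which is why deeper nesting makes the result progressively worse.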

Thanks.

--
Andrea Faulds
https://ajf.me/

Christoph M. Becker
Re: [PHP-DEV] Consider only ignoring newlines for final ?> in a file
September 07, 2017 04:10PM
On 07.09.2017 at 15:43, Andrea Faulds wrote:

> Ah. See, it's actually that kind of code that is my problem. A practical
> example would be:
>
> <table>
>     <?php foreach($rows as $row): ?>
>         <tr>
>             <?php foreach ($row as $column): ?>
>                 <td><?=htmlspecialchars($column)?></td>
>             <?php endforeach; ?>
>         </tr>
>     <?php endforeach; ?>
> </table>

I always start the "control flow lines" at column 0 (similar to C
preprocessor directives), which gives the desired output and is quite
readable:

<table>
<?php foreach($rows as $row): ?>
<tr>
<?php foreach ($row as $column): ?>
<td><?=htmlspecialchars($column)?></td>
<?php endforeach; ?>
</tr>
<?php endforeach; ?>
</table>
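
The column-0 convention can be tried out as follows. This is a sketch under assumptions: the exact layout of the template above did not survive in this archive, so the indentation used here (control-flow tags at column 0, markup indented by four spaces per level) is one plausible reading, and render_template() is a hypothetical helper.

```php
<?php
// Hypothetical helper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

// Control-flow tags start at column 0, so they contribute no stray
// indentation; any nesting is shown inside the tag instead.
$template = <<<'TPL'
<table>
<?php foreach ($rows as $row): ?>
    <tr>
<?php   foreach ($row as $column): ?>
        <td><?= htmlspecialchars($column) ?></td>
<?php   endforeach; ?>
    </tr>
<?php endforeach; ?>
</table>
TPL;

$html = render_template($template, ['rows' => [['foo', 'bar'], ['baz', 'qux']]]);
echo $html, "\n";
```

Because a control-flow line now emits nothing at all (no leading whitespace, and its newline is dropped), the output keeps exactly the indentation written on the markup lines.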

--
Christoph M. Becker

>
>
> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?
>
>
Captain Obvious here. It has long been the policy of many large PHP
projects to not close the last PHP tag for this reason. This change
wouldn't affect them. It risks affecting projects without this policy, and
those tend to be older and often private.
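
A small sketch of why that policy exists, assuming a hypothetical render_template() helper that includes a file and captures its output. A pure-PHP file without a closing tag cannot leak trailing whitespace; with a closing tag, only the first newline after it is dropped.

```php
<?php
// Hypothetical helper (not from the thread): include a source string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

// No closing tag: everything up to the end of the file is PHP code.
$noClosingTag = render_template('<?php $x = 1;' . "\n");

// Closing tag followed by a single newline: the newline is dropped.
$closingTag = render_template('<?php $x = 1; ?>' . "\n");

// Closing tag followed by two newlines: the second one is output.
$closingTagPlus = render_template('<?php $x = 1; ?>' . "\n\n");

var_dump($noClosingTag, $closingTag, $closingTagPlus);
```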
Hi,

Christoph M. Becker wrote:
> On 07.09.2017 at 15:43, Andrea Faulds wrote:
>
>> Ah. See, it's actually that kind of code that is my problem. A practical
>> example would be:
>>
>> <table>
>>     <?php foreach($rows as $row): ?>
>>         <tr>
>>             <?php foreach ($row as $column): ?>
>>                 <td><?=htmlspecialchars($column)?></td>
>>             <?php endforeach; ?>
>>         </tr>
>>     <?php endforeach; ?>
>> </table>
>
> I always start the "control flow lines" at column 0 (similar to C
> preprocessor directives), which gives the desired output and is quite
> readable:
>
> <table>
> <?php foreach($rows as $row): ?>
> <tr>
> <?php foreach ($row as $column): ?>
> <td><?=htmlspecialchars($column)?></td>
> <?php endforeach; ?>
> </tr>
> <?php endforeach; ?>
> </table>

This seems like a reasonable workaround, thank you for the idea. It
reminds me of what PHP's source code does with preprocessor instructions:

#ifndef FOO
#    define FOO
#endif

I might do this in future code.

That said, I still think the ?> newline behaviour should be looked at,
since this kind of workaround isn't universally applicable (and in any
case isn't to everyone's tastes). In particular, if you want to generate
plain text and need to insert a newline, having PHP throw them away and
requiring you to add extra ones to compensate makes for uglier source
code which is harder to reason about.

Thanks!
--
Andrea Faulds
https://ajf.me/

Hi,

Michael Morris wrote:
>>
>>
>> Would anyone object to me changing how PHP handles this so that only the
>> final ?> tag consumes its following newline, and only at the end of the
>> file?
>>
>>
> Captain Obvious here. It has long been the policy of many large PHP
> projects to not close the last PHP tag for this reason. This change
> wouldn't affect them. It risks affecting projects without this policy, and
> those tend to be older and often private.
>

The idea here though is not to affect code where the entire file is a
<?php ?> block. If newlines are still consumed, but only for ?> at the
end of the file, those files should still behave the same.

What I want to change is how it behaves in other circumstances, i.e.
templating.

Thanks.
--
Andrea Faulds

Christoph M. Becker
Re: [PHP-DEV] Consider only ignoring newlines for final ?> in a file
September 07, 2017 04:50PM
On 07.09.2017 at 16:21, Andrea Faulds wrote:

> This seems like a reasonable workaround, thank you for the idea. It
> reminds me of what PHP's source code does with preprocessor instructions:
>
> #ifndef FOO
> #    define FOO
> #endif

Hence the name PHP. :)

> That said, I still think the ?> newline behaviour should be looked at,
> since this kind of workaround isn't universally applicable (and in any
> case isn't to everyone's tastes). In particular, if you want to generate
> plain text and need to insert a newline, having PHP throw them away and
> requiring you to add extra ones to compensate makes for uglier source
> code which is harder to reason about.

If you don't mind a trailing space (I don't like them, but well), you
can write:

<?='foo'?>
bar

And of course, there are template engines which could be used as well.
Frankly, I don't see any need for action here. :)
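
Since a trailing space is invisible in an archived message, here is the trick spelled out with explicit strings. This is a sketch with a hypothetical render_template() helper: only a newline that immediately follows the closing tag is dropped, so a space between the tag and the newline keeps the line break (and the space) in the output.

```php
<?php
// Hypothetical helper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

// Newline directly after the closing tag: it is dropped.
$elided = render_template("<?='foo'?>\nbar");

// A space between the closing tag and the newline: both are output.
$kept = render_template("<?='foo'?> \nbar");

var_dump($elided); // string(6) "foobar"
var_dump($kept);   // "foo " followed by a newline and "bar"
```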

--
Christoph M. Becker

On 9/6/2017 6:45 PM, Andrea Faulds wrote:
> Hi everyone,
>
> This is the tiniest of issues, but it's bugged me for a long time and
> makes the HTML produced by PHP code less readable than it ought to be.
> Specifically, PHP ignores a newline immediately following a ?> tag. The
> reason for this is, from what I recall, to prevent issues where
> whitespace at the end of a PHP file is echoed before headers can be
> sent. On UNIX in particular, all text files (should) end in a newline,
> so this is a reasonable and necessary feature.
>
> However, for ?> tags anywhere that aren't right at the end of the file,
> this is just a nuisance that makes for messy output. For example, HTML
> output that should look like:
>
> <table>
>     <tr>
>        <td>foo</td>
>        <td>bar</td>
>     </tr>
> </table>
>
> May instead end up looking something like:
>
> <table>    <tr>
>        <td>foo</td>       <td>bar</td>
>     </tr></table>
>
> Of course, HTML doesn't matter so much, it'll render the same to the
> end-user. However, for outputting e.g. plain text, newlines can be
> significant, and so you have to insert an ugly and surprising extra
> newline following a tag.
>
> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?
>
> Thanks!

I've noticed that over the years. When I care, I'll either press enter
an extra time or, more frequently, switch over to using pure echo
statements for precise output control. I don't think of this as a
particularly significant issue.*
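
As an illustration of the echo approach, here is a minimal sketch; the function name and the data are made up. Every newline is written explicitly, so the output does not depend on what follows a closing tag.

```php
<?php
// Hypothetical example: build plain-text output with explicit newlines
// instead of switching in and out of PHP mode.
function render_report(array $fields): string
{
    $out = '';
    foreach ($fields as $label => $value) {
        $out .= $label . ': ' . $value . "\n";
    }
    return $out;
}

echo render_report(['Name' => 'Ada', 'Role' => 'admin']);
```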

Alternatively, for the HTML case, it is possible to stream an output
buffer and manipulate newlines through the TagFilterStream class:

https://github.com/cubiclesoft/ultimate-web-scraper

That particular class can process HTML at a rate of up to 1MB/sec even
when using callbacks via its very efficient stream-based state engine.
The extra overhead is minimal for prettifying HTML output.


* I'd personally rather see a suitable fix for Bug #73535 at this point.
It's been an open issue with a CVE assigned for almost 10 months. It
would be nice to see it triaged properly (e.g. the suggested fix
applied) so that I can finally close that browser tab. If you have the
spare time for newline output adjustments, I'd love to see that extra
energy sunk into fixing existing security vulnerabilities, especially
those with CVEs and suggested solutions. Just sayin'. But you guys do
whatever you want to do.

--
Thomas Hruska
CubicleSoft President

I've got great, time saving software that you will find useful.

http://cubiclesoft.com/

And once you find my software useful:

http://cubiclesoft.com/donate/

On Thu, Sep 7, 2017 at 10:23 AM, Andrea Faulds wrote:

> What I want to change is how it behaves in other circumstances, i.e.
> templating.
>
> Thanks.
>
>
I get that, but I can think of one example where this innocent change might
break BC. You cite this change as being for templating, which implies that
the PHP files using this feature are loaded by another PHP file with
require() or include(). Suppose someone creates a template wrapper with this
circumstance in mind. Instead of doing the obvious (omitting the final ?> tag
in the template), they write code in the template wrapper to snip the last
newline character from the included file. Depending on how their code is
written, your change could now become a breaking change: for example, they
just lop off the last character of what the template returns without
checking whether it is indeed a newline character.
On 7 September 2017 16:34:38 BST, Michael Morris wrote:
> Suppose someone creates a template wrapper with this circumstance in
> mind. Instead of doing the obvious, omit the final ?> tag in the
> template, they write code in the template wrapper to snip the last
> endline character from the included file. Depending on how their code
> is written your change could now become a breaking change: for example
> they just lop off the last character of the template's return without
> checking to see if it is indeed a newline character.

I think you have the change the wrong way round (unless I do). The current behaviour is:

- PHP blocks at end of file -> suppress following newline
- PHP blocks elsewhere in file -> suppress following newline

The proposed behaviour is:

- PHP blocks at end of file -> suppress following newline (no change)
- PHP blocks elsewhere in file -> treat following newline literally

So in your scenario, there would be no newline to trim, before or after the proposed change.
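
A sketch of that scenario, assuming a hypothetical render_template() wrapper that includes a template and captures its output. The newline after a final closing tag is already dropped today, so there is nothing for the wrapper to trim, and a wrapper that blindly removes the last character is already cutting into content.

```php
<?php
// Hypothetical wrapper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

// The template file ends with a closing tag followed by one newline.
$out = render_template('Hello, <?= $name ?>' . "\n", ['name' => 'Ada']);

$blind   = substr($out, 0, -1);   // drops the last character unconditionally
$checked = rtrim($out, "\r\n");   // drops trailing newlines only

var_dump($out);     // string(10) "Hello, Ada"
var_dump($blind);   // string(9) "Hello, Ad"
var_dump($checked); // string(10) "Hello, Ada"
```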

Regards,

--
Rowan Collins
[IMSoP]

Andreas Treichel
Re: [PHP-DEV] Consider only ignoring newlines for final ?> in a file
September 07, 2017 07:10PM
> I always assumed that this is the reason why we do this in the first place.

I think the main reason was that old versions of IE go into quirks mode
if the doctype is not on the first line of the output, e.g.:

<?php header('Content-Type: text/html'); ?>
<!DOCTYPE html>
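
A sketch that checks where the doctype lands, using a hypothetical render_template() helper. The headers_sent() guard is an addition so the script also runs cleanly after earlier output; it is not part of the example above. Browser behaviour itself is not tested here, only the position of the doctype in the output.

```php
<?php
// Hypothetical helper (not from the thread): include a template string
// from a temporary file and capture what it prints.
function render_template(string $source, array $vars = []): string
{
    $path = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($path, $source);
    extract($vars, EXTR_SKIP);
    ob_start();
    include $path;
    unlink($path);
    return ob_get_clean();
}

$block = '<?php if (!headers_sent()) { header("Content-Type: text/html"); } ?>';

// Newline directly after the closing tag: dropped, doctype comes first.
$clean = render_template($block . "\n" . '<!DOCTYPE html>' . "\n");

// A stray space after the closing tag: the space and newline are output.
$leaky = render_template($block . " \n" . '<!DOCTYPE html>' . "\n");

var_dump(strpos($clean, '<!DOCTYPE html>')); // int(0)
var_dump(strpos($leaky, '<!DOCTYPE html>')); // int(2)
```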
