[PHP-DEV] Providing built-in functionality written in PHP (was RE: [PHP-DEV] [VOTE] UUID)

Posted by Zeev Suraski 
I think that actually makes a lot of sense, and not just because of the supportability – but also because of security. A whole class of security exploits – buffer/stack overflows, underruns and all sorts of memory mismanagement – becomes irrelevant when the code is implemented in PHP. I brought this direction up in a discussion on the Security mailing list a few weeks ago without any traction – but it probably makes more sense to discuss it here anyway.

I think that currently, there are two main challenges:

1. Performance – compute intensive logic is way slower in PHP compared to C.
2. Delivery method – we don’t currently have a good way of providing functions that are written in PHP and have them provide the same ‘native’ / ‘builtin’ experience as functions/classes written in C.

Regarding #1, often this isn’t very important as not all pieces of code are that compute intensive. Moreover, if/when JIT materializes, compute intensive logic in PHP will become a lot faster than it is today and probably in the same ballpark as C – so it’ll open the door for us implementing more and more things in PHP.

Regarding #2 – I think that’s something that can be solved relatively easily, but admittedly I haven’t completely thought it through (read: I barely thought about it).

We could create a mechanism where the contents of certain .php files are embedded into the binary, compiled during MINIT, and made available at pretty much the same ‘builtinness’ level as C extensions. We’d probably have to be pretty selective in terms of what goes in there – probably just as selective as we are with the C-based extensions, but I’d imagine that things like ext/exif, UUID, and perhaps even things like unserialize() could find themselves written in pure PHP using such a mechanism.

Thoughts?

Zeev


From: Arvids Godjuks
Sent: Wednesday, September 6, 2017 2:43 PM
To: Dan Ackroyd; internals@lists.php.net
Cc: Zeev Suraski
Subject: Re: [PHP-DEV] [VOTE] UUID


I'd seriously consider writing things like these in PHP, so they are not bogged down by the fact that they are in C and there are 0.5 devs interested in supporting them.

On Wed, 6 Sep 2017, 14:09 Dan Ackroyd wrote:
On 5 September 2017 at 18:24, Fleshgrinder wrote:
> Maybe I should stop the vote. The discussion is happening now instead of
> before when I asked for it. We'll have to wait for at least six months
> for another vote if this is a no, due to the rules.

That would be fine and appropriate. The RFC targets 7.3. Having a
discussion and vote in March gives plenty of time for getting it into
7.3.

Cancelling a vote just to avoid an RFC being rejected is (imo) playing
slightly fast and loose with the rules.

cheers
Dan
Ack

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
On 06/09/2017 at 14:46, Zeev Suraski wrote:

> We could create a mechanism where the contents of certain .php files is embedded into the binary, compiled during MINIT, and made available pretty at the same ‘builtinness’ level as C extensions.

Just for the record, an implementation already exists:
https://pecl.php.net/package/pcs
On Wed, Sep 6, 2017 at 2:46 PM, Zeev Suraski wrote:

> [...]

There has been a discussion about this recently:
https://externals.io/message/99366

Nikita
On Wed, Sep 6, 2017 at 4:19 PM, Nikita Popov wrote:

> There has been a discussion about this recently:
> https://externals.io/message/99366


Thanks for the pointer! I didn't pay close attention to that discussion
back then. I do remember François brought it up in a discussion back in
2015 in Paris.

For me the issue of security is a major benefit that I don't think was
brought up in that discussion. It's the killer feature as far as I'm
concerned.

For those who mentioned that PHP code is better managed in Composer - I
think we're talking about different layers here. The goal wouldn't be
pulling in things from the framework layers into the PHP layer, but being
able to surgically replace existing implementations that could benefit from
being written in PHP - as well as potentially some very basic non-complex
building blocks that change very infrequently if at all. I don't really
know, but I'm guessing the challenge experienced by MongoDB probably had to
do with the fact this was a pretty big & complex extension as well as one
that evolves and changes pretty frequently - the kind of which is probably
still better served in C code (or alternatively, a Composer based package).

PCS seems to be a huge step in the direction I think we need to take,
although I think we probably need to push it further a couple of notches.

First, offhand, I don't think autoloading should play a role. We should
create a mechanism where the PHP-based code is at the exact same level as
C-based code, i.e., it's available the whole time, including
function_exists(), indirect reference and whatnot. In order to achieve
that with good performance we probably need to find a way to compile all
the different built-in PHP elements in one go so that they fit into one
'virtual include' that can be easily and quickly made available to all
requests. I don't think PCS currently supports that, but I'm pretty sure
we can achieve that.

Secondly, ideally, this shouldn't just be a mechanism to mix and match C
and PHP - but actually make it easy for people to write pure PHP code
that'll become integrated to the PHP binary. PCS can practically already
support that, we'd just need some build magic to take .php files during
build, create the PCS wrapper for them and compile them right in. Have
some sort of a standard where builtin.php files inside extension
directories are included, or something of the sort? Definitely requires
more thinking, but if this becomes a basic building block of PHP as I think
it should, I wouldn't want end users to have to manually compile PHP code
into .phpc or create .c wrappers - but have it as automatic as possible.

Zeev
On 06/09/17 13:46, Zeev Suraski wrote:
> We’d probably have to be pretty selective in terms of what goes in there – probably just as selective as we are with the C-based extensions, but I’d imagine that things like ext/exif, UUID, and perhaps even things like unserialize() could find themselves written in pure PHP using such a mechanism.

My own UUID is an old-time UDF add-on to the database, as the new built-in
function there does not allow for selection of a Type 1 UUID. A UDF is the
ideal tool to add functions at the database layer, but it's a pain
because it does require C code ... currently.

Validation is another area where one often needs to be able to bolt on
your own extra functions. Being able to write one's own extensions to
things like variable creation or validation is where we are today and
writing that functionality optionally in PHP makes sense.

The problem is that there is no obvious base to build on. That a
variable is often more complex than simply 'int' is a fact, and creating
an object for each variable with all of this extra functionality is
little different to adding a UUID variable. So what is needed is a standard
method for creating additional UUID-like variables and validating that
the data supplied to populate them is correct.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

Hi Zeev,


On 06/09/2017 at 16:01, Zeev Suraski wrote:
> Thanks for the pointer! I didn't pay close attention to that discussion
> back then. I do remember François brought it up in a discussion back in
> 2015 in Paris.
>
> For me the issue of security is a major benefit that I don't think was
> brought up in that discussion. It's the killer feature as far as I'm
> concerned.

Sure. During the discussion, we focused on maintainability but increased
security is a key benefit too.

> For those who mentioned that PHP code is better managed in Composer - I
> think we're talking about different layers here. The goal wouldn't be
> pulling in things from the framework layers into the PHP layer, but being
> able to surgically replace existing implementations that could benefit from
> being written in PHP - as well as potentially some very basic non-complex
> building blocks that change very infrequently if at all. I don't really
> know, but I'm guessing the challenge experienced by MongoDB probably had to
> do with the fact this was a pretty big & complex extension as well as one
> that evolves and changes pretty frequently - the kind of which is probably
> still better served in C code (or alternatively, a Composer based package).

The discussion made clear that the feature has nothing to do with
composer, and I hope everyone now agrees on this.

> PCS seems to be a huge step in the direction I think we need to take,
> although I think we probably need to push it further a couple of notches.

My primary constraint, with PCS, was to implement it as a pure
extension, without modifying anything in the core. But the discussion
made clear that, when the feature exists, making it optional is useless.
That's why I am currently working on a new version, as an extension too,
but a mandatory one, included in the PHP core.

> First, offhand, I don't think autoloading should play a role. We should
> create a mechanism where the PHP-based code is at the exact same level as
> C-based code, i.e., it's available the whole time, including
> function_exists(), indirect reference and whatnot. In order to achieve
> that with good performance we probably need to find a way to compile all
> the different built-in PHP elements in one go so that they fit into one
> 'virtual include' that can be easily and quickly made available to all
> requests. I don't think PCS currently supports that, but I'm pretty sure
> we can achieve that.

I also gave up on the autoloader. If functions could be autoloaded, it
could be worth going further, but this is definitely not the right
mechanism to base it on.

So, help is welcome here. Currently, my plan is to use opcache to
persist the code and load it when needed. I already have the stream
wrapper, virtual paths... Combining this with 'fake' function entries
should allow loading it just in time. Well, this requires more
investigation... Another approach would be to compile the code during
MINIT and make it persistent outside of opcache. If you have more ideas
about it, they are appreciated.
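
[Editorial illustration, not part of the original message.] The 'fake' function entry idea above could be sketched roughly as follows in C. This is only a toy model under stated assumptions: the `lazy_entry` struct, `lazy_exists()`, `lazy_lookup()` and the simulated `compile_entry()` are hypothetical names invented for this example and are not Zend Engine or PCS APIs. The point it tries to show is that an existence check can succeed without compiling anything, while the first real lookup triggers the compile step exactly once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: one PHP-implemented function known to the engine. */
typedef struct {
    const char *name;    /* function name visible to lookups */
    const char *source;  /* embedded PHP source (placeholder text here) */
    int compiled;        /* 0 until the function is first used */
} lazy_entry;

static int compile_count = 0;  /* counts simulated compilations */

/* Stand-in for the real compile step; a real engine would build opcodes. */
static void compile_entry(lazy_entry *e)
{
    e->compiled = 1;
    compile_count++;
}

/* Existence check: answers from the table alone, never compiles. */
static int lazy_exists(const lazy_entry *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Lookup for a call: compiles on first use. Returns NULL if unknown. */
static lazy_entry *lazy_lookup(lazy_entry *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            if (!table[i].compiled) {
                compile_entry(&table[i]);
            }
            return &table[i];
        }
    }
    return NULL;
}
```

In this model a function that is never called is never compiled, which is the trade-off against compiling everything once during MINIT.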

> Secondly, ideally, this shouldn't just be a mechanism to mix and match C
> and PHP - but actually make it easy for people to write pure PHP code
> that'll become integrated to the PHP binary. PCS can practically already
> support that, we'd just need some build magic to take .php files during
> build, create the PCS wrapper for them and compile them right in. Have
> some sort of a standard where builtin.php files inside extension
> directories are included, or something of the sort? Definitely requires
> more thinking, but if this becomes a basic building block of PHP as I think
> it should, I wouldn't want end users to have to manually compile PHP code
> into .phpc or create .c wrappers - but have it as automatic as possible.

Parsing the code, extracting symbol names, and generating all needed
information as a C include file is done offline because it requires a
working PHP interpreter. So, it cannot be done at build time (when
compiling the core). It is done the same way as the core files generated
using PHP (like zend_vm_execute.h/zend_vm_opcodes.h): each time the
developer modifies the PHP code, they must regenerate the '.phpc' files.
The '.phpc' files will be managed in git and will be included in the
source distribution. So, nothing changes for the end user:
phpize/configure/make. The same goes for extensions included in the core:
'.phpc' files are included in the source distribution. I'll also probably
include this code (re)generation in 'phpize'.
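
[Editorial illustration, not part of the original message.] As a rough picture of what "generating a C include file" from PHP source can mean, the C sketch below turns a source buffer into the text of a C byte-array definition. It is a minimal sketch only: the function name `emit_c_array`, the symbol name passed to it and the output layout are assumptions made for this example, and this is not the actual '.phpc' format, which also carries symbol information extracted by a PHP interpreter.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical generator: writes `data` as a C byte-array definition
 * into `out`. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the input is empty or `out` is too small.
 */
static int emit_c_array(const char *symbol, const unsigned char *data,
                        size_t len, char *out, size_t cap)
{
    size_t used;
    int n;

    if (len == 0 || cap == 0) {
        return -1;  /* an empty initializer would not be valid C */
    }

    n = snprintf(out, cap, "static const unsigned char %s[] = {", symbol);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    used = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        /* Comma before every byte except the first. */
        n = snprintf(out + used, cap - used, "%s0x%02x",
                     i ? "," : "", data[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, "};\n");
    if (n < 0 || (size_t)n >= cap - used) {
        return -1;
    }
    used += (size_t)n;
    return (int)used;
}
```

Because the generated text is plain C, it can be committed and shipped with the sources, so a normal build needs only a C compiler.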

So, the most difficult part, AFAIK, is how to make class and function
code persistent, or load it just in time, or a combination of both. The
case of constants is more complex and out of scope at this time. I hope
to have some time over the next few weeks to implement a prototype, at
least for functions.

Regards

François

On Wed, Sep 6, 2017 at 11:30 AM, François Laupretre wrote:
>> Secondly, ideally, this shouldn't just be a mechanism to mix and match C
>> and PHP - but actually make it easy for people to write pure PHP code
>> that'll become integrated to the PHP binary. PCS can practically already
>> support that, we'd just need some build magic to take .php files during
>> build, create the PCS wrapper for them and compile them right in. Have
>> some sort of a standard where builtin.php files inside extension
>> directories are included, or something of the sort? Definitely requires
>> more thinking, but if this becomes a basic building block of PHP as I think
>> it should, I wouldn't want end users to have to manually compile PHP code
>> into .phpc or create .c wrappers - but have it as automatic as possible.
>
>
> Parsing the code, extracting symbol names, and generating all needed
> information as a C include file is done offline because it requires a
> working PHP interpreter. So, it cannot be done at build time (when compiling
> the core). It is done the same way as the core files generated using PHP
> (like zend_vm_execute.h/zend_vm_opcodes.h): each time, he modifies the PHP
> code, the developer must regenerate the '.phpc' files. The '.phpc' file will
> be managed in git and will be included in the source distribution. So,
> nothing changes for the end user : phpize/configure/make. The same for
> extensions included in the core: '.phpc' files are included in the source
> distrib. I'll also probably include this code (re)generation in 'phpize'.
>
> So, the most difficult part, AFAIK, is how to make class and function code
> persistent, or load it just in time, or a combination of both. The case of
> constants is more complex and out of scope at this time. I hope I have some
> times during the next weeks to implement a prototype, at least for
> functions.
>
I believe I brought this up during the previous round of PCS
discussion, but HHVM does this with its SystemLib already.

In short (ignoring the per-extension systemlibs for the moment), all
the files named in hphp/system/php.txt* are bundled into a single file
at compile time and that file is dropped into a .text section in the
binary. Then on startup, that section is inspected and fed to the
compiler. There are implementations in HHVM for doing this on Linux,
Mac, and Windows. I can't say for sure that all of PHP's supported
OSs have a solution available, but the big three certainly do.

-Sara

* The files to be included are explicitly named in order to guarantee
load order and optimal hoistability. This enumeration may or may not
be necessary based on how PHP binds symbols.

** HHVM places some restrictions on what goes into a systemlib file.
1. No side-effects/pseudo-main. This is probably already a
requirement of PCS, just pointing it out.
2. Files either need namespace blocks (with curly braces), or an
implicit empty namespace will be applied to them.
3. No file-level namespaces or declare statements.
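
[Editorial illustration, not part of the original message.] The bundling step described above, where explicitly listed files are joined into one blob in a fixed order, might look roughly like this in C. This is a sketch under stated assumptions, not HHVM's build code: the `bundled_file` struct, the `bundle_in_order()` function and the example file names are hypothetical, and the file bodies are given as in-memory strings rather than read from disk. It only demonstrates that the manifest order is preserved, so a declaration that others depend on can be placed first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical manifest entry: a file name and its contents. */
typedef struct {
    const char *path;  /* name as listed in the manifest */
    const char *body;  /* file contents, without an opening tag */
} bundled_file;

/*
 * Concatenate the bodies in manifest order, one per line, into `out`.
 * Returns 0 on success, or -1 if `out` cannot hold the whole bundle.
 */
static int bundle_in_order(const bundled_file *files, size_t n,
                           char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(files[i].body);

        /* Need room for the body, a newline and the terminating NUL. */
        if (len + 2 > cap - used) {
            return -1;
        }
        memcpy(out + used, files[i].body, len);
        used += len;
        out[used++] = '\n';
        out[used] = '\0';
    }
    return 0;
}
```

With a manifest such as `{"shape.php", "namespace { interface Shape {} }"}` followed by `{"circle.php", "namespace { class Circle implements Shape {} }"}`, the interface always lands ahead of the class that implements it in the resulting blob.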
