[PHP-DEV] concatenation operator

Posted by Adi Mutu 
[PHP-DEV] concatenation operator
June 05, 2012 10:10PM
Hello,

Can somebody point me to where the concatenation operator (".") is implemented?

Thanks,
Felipe Pena
Re: [PHP-DEV] concatenation operator
June 05, 2012 10:20PM
Hi,

2012/6/5 Adi Mutu:
>
>
> Hello,
>
> Can somebody point me to where the concatenation operator is implemented ?  "." operator.
>
> Thanks,

See http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_def.h#133

--
Regards,
Felipe Pena

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Adi Mutu
Re: [PHP-DEV] concatenation operator
June 07, 2012 09:00PM
That's nice, but I haven't understood a thing... I know something about the PHP core and PHP extensions, but nothing about the Zend engine specifics.
Can you point me to some resources on this topic?

Thanks,


Johannes Schlüter
Re: [PHP-DEV] concatenation operator
June 07, 2012 09:50PM
On Thu, 2012-06-07 at 11:50 -0700, Adi Mutu wrote:
>
> that's nice, but i haven't understood a thing...i know something about
> php core and php extensions, but nothing about the Zend engine
> specific.

The mentioned place is directly in the VM, which in general is harder to
understand, but it leads to "concat_function" in
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_operators.c#1234

Knowing basic C should be enough to understand the code there. The
actual "algorithm" can also easily be guessed: allocate a buffer which
can hold both strings at once and copy them over. The code is a tiny bit
more complex as it tries to reuse an existing buffer rather than
allocating something completely new.
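Below is a minimal C sketch of the algorithm described above. It is not the actual concat_function code. It assumes a simplified string type, and `str_t`, `str_make` and `concat_str` are hypothetical names. Plain malloc/realloc stand in for emalloc/erealloc.

There are two paths:
- If the result is the left operand itself, as in `$a .= $b`, the existing buffer is grown and only the right operand is copied.
- Otherwise a new buffer of both lengths plus one byte is allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified string value (not the real zval layout). */
typedef struct {
    char  *val;   /* malloc'd, NUL-terminated buffer, or NULL */
    size_t len;   /* length without the terminating NUL */
} str_t;

/* Build a str_t holding a heap copy of s. */
static str_t str_make(const char *s)
{
    str_t r = { NULL, strlen(s) };
    r.val = malloc(r.len + 1);
    if (r.val != NULL)
        memcpy(r.val, s, r.len + 1);
    else
        r.len = 0;
    return r;
}

/* Sketch of "result = a . b"; returns 0 on success, -1 on failure.
 * Reuse path: result is a itself (as in $a .= $b), so a's buffer is grown
 * with realloc and only b is copied.
 * Fresh path: allocate a->len + b->len + 1 bytes, copy both operands,
 * then free whatever result held before. */
static int concat_str(str_t *result, const str_t *a, const str_t *b)
{
    assert(result != NULL && a != NULL && b != NULL);
    size_t alen = a->len, blen = b->len;
    char *buf;

    if (result == a) {
        buf = realloc(result->val, alen + blen + 1);
        if (buf == NULL)
            return -1;
        /* For $a .= $a, b's old pointer may be stale after realloc. */
        memcpy(buf + alen, (b == a) ? buf : b->val, blen);
    } else {
        buf = malloc(alen + blen + 1);
        if (buf == NULL)
            return -1;
        memcpy(buf, a->val, alen);
        memcpy(buf + alen, b->val, blen);
        free(result->val);   /* safe even if result == b: b was copied first */
    }
    buf[alen + blen] = '\0';
    result->val = buf;
    result->len = alen + blen;
    return 0;
}
```

The aliasing check is there for `$a .= $a`. After realloc, the right operand's old pointer may be invalid, so the sketch copies from the grown buffer instead.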

The question is: What do you actually want to know?

> Can you point me to some resources on this topic?

Unfortunately not. The source is the best documentation we have for
that.

johannes



Adi Mutu
Re: [PHP-DEV] concatenation operator
June 07, 2012 10:00PM
Ok Johannes, thanks for the answer. I'll try to look deeper.
I basically just wanted to know what happens when you concatenate two strings: which emalloc/efree calls happen.

Also, can you tell me, if possible, how to put a breakpoint on emalloc/efree that only triggers after all core functions are registered? Otherwise it takes like a million years and a million F8 presses...

Thanks.


Johannes Schlüter
Re: [PHP-DEV] concatenation operator
June 07, 2012 10:20PM
On Thu, 2012-06-07 at 12:53 -0700, Adi Mutu wrote:
> Ok Johannes, thanks for the answer. I'll try to look deeper.
> I basically just wanted to know what happens when you concatenate two
> strings? what emalloc/efree happens.

This depends. As always. As said, what has to be done is one allocation
for the result value ... and then the zval magic, which depends on
refcount, references, ...

> Also can you tell me if possible how to put a breakpoint to
> emalloc/efree which are executed only after all core functions are
> registered? because it takes like a million years like this and a
> million F8 presses...

Depends on your debugger. Most allow conditional breakpoints, or you can
stop at one breakpoint and, while holding there, add a few more ...

For such a question my preference is using DTrace (on Solaris, Mac or
BSD), something like this session:

        $ cat test.d
        #!/sbin/dtrace

        pid$target::concat_function:entry {
            self->in_concat = 1;
        }

        pid$target::execute:return {
            self->in_concat = 0;
        }

        pid$target::_emalloc:entry
        / self->in_concat /
        {
            trace(arg0);
            ustack();
        }

        pid$target::_erealloc:entry
        / self->in_concat /
        {
            trace(arg0);
            trace(arg1);
            ustack();
        }

        $ cat test1.php
        <?php
        $a = "foo"; $b = "bar"; $a.$b;

        $ dtrace -s test.d -c 'php test1.php'
        dtrace: script 'test.d' matched 4 probes
        dtrace: pid 16406 has exited
        CPU    ID                    FUNCTION:NAME
          3 100372                  _emalloc:entry                7
                      php`_emalloc
                      php`concat_function+0x270
                      php`ZEND_CONCAT_SPEC_CV_CV_HANDLER+0xcd
                      php`execute+0x3d9
                      php`dtrace_execute+0xe7
                      php`zend_execute_scripts+0xf5
                      php`php_execute_script+0x2e8
                      php`do_cli+0x864
                      php`main+0x6e2
                      php`_start+0x83

        $ cat test2.php
        <?php
        $a = 23; $b = "bar"; $a.$b;

        $ dtrace -s test.d -c 'php test2.php'
        dtrace: script 'test.d' matched 4 probes
        dtrace: pid 16425 has exited
        CPU    ID                    FUNCTION:NAME
          1 100373                  _erealloc:entry                0              79
                      php`_erealloc
                      php`xbuf_format_converter+0x11ee
                      php`vspprintf+0x34
                      php`zend_spprintf+0x2f
                      php`_convert_to_string+0x174
                      php`zend_make_printable_zval+0x5ec
                      php`concat_function+0x3c
                      php`ZEND_CONCAT_SPEC_CV_CV_HANDLER+0xcd
                      php`execute+0x3d9
                      php`dtrace_execute+0xe7
                      php`zend_execute_scripts+0xf5
                      php`php_execute_script+0x2e8
                      php`do_cli+0x864
                      php`main+0x6e2
                      php`_start+0x83

          1 100372                  _emalloc:entry                6
                      php`_emalloc
                      php`concat_function+0x270
                      php`ZEND_CONCAT_SPEC_CV_CV_HANDLER+0xcd
                      php`execute+0x3d9
                      php`dtrace_execute+0xe7
                      php`zend_execute_scripts+0xf5
                      php`php_execute_script+0x2e8
                      php`do_cli+0x864
                      php`main+0x6e2
                      php`_start+0x83

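So, when having two constant strings there's a single emalloc, in this
case allocating 7 bytes (strlen("foo")+strlen("bar")+1). If you have a
different type it has to be converted first: in the second run the
integer 23 goes through zend_make_printable_zval (the _erealloc above)
before concat_function allocates 6 bytes (strlen("23")+strlen("bar")+1).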
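The size arithmetic from the two traces can be written out as a small C sketch. `concat_alloc_size` and `concat_long_str` are hypothetical helpers, not PHP functions, and snprintf stands in for PHP's own conversion code. This sketch keeps the converted number on the stack. PHP allocates for it as well, which is the _erealloc in the second trace.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bytes for the result of "a . b": both lengths plus the terminating NUL. */
static size_t concat_alloc_size(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strlen(a) + strlen(b) + 1;
}

/* Sketch of "$a . $b" with $a = 23 (an integer): convert the number to
 * its decimal string first, then make the one result allocation.
 * *alloc_size receives the size passed to malloc; the caller frees. */
static char *concat_long_str(long n, const char *s, size_t *alloc_size)
{
    char num[32];                               /* ample for any long */
    int nlen = snprintf(num, sizeof num, "%ld", n);
    if (nlen < 0)
        return NULL;

    size_t size = concat_alloc_size(num, s);
    char *out = malloc(size);
    if (out == NULL)
        return NULL;
    memcpy(out, num, (size_t)nlen);
    memcpy(out + nlen, s, strlen(s) + 1);        /* copies the NUL too */
    *alloc_size = size;
    return out;
}
```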


johannes



Morgan L. Owens
Re: Re: [PHP-DEV] concatenation operator
June 13, 2012 05:30AM
On 2012-06-08 08:18, Johannes Schlüter wrote:
> On Thu, 2012-06-07 at 12:53 -0700, Adi Mutu wrote:
>> Ok Johannes, thanks for the answer. I'll try to look deeper.
>> I basically just wanted to know what happens when you concatenate two
>> strings? what emalloc/efree happens.
>
> This depends. As always. As said what has to be done is one allocation
> for the result value ... and then the zval magic, which depends on
> refcount, references, ...
>
....
>
> So, when having two constant strings there's a single malloc, in this
> case allocating 7 bytes (strlen("foo")+strlen("bar")+1), if you have a
> different type it has to be converted first ...
>

After reading the performance improvements RFC about interned strings,
and its passing mention of a "special data structure (e.g. zend_string)
instead of char*", I've been thinking a little bit about this and what
such a structure could be.

But rather than interned strings, I thought that _implicit_
concatenation would be a bigger win in the long term. Like interning, it
relies on strings being immutable.

This zend_string is a composite type. Leaves are _almost_ identical to
existing string zvals - char* val, int len - but also an additional
"child_count" field. For leaves, child_count is zero (not incidentally
indicating that it _is_ a leaf). For internal nodes, "val" is a list of
zend_strings (child_count of them). "len" still refers to the total
string length (the sum of the len fields of its children).

So a string that has been built up through concatenation is represented
by a tree (actually a DAG) of zend_strings. The edges in this DAG are
all properly reference-counted; discarding a string decrements the
reference counts of its children.

Only when the character data is needed for something does it need to be
allocated for and copied into one place (the internal node can then
become a leaf).
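A rough C sketch of the composite string described above. It follows the post's assumptions:
- strings are immutable;
- leaves have child_count == 0;
- internal nodes hold an array of children;
- edges are reference-counted;
- the characters are flattened only on demand.

All names (`rstr`, `rstr_leaf`, `rstr_concat`, `rstr_flatten`, `rstr_release`) are hypothetical; this is an illustration, not engine code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical composite string: a leaf when child_count == 0, otherwise
 * an internal node whose children are concatenated in order. */
typedef struct rstr rstr;
struct rstr {
    size_t refcount;
    size_t len;          /* total length, i.e. sum of the children's len */
    size_t child_count;  /* 0 means leaf */
    union {
        char  *chars;    /* leaf: owned, NUL-terminated characters */
        rstr **children; /* internal node: child_count references */
    } val;
};

static rstr *rstr_leaf(const char *text)
{
    rstr *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->len = strlen(text);
    s->val.chars = malloc(s->len + 1);
    if (s->val.chars == NULL) { free(s); return NULL; }
    memcpy(s->val.chars, text, s->len + 1);
    s->refcount = 1;
    s->child_count = 0;
    return s;
}

/* Drop one reference; at zero, free the node and release its edges. */
static void rstr_release(rstr *s)
{
    if (s == NULL) return;
    assert(s->refcount > 0);
    if (--s->refcount > 0) return;
    if (s->child_count == 0) {
        free(s->val.chars);
    } else {
        for (size_t i = 0; i < s->child_count; i++)
            rstr_release(s->val.children[i]);
        free(s->val.children);
    }
    free(s);
}

/* Implicit concatenation: no characters are copied, only references. */
static rstr *rstr_concat(rstr *a, rstr *b)
{
    rstr *s = malloc(sizeof *s);
    rstr **kids = malloc(2 * sizeof *kids);
    if (s == NULL || kids == NULL) { free(s); free(kids); return NULL; }
    kids[0] = a; kids[1] = b;
    a->refcount++; b->refcount++;
    s->refcount = 1;
    s->len = a->len + b->len;
    s->child_count = 2;
    s->val.children = kids;
    return s;
}

static char *copy_into(const rstr *s, char *dst)
{
    if (s->child_count == 0) {
        memcpy(dst, s->val.chars, s->len);
        return dst + s->len;
    }
    for (size_t i = 0; i < s->child_count; i++)
        dst = copy_into(s->val.children[i], dst);
    return dst;
}

/* Copy the characters into one buffer when they are first needed; the
 * node then becomes a leaf and releases its children. */
static const char *rstr_flatten(rstr *s)
{
    if (s->child_count == 0) return s->val.chars;
    char *buf = malloc(s->len + 1);
    if (buf == NULL) return NULL;
    *copy_into(s, buf) = '\0';
    rstr **kids = s->val.children;
    size_t n = s->child_count;
    s->val.chars = buf;
    s->child_count = 0;
    for (size_t i = 0; i < n; i++)
        rstr_release(kids[i]);
    free(kids);
    return buf;
}
```

Here concatenation only allocates a node and a two-element child array. The characters are copied once, on the first rstr_flatten call. After that call the node is a leaf and its references to the children are dropped.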


Ángel González
Re: [PHP-DEV] concatenation operator
June 14, 2012 06:10PM
On 13/06/12 05:26, Morgan L. Owens wrote:
> After reading the performance improvements RFC about interned strings,
> and its passing mention of a "special data structure (e.g.
> zend_string) instead of char*", I've been thinking a little bit about
> this and what such a structure could be.
>
> But rather than interned strings, I thought that _implicit_
> concatenation would be a bigger win in the long term. Like interning,
> it relies on strings being immutable.
>
> This zend_string is a composite type. Leaves are _almost_ identical to
> existing string zvals - char* val, int len - but also an additional
> "child_count" field. For leaves, child_count is zero (not incidentally
> indicating that it _is_ a leaf). For internal nodes, "val" is a list
> of zend_strings (child_count of them). "len" still refers to the total
> string length (the sum of the len fields of its children).
>
> So a string that has been built up through concatenation is
> represented by a tree (actually a dag) of zend_strings. The edges in
> this dag are all properly reference-counted; discarding a string
> decrements the reference counts of its children.
How do you store the list, then? As a singly linked list?
That would prevent reuse of the component strings in different
superstrings, except for matching ends...




Morgan L. Owens
Re: Re: [PHP-DEV] concatenation operator
June 15, 2012 03:30AM
On 2012-06-15 04:00, Ángel González wrote:
> On 13/06/12 05:26, Morgan L. Owens wrote:
>> After reading the performance improvements RFC about interned strings,
>> and its passing mention of a "special data structure (e.g.
>> zend_string) instead of char*", I've been thinking a little bit about
>> this and what such a structure could be.
>>
>> But rather than interned strings, I thought that _implicit_
>> concatenation would be a bigger win in the long term. Like interning,
>> it relies on strings being immutable.
>>
>> This zend_string is a composite type. Leaves are _almost_ identical to
>> existing string zvals - char* val, int len - but also an additional
>> "child_count" field. For leaves, child_count is zero (not incidentally
>> indicating that it _is_ a leaf). For internal nodes, "val" is a list
>> of zend_strings (child_count of them). "len" still refers to the total
>> string length (the sum of the len fields of its children).
>>
>> So a string that has been built up through concatenation is
>> represented by a tree (actually a dag) of zend_strings. The edges in
>> this dag are all properly reference-counted; discarding a string
>> decrements the reference counts of its children.
> How do you list then? As a single-linked list?
> That would avoid reuse of the component strings in different
> superstrings except from matching ends...
>
I was thinking just in terms of an array (the composite would be
pointing either to an array of characters or an array of strings),
mainly because that's how I pictured it, and I haven't thought of a
reason not to: the number of children is known when the concatenated
string is created, and fixed due to immutability.

Component strings aren't copied as such, only referenced. In that sense
the choice of array vs. list comes down to where that reference is kept
- in the parent string or the elder sibling. Sharing common suffixes
would save a number of references, but when concatenating two existing
strings, the list of component references in the _prefix_ would need to
be copied for the sake of whatever else is using it at the time
(otherwise they would end up with the concatenated string as well).

Speaking of concatenation, unless potentially scary stuff is done,
concatenating three strings is done by concatenating two of them, then
concatenating the result with the third, giving a binary tree; so why am
I suggesting an array of arbitrary length? Think of an implementation of
PHP's join()/implode() that exploits this structure.
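Here is a hypothetical C sketch of the join()/implode() point. It builds a single internal node whose children are the parts and the separators. Repeated binary concatenation would instead produce a chain of two-child nodes. The node layout follows the post's description but is trimmed down: leaves borrow their text, e.g. string literals. `node`, `leaf`, `join`, `to_cstr` and `release` are illustrative names, not PHP functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal hypothetical composite string: child_count == 0 means leaf.
 * Leaves borrow their text (e.g. string literals) to keep the sketch short. */
typedef struct node node;
struct node {
    size_t refcount, len, child_count;
    union { const char *chars; node **children; } val;
};

static node *leaf(const char *text)
{
    node *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->refcount = 1;
    s->len = strlen(text);
    s->child_count = 0;
    s->val.chars = text;
    return s;
}

/* implode()-style join: ONE internal node whose children are
 * parts[0], sep, parts[1], sep, ..., parts[n-1]. */
static node *join(node **parts, size_t n, node *sep)
{
    if (n == 0) return leaf("");
    size_t kids = 2 * n - 1;
    node *s = malloc(sizeof *s);
    node **c = malloc(kids * sizeof *c);
    if (s == NULL || c == NULL) { free(s); free(c); return NULL; }
    size_t k = 0;
    s->len = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) { c[k++] = sep; sep->refcount++; s->len += sep->len; }
        c[k++] = parts[i]; parts[i]->refcount++; s->len += parts[i]->len;
    }
    s->refcount = 1;
    s->child_count = kids;
    s->val.children = c;
    return s;
}

/* Copy the characters of s to dst; returns the position after them. */
static char *copy_into(const node *s, char *dst)
{
    if (s->child_count == 0) {
        memcpy(dst, s->val.chars, s->len);
        return dst + s->len;
    }
    for (size_t i = 0; i < s->child_count; i++)
        dst = copy_into(s->val.children[i], dst);
    return dst;
}

/* Materialize the characters only when they are actually needed. */
static char *to_cstr(const node *s)
{
    char *buf = malloc(s->len + 1);
    if (buf != NULL) *copy_into(s, buf) = '\0';
    return buf;
}

static void release(node *s)
{
    if (s == NULL) return;
    assert(s->refcount > 0);
    if (--s->refcount > 0) return;
    for (size_t i = 0; i < s->child_count; i++)
        release(s->val.children[i]);
    if (s->child_count > 0) free(s->val.children);
    free(s);
}
```

The number of children (2n - 1) is known when the node is created and never changes. A plain array is therefore sufficient, which matches the argument above for arrays over lists.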


Adi Mutu
Re: [PHP-DEV] concatenation operator
June 29, 2012 08:50PM
Hello,

Sorry for the late reply, I was away for a while...
I don't think I have DTrace because I'm on Fedora... but I'll look into it.

If I wanted to set a breakpoint after PHP's initialization process, but right before the script's execution, so that I can then set breakpoints on emalloc and efree that fire only during my script's execution, where should I set it? I hope the question is clear enough...

DTrace-related:
Why have you used 'execute:return' and not concat_function:return? What's with the execute function?


Thanks,
A.


Johannes Schlüter
Re: [PHP-DEV] concatenation operator
June 29, 2012 11:40PM
On Fri, 2012-06-29 at 11:47 -0700, Adi Mutu wrote:
> Sorry for the late reply, I was away for a while......
> I don't think I have dtrace because I'm on fedora.....but i'll
> research.

As said: currently only on Solaris, MacOS and BSD. Oracle is porting
DTrace to Oracle Linux. Red Hat created SystemTap, which is similar, but
I have never used it.

> If i would want to set a breakpoint after php's initialization
> process, but right before the scripts execution, so that after that I
> can set breakpoints to emalloc and efree which are executed only
> during my scripts execution where should i set it? Hope the question
> was clear enough.....

Depends on what you consider "initialization". execute() might be a
place which helps ... but even then you will see many things you're
probably not interested in. The only thing that helps is learning the
code structure and digging through it.

> dtrace related:
> Why have you used 'execute:return' and not concat_function:return?
> What's with the execute function?

That was a bug, since I quickly edited an older script; the flag should
be cleared on concat_function:return. In this case it doesn't change the
result.
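For readers without DTrace, the scoping question can be reproduced in plain C. This is a hypothetical analogue of the script's predicate, not a replacement for DTrace:
- a thread-local flag is set when the function of interest is entered and cleared when it returns;
- an allocation wrapper records sizes only while the flag is set.

`traced_malloc` and `fake_concat` are made-up stand-ins for _emalloc and concat_function. Clearing the flag at the function's own return, as here, limits tracing to that call. Clearing it only when execute() returns would keep recording allocations after the concatenation, until execute() returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of the D script: in_concat plays the role of
 * the thread-local self->in_concat. */
static _Thread_local int in_concat;
static size_t traced_sizes[16];
static size_t traced_count;

/* Stand-in for _emalloc with the "/ self->in_concat /" predicate:
 * sizes are recorded only while the flag is set. */
static void *traced_malloc(size_t size)
{
    if (in_concat && traced_count < sizeof traced_sizes / sizeof traced_sizes[0])
        traced_sizes[traced_count++] = size;
    return malloc(size);
}

/* Stand-in for concat_function: the flag is set on entry and cleared on
 * this function's own return (the concat_function:return variant). */
static char *fake_concat(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    in_concat = 1;
    size_t alen = strlen(a), blen = strlen(b);
    char *r = traced_malloc(alen + blen + 1);
    if (r != NULL) {
        memcpy(r, a, alen);
        memcpy(r + alen, b, blen + 1);   /* includes the NUL */
    }
    in_concat = 0;
    return r;
}
```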

johannes



Adi Mutu
Re: [PHP-DEV] concatenation operator
June 30, 2012 01:00PM
By initialization I mean the latest possible point where I can set a breakpoint, right before my script starts executing its emallocs and efrees...

> but even then you will see many things you're probably not interested in

Such as?

> Only thing that helps is learning the code structure and digging through it.

Any hint/documentation to learn that?

Thanks.



Johannes Schlüter
Re: [PHP-DEV] concatenation operator
July 01, 2012 02:00AM
On Sat, 2012-06-30 at 03:53 -0700, Adi Mutu wrote:
>
> By initialization i mean the latest point possible where I can set a
> breakpoint, but right before my scripts starts executing it's
> emalloc's or efree's.....

Does "executing" include "compilation"? Does it include creating a stack
frame etc. for the "main" routine? ...

> > but even then you will see many things you're probably not
> interested in
>
> such as?......

Well, PHP is complex, it does quite a few things in order to run a
seemingly small script.

> > Only thing that helps is learning the code structure and digging
> through it.
>
> Any hint/documentation to learn that?

Use the source. ;-)

A bit more seriously: No, there's no good single place to look. There
are different blogs etc. looking at specific pieces in detail, but the
best thing to do is to look at the code (the filenames in Zend/ give a
good idea what they are for ...), pick a question, take some time and
start digging. For some things it's also good to look into xdebug, vld,
runkit, ... and see where they hook in to do their magic. And the path
from main() in sapi/cli/php_cli.c to execute() is not that long; what
happens after that is a bit more complicated (though then again, once
you're in, it's quite easy for most parts, too).

johannes



Christopher Jones
Re: [PHP-DEV] concatenation operator
July 02, 2012 08:10PM
On 06/30/2012 04:51 PM, Johannes Schlüter wrote:
> On Sat, 2012-06-30 at 03:53 -0700, Adi Mutu wrote:

>>> Only thing that helps is learning the code structure and digging
>> through it.
>>
>> Any hint/documentation to learn that?
>
> Use the source. ;-)
>
> A bit more seriously: No, there's no good single place to look at, there
> are different blogs etc looking at specific pieces in detail, but the
> best thing to do is looking at the code (the filenames in Zend/ give a
> good idea what they are for ...), take a question and time and start
> digging. For some things it's also good to look into xdebug, vld,
> runkit, ... and see where they hook in to do their magic. And well, the
> path from main() in sapi/cli/php_cli.c to execute() is not that long,
> what then happens is a bit more complicated though (while then again,
> once you're in, quite easy for most parts, too)
>
> johannes

There is a wiki page linking to some useful resources: https://wiki.php.net/internals/references

Chris

--
christopher.jones@oracle.com
http://twitter.com/#!/ghrd


