[PHP-DEV] New functions: string_starts_with(), string_ends_with()

Posted by Andreas Hennings 
Hello list,
a quite common use case is that one needs to find out if a string
$haystack begins or ends with another string $needle.
Or in other words, if $needle is a prefix or a suffix of $haystack.

One prominent example would be in PSR-4 or PSR-0 class loaders.
Maybe the use case also occurs when writing parsers..
In each of these two examples (parsers, class loaders), we care about
performance.

(forgive me if this was discussed before, I did not find it anywhere
in the archives)

--------------------------

Existing solutions to this problem feel non-trivial, and/or are
suboptimal in performance.
https://stackoverflow.com/questions/2790899/how-to-check-if-a-string-starts-with-a-specified-string
https://stackoverflow.com/questions/834303/startswith-and-endswith-functions-in-php
This answer compares different solutions,
https://stackoverflow.com/a/7168986/246724

Existing solutions:
(Let's focus on string_starts_with(), the other case is mostly
equivalent / symmetric)

if (0 === strpos($haystack, $needle)) {..}
I have often seen this presented as the preferable solution.
Unfortunately, if $haystack does not begin with $needle, this keeps
searching the rest of the string instead of stopping after the
beginning. Especially if $haystack is really long, this can be a
waste.
E.g. if (0 === strpos(file_get_contents('some_source_file.php'),
'<?php')) {..} will search the entire file for an occurrence of
'<?php' whenever the file does not start with it.
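
To illustrate the point about strpos(), here is a small sketch. The
input string is made up for the example; it only shows that strpos()
reports a match far away from position 0, which means it had to walk
through everything before it.

```php
<?php
// Hypothetical input: one million filler bytes, then the needle.
$haystack = str_repeat('x', 1000000) . '<?php';
$needle = '<?php';

// strpos() cannot be told to give up after the first position, so it
// returns the first match anywhere in the string.
$pos = strpos($haystack, $needle);

var_dump($pos);        // int(1000000)
var_dump(0 === $pos);  // bool(false)
```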

if ($needle === substr($haystack, 0, strlen($needle))) {..}
This allocates a new string for the substring, which has to be freed
again afterwards.
Also, this requires an additional function call to strlen() - which
adds even more clutter if $needle is an expression, not just a
variable.

if (0 === strncmp($haystack, $needle, strlen($needle))) {..}
Needs the additional call to strlen().
Otherwise, this seems like a really good solution.

if ('' === $needle || false !== strrpos($haystack, $needle,
-strlen($haystack))) {..}
This is the funky solution from https://stackoverflow.com/a/10473026/246724
The author says that it will be outperformed by strncmp() - so..

if (preg_match('/^' . preg_quote($needle, '/') . '/', $haystack)) {..}
Clearly gonna be slower than other options.

As said, all these solutions do work, but they are either suboptimal,
or they add clutter and overhead, or feel a bit like mind acrobatics.
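
As a sanity check, the five idioms above can be run side by side. This
is only a sketch: the array keys are labels invented here, the test
inputs are arbitrary, and empty needles are left out on purpose because
strpos() treats them differently across PHP versions.

```php
<?php
// One closure per idiom from the list above.
$idioms = [
    'strpos' => function (string $haystack, string $needle): bool {
        return 0 === strpos($haystack, $needle);
    },
    'substr' => function (string $haystack, string $needle): bool {
        return $needle === substr($haystack, 0, strlen($needle));
    },
    'strncmp' => function (string $haystack, string $needle): bool {
        return 0 === strncmp($haystack, $needle, strlen($needle));
    },
    'strrpos' => function (string $haystack, string $needle): bool {
        return '' === $needle
            || false !== strrpos($haystack, $needle, -strlen($haystack));
    },
    'preg_match' => function (string $haystack, string $needle): bool {
        return 1 === preg_match('/^' . preg_quote($needle, '/') . '/', $haystack);
    },
];

// [haystack, needle, expected result]
$cases = [
    ['foobar', 'foo', true],
    ['foobar', 'bar', false],
    ['foo', 'foobar', false],
    ['a.b', 'a.', true],   // needle contains a regex metacharacter
    ['axb', 'a.', false],
];

// Collect every idiom/input pair that deviates from the expected result.
$disagreements = [];
foreach ($cases as [$haystack, $needle, $expected]) {
    foreach ($idioms as $label => $idiom) {
        if ($idiom($haystack, $needle) !== $expected) {
            $disagreements[] = "$label('$haystack', '$needle')";
        }
    }
}

var_dump($disagreements);  // empty array if all idioms agree
```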

-----------------

So, I wonder if it would be worthwhile to add new functions
string_starts_with() / string_has_prefix(), and string_ends_with() /
string_has_suffix().
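
Until something like this exists natively, a userland version could
look roughly as follows. This is a sketch, not a proposal for the
actual implementation: the function names follow the suggestion above,
and the choice of strncmp() and substr_compare() is just my assumption
of what avoids both the full scan and the temporary substring.

```php
<?php
// Userland sketch of the suggested functions (names are hypothetical).

function string_starts_with(string $haystack, string $needle): bool
{
    // Compares at most strlen($needle) bytes; never scans further.
    return 0 === strncmp($haystack, $needle, strlen($needle));
}

function string_ends_with(string $haystack, string $needle): bool
{
    $length = strlen($needle);
    if (0 === $length) {
        return true;   // the empty string is a suffix of every string
    }
    if ($length > strlen($haystack)) {
        return false;  // the needle cannot fit
    }
    // A negative offset makes substr_compare() start counting from the end.
    return 0 === substr_compare($haystack, $needle, -$length);
}

var_dump(string_starts_with('Foo\\Bar\\Baz', 'Foo\\'));  // bool(true)
var_dump(string_ends_with('index.php', '.php'));         // bool(true)
var_dump(string_ends_with('php', '.php'));               // bool(false)
```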

(Or maybe change strncmp(), so that the 3rd parameter $len is
optional. If $len is NULL / not provided, it would use the length of
the second (or first?) string.
(idea was that second parameter = needle).)
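
The strncmp() variant can be emulated with a wrapper. The name
strncmp_needle() is made up for this sketch, and it assumes the second
argument is the needle, so a missing $len falls back to its length.

```php
<?php
// Hypothetical wrapper: strncmp() with an optional third parameter.
function strncmp_needle(string $haystack, string $needle, ?int $len = null): int
{
    // If no length is given, compare only as many bytes as the needle has.
    return strncmp($haystack, $needle, $len ?? strlen($needle));
}

var_dump(0 === strncmp_needle('foobar', 'foo'));        // bool(true)
var_dump(0 === strncmp_needle('foobar', 'bar'));        // bool(false)
var_dump(0 === strncmp_needle('foobar', 'fooXYZ', 3));  // bool(true)
```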

For me personally, I am sure that I would use a new
string_starts_with() a lot more often than a lot of the other existing
string functions.
I don't think it is an exotic or niche use case.

--------------

Spinning this further:
A lot of times if I want to check if $haystack begins with $needle, I
will then need the rest of the string after $needle.
So
if (string_starts_with($haystack, $needle)) {
    $suffix = substr($haystack, strlen($needle));
}
or
if (string_ends_with($filename, '.php')) {
    $basename = substr($filename, 0, -4);
}

I wonder if this could be somehow combined.
E.g.
if (FALSE !== $basename = string_clip_suffix($filename, '.php')) {
    // Do something with $basename.
}
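
A userland sketch of such combined functions might look like this.
string_clip_prefix() is a name I am adding by symmetry; the return
convention (remaining string on a match, false otherwise) is an
assumption taken from the usage example above.

```php
<?php
// Hypothetical helpers: return the remainder on a match, false otherwise.

/** @return string|false */
function string_clip_prefix(string $haystack, string $prefix)
{
    $length = strlen($prefix);
    if (0 !== strncmp($haystack, $prefix, $length)) {
        return false;
    }
    // Cast guards against substr() returning false on old PHP versions.
    return (string) substr($haystack, $length);
}

/** @return string|false */
function string_clip_suffix(string $haystack, string $suffix)
{
    $length = strlen($suffix);
    if (0 === $length) {
        return $haystack;  // nothing to clip
    }
    $offset = strlen($haystack) - $length;
    if ($offset < 0 || 0 !== substr_compare($haystack, $suffix, $offset)) {
        return false;
    }
    return (string) substr($haystack, 0, $offset);
}

$filename = 'index.php';
if (false !== $basename = string_clip_suffix($filename, '.php')) {
    var_dump($basename);  // string(5) "index"
}
```

Note that the caller has to compare against false strictly, because a
full match leaves an empty string, which is falsy.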

------------------

One flaw of these new functions would be that they are less versatile
than other string functions.
They solve this problem, and nothing else.
On the other hand, this is the point, to avoid unnecessary overhead.

The other problem would be, of course, "feature creep" aka "we have so
many string functions already".
This is a matter of opinion.
I would imagine the "cost" of new native functions is:
- global namespace pollution
- increased mental load to learn and remember all of them
- higher memory footprint of php engine?
- more C code to maintain
- a new doc page.
Did I miss something?

------------------

-- Andreas

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Michał Brzuchalski
Re: [PHP-DEV] New functions: string_starts_with(), string_ends_with()
August 01, 2017 08:40AM
Hi Andreas,

This idea was discussed 11 months ago https://externals.io/message/94787
There is also a proper RFC
https://wiki.php.net/rfc/add_str_begin_and_end_functions
You might want to contact Will to get feedback on the idea.

--
regards / pozdrawiam,
--
Michał Brzuchalski
about.me/brzuchal
brzuchalski.com
Andreas Hennings
Re: [PHP-DEV] New functions: string_starts_with(), string_ends_with()
Thanks!
I did not find those, maybe the emails need to be enriched with keywords.
Like SEO-aware email authoring.

Ok. I am looking at the RFC and the old discussions at
https://marc.info/?l=php-internals&m=147017797404339&w=2

I don't know how to follow up on old threads that I don't have in my
email inbox.


So here is my feedback.

The RFC seems mostly fine as it is.
It does not contain anything like the string_clip_suffix() /
string_clip_prefix(), but I think these should be discussed
separately.

About the naming:
The "i" in str_ibegin() and str_iend() seems ok to me.
I also strongly support separate functions instead of a parameter for
case sensitivity.
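
To make the "separate functions" option concrete, here is a userland
sketch of the case-insensitive pair. The names are the ones from the
RFC; the bodies are my own assumption and not the RFC's implementation.
They compare bytes the way strncasecmp() does, so only ASCII letters
are folded.

```php
<?php
// Userland sketch of the case-insensitive variants (bodies are assumed).

function str_ibegin(string $haystack, string $needle): bool
{
    return 0 === strncasecmp($haystack, $needle, strlen($needle));
}

function str_iend(string $haystack, string $needle): bool
{
    $length = strlen($needle);
    if (0 === $length) {
        return true;
    }
    if ($length > strlen($haystack)) {
        return false;
    }
    // The fifth argument switches substr_compare() to case-insensitive mode.
    return 0 === substr_compare($haystack, $needle, -$length, $length, true);
}

var_dump(str_ibegin('README.md', 'readme'));  // bool(true)
var_dump(str_iend('photo.JPG', '.jpg'));      // bool(true)
var_dump(str_iend('photo.png', '.jpg'));      // bool(false)
```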

I also support the underscore. str_begin() is better than strbegin().


------------------------

Whether to have an "s" at the end:

https://marc.info/?l=php-internals&m=147017797404339&w=2
(Yasuo Ohgaki)

> It might be okay to have "s" in function names, but if we want to be
> consistent,
> str_replace -> str_replaces
> str_ireplace -> str_ireplaces

I disagree with this analogy.

The "s" in str_begins() would be for "haystack beginS with needle".
An "s" in str_replaces() would stand for what?
Both "begin" and "replace" are verbs, but they have a different role
in the function name.
"begin" describes a state or condition we want to verify, whereas
"replace" is a command we give to the machine.

So to me it would make sense to have str_begins() and str_ends()
instead of str_begin() and str_end().

To me, str_end() means either "End the string!" (command) or "Give me
the end of the string!" (noun).

In fact Rowan Collins made the same argument here,
https://marc.info/?l=php-internals&m=147017844704431&w=2

> I think those names mean something different: "str_begin" sounds like an
> imperative "make this string begin with X"; "str_begins" is more of an
> assertion "the string begins with X". Ruby would spell it with a ? at
> the end. It's also the same form, grammatically, as the common "isFoo".
>
> Note that this logic holds for "str_replace", which *is* an imperative -
> you are not saying "tell me if X replaces Y", you are saying "please
> replace X with Y".

But then Will talks about consistency again.
https://marc.info/?l=php-internals&m=147018700406320&w=2

> I think like
> having an "s" at the end of the function names reads better, but
> omitting the "s" fits better with the existing function names and does
> not read bad. Therefore, I am in favor of dropping the "s".

Honestly, looking at the existing string functions at
http://php.net/manual/en/ref.strings.php
I don't see a lot of consistency here. Just a long list of garbled
abbreviations.

I also don't see any existing function where the verb has a similar
role as the "begin" in str_begin().
For all the existing string functions, the verb is a command.

I think a better comparison would be
file_exists()
function_exists()
class_exists()
is_subclass_of()
extension_loaded()
ncurses_has_colors()
ncurses_can_change_color()

What these functions have in common:
- The return value is boolean.
- The verb is not a command, but it describes a state or condition.

The verb is not always at the end of the function name, and it does
not always end with -s.
But the form and ending of the verb follows its grammatical role in
the sentence.

I think this is a much better guideline than following a wrong idea of
consistency.

-------------------------

Finally, I don't know why everything needs to be abbreviated.
Having str_* instead of string_* seems ok to me, and is consistent
with existing string functions.
But my first idea would have been more complete phrases like
str_ends_with(), str_has_ending(), str_has_suffix(), instead of just
str_end() or str_ends().

On the other hand, shorter function names have their benefits. So.. no
strong opinion here.

--------------


-- Andreas






