Re: Adding in a case-insensitive version of str_contains

From:
Date: Sun, 01 Jun 2025 02:55:18 +0000
Subject: Re: Adding in a case-insensitive version of str_contains
References: 1 2 3 4 5
Groups: php.internals
On Sun, Jun 1, 2025 at 1:07, Derick Rethans <[email protected]> wrote:
>
> On 31 May 2025 12:36:52 BST, youkidearitai <[email protected]> wrote:
> >On Sat, May 31, 2025 at 19:41, Nikita Popov <[email protected]> wrote:
> >>
> >> On Thu, May 29, 2025, at 23:00, Kamil Tekiela wrote:
> >>
> >> As I understand, it was a conscious decision not to add this function
> >> when str_contains was created. The reason is that case sensitivity is
> >> locale-dependent, and for such use cases, mbstring extension is better
> >> [1] & [2]. Do you think that locale is a concern here, and if not,
> >> why? Would it be a good idea to add mb_str_icontains instead?
> >>
> >> If you're going to propose an RFC for this, it would be a good idea to
> >> explain what the real life use case for it is. While str_contains is
> >> very useful for checking the existence of a byte-string within another
> >> byte-string, a case-insensitive check doesn't seem to have much use.
> >>
> >> [1]: https://stackoverflow.com/a/63121809/1839439
> >> [2]: https://wiki.php.net/rfc/str_contains#case-insensitivity_and_multibyte_strings
> >>
> >>
> >> To make it a bit more explicit: The proposed str_icontains function does not support
> >> UTF-8. It would only be case-insensitive on ASCII characters. Do we really want to add new
> >> functions that do not properly handle UTF-8?
> >>
> >> I think that thanks to https://wiki.php.net/rfc/strtolower-ascii
> >> (which removed C locale support from this family of functions), there actually is a pretty viable
> >> way forward to make the non-mbstring case-insensitive string functions useful again: Make them work
> >> on UTF-8. (In the sense of using Unicode case folding and case mapping on UTF-8, while still
> >> returning code unit offsets. This would make them superior to both the current stri* functions, and
> >> the mb_stri* functions.)
> >>
> >
> >I agree that it's important to think about it in UTF-8.
> >
> >I have been thinking about a UTF-8-aware case-folding function for the
> >past few days. Maybe something like the following?
> >
> >```
> >grapheme_setlocale($locale);
> >grapheme_icontains($haystack, $needle);
> >```
> >
> >First, the grapheme_* functions would support a locale.
> >Second, add a grapheme_icontains function as a case-insensitive
> >version of str_contains.
>
> I don't think it's a good idea to rely on a global state containing the locale.
>
> cheers
> Derick

Hi, Derick (and Internals)

Thank you for your feedback.
In that case, I can see two ways forward.

First, add a $locale parameter to the grapheme_* functions. For example,
for grapheme_strpos:

```
grapheme_strpos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false
```

Second, hold the locale in an object instance (though I have not found a
suitable existing class for this).

By the way, intl already has a Locale class:
https://www.php.net/manual/en/class.locale.php
However, it does not seem to be used anymore.
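
To make the second option a little more concrete, here is a rough userland
sketch. It is only my assumption of how such an object could look: the
class name LocaleMatcher and its icontains() method are hypothetical and
are not part of intl. It leans on the existing Collator class (which
already takes a locale in its constructor) and on grapheme_strlen() and
grapheme_substr(), so the locale lives in the instance and not in global
state.

```php
<?php
// Hypothetical helper for illustration only; requires the intl extension.
final class LocaleMatcher
{
    private Collator $collator;

    public function __construct(string $locale)
    {
        // The locale is bound to this instance, not to global state.
        $this->collator = new Collator($locale);
        // SECONDARY strength ignores case differences but keeps accents distinct.
        $this->collator->setStrength(Collator::SECONDARY);
    }

    // Case-insensitive "contains": slides a window of grapheme clusters
    // over the haystack and compares each window with the collator.
    public function icontains(string $haystack, string $needle): bool
    {
        $needleLen = grapheme_strlen($needle);
        $haystackLen = grapheme_strlen($haystack);

        if ($needleLen === 0) {
            return true; // same convention as str_contains() for an empty needle
        }
        if (!is_int($needleLen) || !is_int($haystackLen)) {
            return false; // grapheme_strlen() failed, e.g. on invalid UTF-8
        }

        for ($i = 0; $i + $needleLen <= $haystackLen; $i++) {
            $window = grapheme_substr($haystack, $i, $needleLen);
            if ($window !== false && $this->collator->compare($window, $needle) === 0) {
                return true;
            }
        }
        return false;
    }
}

$matcher = new LocaleMatcher('en_US');
var_dump($matcher->icontains('Hello World', 'WORLD')); // bool(true)
```

This is slow (it rescans the haystack for every window) and it misses
matches where case folding changes the number of grapheme clusters, so it
is meant only to show the shape of the API, not as a proposed
implementation.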

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------

