Re: [RFC brainstorm] Approximately equals operator

From: Date: Mon, 31 Mar 2025 22:14:48 +0000
Subject: Re: [RFC brainstorm] Approximately equals operator
References: 1  Groups: php.internals 
On 31/03/2025 23:03, Niels Dossche wrote:
Hi internals!

I'm excited to share what I've been working on! I had an epiphany. I realized what we truly need to revolutionize PHP: a new operator. Hear me out.

We live in an imperfect world, and we often approximate data, but neither == nor === is an ideal comparison operator for this kind of data. Introducing: the "approximately equal" (or "approx-equal") operator ~= (to imitate the maths symbol ≃). This combines the power of type coercion with approximating equality. Who cares if things are actually equal, close enough amirite?

First of all, if $a == $b holds, then $a ~= $b obviously. The true power lies where the data is not exactly the same, but "close enough"! Here are some examples:

We have all had situations where we wanted to compare two floating point numbers and it turned out that, due to the non-exact representation, seemingly-equal numbers didn't match! Gone are those days, because the ~= operator nicely rounds the numbers for you before comparing them. This also means that the "Fundamental Theorem of Engineering" now holds! i.e. 2.7 ~= 3 and 3.14 ~= 3. Of course also 2.7 ~= 3.14. But this is false obviously: 2 ~= 1.

Ever had trouble with users mistyping something? Say no more! "This is a tpyo" ~= "This is a typo". It's typo-resistant! However, if the strings are too different, then they're not approx-equal. For example: "vanilla" ~= "strawberry" gives false. How does this work?

* The strings are equal if their levenshtein ratio is <= 50%, so it's adaptive to the length.
* If the ratio is > 50%, then the shortest string comes first in the comparison, such that if we ever get a ~< operator, then "vanilla" ~< "strawberry".

There is of course a PoC implementation available at: https://github.com/php/php-src/pull/18214

You can see more examples on GitHub in the tests; here is a copy:

// Number compares
var_dump(2 ~= 1); // false
var_dump(1.4 ~= 1); // true
var_dump(-1.4 ~= -1); // true
var_dump(-1.5 ~= -1.8); // true
var_dump(random_int(1, 1) ~= 1.1); // true

// Array compares (just compares the lengths)
var_dump([1, 2, 3] ~= [2, 3, 4]); // true
var_dump([1, 2, 3] ~= [2, 3, 4, 5]); // false

// String / string compares
var_dump("This is a tpyo" ~= "This is a typo"); // true
var_dump("something" ~= "different"); // false
var_dump("Wtf bro" ~= "Wtf sis"); // true

// String / different type compares
var_dump(-1.5 ~= "-1.a"); // true
var_dump(-1.5 ~= "-1.aaaaaaa"); // false
var_dump(NULL ~= "blablabla"); // false

Note that this does not support all possible Opcache optimizations _yet_, nor does it support the JIT yet. However, there are no real blockers to adding support for that.

I look forward to hearing from you! Have a nice first day of the month ;)

Kind regards
Niels

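For anyone who wants to play with the rules described above without building the PoC branch, they can be roughly approximated in userland. This is only a sketch: the function name approx_equal() is made up, the "levenshtein ratio" is assumed to mean the edit distance divided by the length of the longer string (the message does not define it), and mixed-type comparisons are left out because their exact rules are not spelled out here. It is not the PoC's actual implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland approximation of the proposed ~= operator.
 * Assumption: "levenshtein ratio" = edit distance / length of the longer string.
 */
function approx_equal(mixed $a, mixed $b): bool
{
    // Anything that is loosely equal is also approximately equal.
    if ($a == $b) {
        return true;
    }

    // Numbers: round both sides to the nearest integer, then compare.
    if ((is_int($a) || is_float($a)) && (is_int($b) || is_float($b))) {
        return round((float) $a) === round((float) $b);
    }

    // Arrays: only the lengths are compared.
    if (is_array($a) && is_array($b)) {
        return count($a) === count($b);
    }

    // Strings: approx-equal when the ratio is at most 50%.
    // Both strings cannot be empty here, since that case is caught by == above.
    if (is_string($a) && is_string($b)) {
        $longest = max(strlen($a), strlen($b));
        return levenshtein($a, $b) / $longest <= 0.5;
    }

    // Mixed-type comparisons are not covered by this sketch.
    return false;
}

var_dump(approx_equal(2.7, 3.14));                          // bool(true)
var_dump(approx_equal(2, 1));                               // bool(false)
var_dump(approx_equal("This is a tpyo", "This is a typo")); // bool(true)
var_dump(approx_equal("vanilla", "strawberry"));            // bool(false)
```

Note that levenshtein() works on bytes, so multibyte strings would need different handling.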
For the float case it's fine (because epsilon is well defined), but I don't think overloading for the string case is. The hard-coded 50% distance is subjective and users may well want to configure it, so an operator is not suitable, notwithstanding that Levenshtein has very limited application. If there is any sense in doing string comparisons with this operator, I think the proposed case is not it.

The array case is also not good in my view, where you're just comparing length; I see no use for that whatsoever. What it _should_ do instead is compare irrespective of order, i.e. [1, 2, 3] ~= [3, 2, 1], similar to PHPUnit's assertEqualsCanonicalizing [1].

Cheers,
Bilge

[1]: https://github.com/sebastianbergmann/comparator/blob/d67eceae47e3956aa28ab0c6e43e5a6765f45779/src/ArrayComparator.php#L43-L46
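
To make the two alternatives in this reply concrete, here is a minimal userland sketch. Both function names are hypothetical, the default tolerance of PHP_FLOAT_EPSILON is an assumption (the reply only says epsilon is well defined, not which value to use), and the array helper merely follows the sort-then-compare idea behind canonicalizing comparison; it is not a copy of PHPUnit's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: floats are "equal" when they differ by at most $epsilon.
 * Assumption: an absolute tolerance defaulting to PHP_FLOAT_EPSILON.
 */
function floats_nearly_equal(float $a, float $b, float $epsilon = PHP_FLOAT_EPSILON): bool
{
    return abs($a - $b) <= $epsilon;
}

/**
 * Hypothetical helper: compare two lists while ignoring element order.
 * Both arrays are passed by value, so sorting them does not affect the caller.
 */
function arrays_equal_canonicalized(array $a, array $b): bool
{
    // sort() reorders the values and re-indexes the keys from 0.
    sort($a);
    sort($b);

    // Loose comparison, in keeping with the type coercion of ==.
    return $a == $b;
}

var_dump(floats_nearly_equal(0.1 + 0.2, 0.3));               // bool(true)
var_dump(arrays_equal_canonicalized([1, 2, 3], [3, 2, 1]));  // bool(true)
var_dump(arrays_equal_canonicalized([1, 2, 3], [1, 2, 4]));  // bool(false)
```

An absolute tolerance this small is only meaningful for values near 1; for larger magnitudes a tolerance scaled to the operands would be the more usual choice. That the tolerance has to be chosen at all is the same configurability concern raised above for strings.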
