On Monday, 8 July 2024 at 04:04, Juliette Reinders Folmer <[email protected]>
wrote:
> On 2-7-2024 20:05, Gina P. Banyard wrote:
>> On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer
>> <[email protected]> wrote:
>>
>>> * While a number of proposals include an impact analysis (thank you!), a significant
>>> number of the proposals don't.
>>> It would be appreciated if for those proposals which aren't removing
>>> unused/unusable functionality, some sort of impact analysis was added.
>>
>> You will need to clarify which ones you are talking about.
>> These "bulk removal" RFCs are written by various authors over the course of a
>> year, and might not have been looked at for 9+ months.
>
> I'd suggest for an impact analysis/expected impact statement to be added to the following
> deprecation proposals:
I am going to start this reply with the following:
An impact analysis showing a large impact to userland, is not in itself, an argument against a
deprecation.
What an impact analysis helps to determine is the length of the deprecation and the timeline for
removal.
It is getting exhausting to need to provide this, when what it is, is me asking Damien to check
usage on the corpus of over 3100 projects, some open source (such as Wordpress, Drupal, OSS
accounting software, etc.), top 1000 composer packages, and the private codebases he has access via
his company, using his Exakat static analysis tool. [1]
The corpus is 160 MLOC (1.2 Billion tokens), 1.4 M files and as already mentioned over 3100 distinct
projets.
But his tool will sometimes report duplicates, and has outdated versions which might not be affected
by the issues anymore.
One reason is that some projects inline composer dependencies, and unless I do a painstaking manual
review I cannot narrow this down.
Especially as it takes time to run the analysis on the corpus, and if I don't ask the precise
question I don't get all the relevant stats.
So every stat is a conservative approximation.
We don't decide to deprecate and remove things for the fun of it.
But if something is misleading, badly designed, dangerous, has a security risk, or causes issues it
should be deprecated.
It is my belief that it does not matter if this affects 10, 10 000, or 10 000 000 codebases.
However, how and when we remove this, yes this is affected by the usage.
> * session.sid_length and session.sid_bits_per_character
Auditing INI setting usages is effectively impossible with Exakat.
Misusing these settings can lead to security issues,
and the new values will match the existing defaults.
I would guess that the majority of users don't even know about this setting and thus are not
affected.
Similarly, it seems likely that application developers are also not aware of it,
causing applications to break if a hosting provider would adjust these settings.
For example: if the application expects it to be a specific format, which is defined in the database
schema.
Considering the above, these INI settings should be removed and deprecated, regardless of impact.
> * xml_set_object() and xml_set_*_handler() with string method names
This behaviour is unintuitive and breaks all usual language semantics.
This should be deprecated and removed regardless of impact.
But when I was working on this I had asked Damien to run some analysis with Exakat and found 66
projects.
To which I have sent PRs to some to remove the usage, which is extemely simple to do.
To clarify the rationale, the following code is ambiguous:
xml_set_element_handler($p, 'strrev')
It either calls the \strrev() string function, or a method called strrev
on the object
provided by a call to xml_set_object()
.
This is going to be the logic as of PHP 8.4 after some refactoring I did last October.
In the current released versions it is even more ambiguous, as the object provided by
xml_set_object() could be passed *after* setting the string callable.
This behaviour is totally unintuitive, so regardless of the impact it should be removed.
> * Deprecate proprietary CSV escaping mechanism
This is a follow-up on an RFC whose first step was implemented in PHP 7.4.
(https://wiki.php.net/rfc/kill-csv-escaping)
The first step was implemented (https://github.com/php/php-src/pull/3515) without a vote being held
following the discusion on internals: https://externals.io/message/103268
This routinely bites people, and we still get issues about people being confused about this
parameter.
We really should address this, and not wait yet another 5 years for complaints to once again be
raised before we take any action.
> * Deprecate strtok() function
Symfony agrees with the rationale provided by the RFC and has banned the function from their
project: https://github.com/symfony/symfony/issues/57542
This seems to indicate that the rationale around it is sound.
But just for the sake of it, I asked Damien, and I don't have the total number of usages, just
that at least one call to strtok is made in 274 projects.
I have no idea which projects, and whether the majority of them are in a bunch of libraries or not.
> * Deprecate returning non-strings values from a user output handler
> * Deprecate producing output in a user output handler
This is hard to analyse as it depends on runtime execution.
However, the current behaviour when doing one of these things is questionable and/or broken.
And I firmly believe this should be deprecated/changed regardless of impact.
> * Deprecate mysqli_refresh()
> * Deprecate mysqli_kill()
These are following upstream deprecations from MySQL.
> * Deprecate lcg_value()
This function is effectively broken.
Thus, I do not see what benefit we get from an impact analysis.
> * Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual analysis, not just a
> statement as this is a high impact proposal)
To circle back to the beginning, what does a detailed analysis brings us here?
Tim is aware this has potential to impact a lot of code, which is why it is *explicitly* not being
slated for removal in 9.0, and would require a follow-up RFC to remove it.
Moreover, this is in the same vein as when we deprecated utf8_decode() and utf8_encode() in PHP 8.2:
https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode
Tim slightly adjusted the wording of the RFC to make it clearer that the suggested replacements are
only intended for users that are locked into the algorithm choice.
I struggle to see a good reason to use MD5 in 2024, and I would hope that no-one uses MD5 to hash
passwords in 2024, but somehow I doubt that.
I'm also trusting Tim to implement the deprecation message, and the changelog entry on the
manual, in a way that prompts users to re-evaluate their choice of algorithm rather then blindly
using the hash extension with MD5/SHA1.
But I did ask Damien, and he has told me that for each function there are that many projects that
use the function at least once.
I don't have any idea if it stems from a library, if they only used it once or 10 000 times in
the project, nor for what purpose.
md5: 862
sha1: 495
sha1_file : 85
md5_file : 245
> * Deprecate passing E_USER_ERROR to trigger_error()
This is to limit usage and access to the bailout mechanism, and better alternatives exist and should
be used.
This is prime example of deprecations being the correct tool.
> * Deprecate SOAP_FUNCTIONS_ALL constant and passing it to SoapServer::addFunction()
To me, this is a security issue first and foremost, and therefore we should discourage its use and
remove it.
However, once again, I've asked Damien to run a quick analysis and 182 projects use it, mainly
Symfony and Drupal.
> And to a lesser degree for:
> * Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties
We are following the DOM Spec here.
Thus I don't see how an impact analysis is useful.
> * Deprecate SplFixedArray::__wakeup()
This never worked properly.
Thus I don't see how an impact analysis is useful.
> * Deprecate passing null and false to dba_key_split()
This also never worked properly and is a bug.
Thus I don't see how an impact analysis is useful.
> * Deprecate passing incorrect data types for options to ext/hash functions
This is potential security issue, and only possible to know at runtime, so regardless of impact this
should be removed.
However, Niels did add such an analysis to the RFC.
> * Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP
This is a follow-up on a deprecation enacted in PHP 8.1, and arguably should have been done at the
same time,
cf. https://wiki.php.net/rfc/deprecations_php_8_1#date_sunrise_and_date_sunset
> * Remove E_STRICT error level and deprecate E_STRICT constant
This error level had only 2 strange uses in PHP 7, and has been completely removed in PHP 8.
I don't see what benefit an impact analysis would bring here, we are just deprecating/removing
cruft at this point.
> * mysqli_ping() and mysqli::ping()
This is broken as of PHP 8.2.
Thus I don't see how an impact analysis is useful.
> P.S.: typo in "xml_set_object() and xml_set_*_handler() with string method names":
> "witch" => "which"
Fixed
>>> Other than that, I join the previously voiced objections to the deprecation of
>>> uniqid()
, md5()
, sha1()
, md5_file()
,
>>> sha1_file()
.
>>> While I acknowledge that these functions _can_ be used inappropriately for
>>> security-sensitive code, which should use alternative methods, these functions have perfectly valid
>>> use-cases for non-security-sensitive code and the impact of the BC-break of deprecating and
>>> eventually removing these methods can, IMO, not be justified.
>>> Keep in mind that while "we" know and understand that deprecations are not
>>> errors, end-users often don't and particularly for open source projects, this means that in
>>> practice these deprecations will need to be addressed anyway to reduce the noise of users opening
>>> issues about them, which without a clear path to removal of the functions, will, in a lot of cases,
>>> mean adding the @
operator to all uses.
>>
>> If I may be a bit cheeky, if we consider that userland does not understand that
>> deprecations are not errors, how can we trust them to use the 5 aforementioned functions correctly?
>> Especially as there are more appropriate replacements available.
>
> There is a difference between "userland" (dev-users) and end-users. I was talking
> about end-users, while based on your remark, you are talking about dev-users.
I am unsure what you mean by "end-users" here, I am going to assume you mean PHP
developers that write PHP code using PHP libraries and/or frameworks.
Because if you refer to "end-users" as people that install WordPress (or whatever PHP
application) via something like CPanel, this is a totally different conversation.
I sincerely appreciate that you are very much a tooling and library ecosystem developer, but from a
core developer PoV that someone is an "end-user" or a "dev-user" is not a
practical distinction.
We cannot make a function available only to "dev-users" that know what they do, we need to
consider the whole userbase, and arguably end-users are the largest proportion of this.
Thus if end-users do not understand that deprecations are not errors, how should I expect end-users
to read the documentation?
A deprecation is loud and clear, the documentation can be easily ignored, and there is no way to
verify if the aforementioned functions are used correctly.
I will add, I very much dislike the argument "but it is clearly explained on the documentation,
so it is not a problem".
I do not know about most people, but frankly, I do not look up the documentation everytime I want to
use a function, similarly to me not having consulted a dictionary to verify the meaning of the words
I'm using before typing them.
In the same way that human languages will "deprecate" words by discouraging their usage,
we ought to do the same for the language we write our code in.
If the issue is that you, as a maintainer, don't have a page on php.net to point to users where
we clearly say "Promoting deprecations to Exceptions is wrong and bad" then we can make
such a page.
Tim made a PR, which is now live [2], to the ErrorException page changing the example to a correct
error handler promoting warnings and notices to exceptions but not deprecations.
However, if even creating such a page is not enough, then maybe we need to do some engine level
changes where we properly split out deprecations from the other diagnostics being emitted,
so that it becomes *impossible* for people to promote deprecations to exceptions (which somehow
I'm thinking that people like Marco and Nicolas would appreciate).
These are all conversations that we can have.
However, "stop deprecating things" is not a "solution" to people not
understanding what deprecations are.
This has come up again and again, and the answer has been constant, it is unacceptable to tell the
project to not deprecate things.
It is one of our limited tools to actually make changes to the language, removing this is not an
option.
Because if we do remove this option, I can definitely see people starting to create their own
flavours of PHP to fix stuff that apparently must be set in stone in the official language.
Which I don't think many people want to see this.
And, considering that most of the deprecations are in *extensions* this is, yet again, reinforcing
my opinion that we should unbundle all extensions so that they can move at their own pace and that
users can install whatever version of said extension they want.
Moreover, if you permit me to do an aside from another industry.
The construction industry in Europe is going through a massive overhaul via the second generation of
Eurocodes. [3]
All of the final drafts were submitted prior to October of 2023, and will be finalized, translated
into German/French, and voted on prior to 30 March 2026.
These new standards will then be implemented at a national level by all 34 members of CEN [4] via
their national standard body (e.g. the British Standard Institute for the UK) by 30 September 2027.
Finally, the previous versions of the standard *must* be withdrawn by 30 March 2028.
Therefore, if the final version of a standard is published at the latest possible time, there is at
most a two year transition period for a large part of an *industry* and, at minimum, 34 national
standardization bodies to adopt the new standard, deprecate and withdraw the old one.
It is possible to use the new Eurocodes prior to them becoming mandated at the national level (which
the member countries of CEN are obligated to do), but once that step is taken using the previous
Eurocodes will require a legal exemption.
Meanwhile, PHP, a programming language that represents a fraction of the software industry, and does
not need to deal with any legally binding system whatsoever, provides longer time frames, and yet
this is not enough?
There is no legal requirement for any project to need to use the latest version of PHP, and if you
fancy it, you could create a new project written in PHP 5.2 today if you wanted.
Maybe PHP and its users aren't able to cause the same level of damage as a bridge collapsing by
letting potential security problems go unresolved, but that doesn't mean we can't learn a
lesson from "real engineering" that benefits the project.
> I also don't agree that there are "more appropriate replacements available".
> The suggested hash()
replacements for the md5/sha1* functions have the exact same
> functionality, which the RFC considers "incorrect use", so what are we actually solving by
> this deprecation ? Devs not having enough to do already ?
> The problem (for open source) with "force-replacing" the uses of
> md5/sha1*
functions with the hash
function calls, is that the hash
> extension was not part of PHP core until PHP 7.4, which means that for a significant number of open
> source projects, the replacement is not a one-on-one function call replacement, but needs guard code
> for PHP < 7.4 in case the hash extension is not available.
Reiterating what I said previously, replacing it with the one-to-one equivalent should only be done
if you truly need those specific algorithms.
Otherwise its usage should be reconsidered depending on the requirements and switched to something
"safer".
Hopefully this is clearer now that Tim amended the RFC.
I can understand that userland projects and end-users work on a broad range of versions, but it is
unreasonable to expect the PHP project to not do something because the situation used to be
different over 5 years ago.
Moreover, according to Tim, who used to work on a PHP application that people would install on
shared-hosting, most hosting companies actually have optional extensions enabled such as ext/hash,
ext/intl, ext/mbstring, or even ext/gmp.
So with all due respect, I do not think that ext/hash not being mandatory prior to PHP 7.4 is a good
counter argument.
Especially, as according to Packagist statistics, 93% of users use a version of PHP that is PHP 7.4
or above, and this percentage is only going to increase. [4]
> Also, having read through the RFC a second time, I find the voting choices inconsistent - in
> particular the first deprecation vote, which makes the others ambiguous.
> Could each voting choice please be explicitly one of the below to prevent any confusion ?
> * Remove in PHP 8.4
> * Deprecate in PHP 8.4 and remove in PHP 9
> * Deprecate in PHP 8.4 and remove at a later date after a separate vote
Unless specified otherwise, it is deprecate in 8.4 and remove in PHP 9, the other ones which specify
it is for process efficiency.
Best regards,
Gina P. Banyard
[1] https://www.exakat.io/en/
[2] https://www.php.net/manual/en/class.errorexception.php
[3] https://eurocodes.jrc.ec.europa.eu/second-generation-eurocodes
[4] https://standards.cencenelec.eu/dyn/www/f?p=CEN:5
[5] https://stitcher.io/blog/php-version-stats-july-2024