Skip to content

Add Uri\WhatWg classes to ext/uri #18672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 10, 2025
Merged

Add Uri\WhatWg classes to ext/uri #18672

merged 15 commits into from
Jun 10, 2025

Conversation

kocsismate
Copy link
Member

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments regarding the tests.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some first comments before ending my day. I'm having some troubles with the amount of “indirection” / “function pointers” handling things in a very abstract fashion that I'm not sure is always necessary.

Comment on lines 140 to 146
if (property_handler->write_func(new_internal_uri, property_zv, &errors) == FAILURE) { \
throw_invalid_uri_exception(new_internal_uri->handler, &errors); \
zval_ptr_dtor(&errors); \
zend_object_release(new_object); \
zval_ptr_dtor(property_zv); \
RETURN_THROWS(); \
} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner if the write_func() would throw the exception by itself. This gives it best control over the exception data.

If necessary a silent parameter could be added, but extensions are already able to suppress any exceptions if they are undesired, so this is probably not necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to throw in the handlers themselves because of the internal API: different kinds of exceptions/errors may be used in different contexts, especially when code needs to be backward compatible. For example SOAP uses a SoapFault with the "Unable to parse URL" message (https://github.com/php/php-src/blob/cf0c982b0ba937f4b94923c8fe4d541f0181a283/ext/soap/php_http.c#L467) to indicate an error, while the FILTER_VALIDATE_URL filter uses the false return value etc. Another example from the code of the RFC itself is the parse_uri handler that you spotted below: it should generally throw an InvalidUriException, except when it's used during unserialization.

That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.

When internal code throws an exception while an Exception is already active, it will automatically set the $previous value to the active exception. This matches the:

It MUST set the $previous property to the original exception when doing so.

rule from the Throwable policy RFC (https://externals.io/message/127340) that is very likely to pass. Thus it is super easy to wrap the exception during unserialization (and probably also in Soap). You can see this pattern in action at:

if (FAILURE == php_random_bytes_throw(ZSTR_VAL(buffer), ZSTR_LEN(buffer))) {
zend_value_error("Unable to generate salt");
zend_string_release_ex(buffer, 0);
return NULL;
}

Where the Random\RandomException will be wrapped into the ValueError.


Generally speaking, APIs that do not throw an Exception on error should become more rare going forward, thus throwing an exception is the "standard case". In case an exception is absolutely undesired either a silent parameter would work or alternatively the exception can be suppressed with zend_clear_exception() after extracting the message.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round, didn't check the tests yet

@kocsismate
Copy link
Member Author

I addressed most review comments (that I marked with 👀 ). I'll continue tomorrow with the rest of the comments.

Niels, please let me know if you'd prefer to resolve your comments yourself or I should do it. I'll resolve the obvious ones for Tim tomorrow.

@nielsdos
Copy link
Member

Niels, please let me know if you'd prefer to resolve your comments yourself or I should do it. I'll resolve the obvious ones for Tim tomorrow.

It's fine by me if you resolve them. If you'd like some extra hands helping with the coding just hmu.

@kocsismate
Copy link
Member Author

I'll push the fix for the test failures later today

@TimWolla TimWolla self-requested a review June 2, 2025 18:49
@nielsdos
Copy link
Member

nielsdos commented Jun 5, 2025

I don't immediately spot something that still needs changing.
I just wonder whether it makes sense to have the lexbor parser structure as a normal variable instead of as a pointer: https://gist.github.com/nielsdos/2e9e46ca83e3c16f6a014598c84f21f2 . That makes more sense to me as it doesn't need to be on the heap per se.

@kocsismate
Copy link
Member Author

https://gist.github.com/nielsdos/2e9e46ca83e3c16f6a014598c84f21f2 . That makes more sense to me as it doesn't need to be on the heap per se.

This makes sense to me :)

I don't immediately spot something that still needs changing.

Yes, I would also prefer to merge it soon, so that the rest of the changes can also be continued (and of course this code can still be fine tuned later). First, I'd probably create a PR for the uriparser stuffs without the write handlers, than probably the internal API related code.

I'm still thinking about Tim's suggestion above (#18672 (comment)). I'm currently playing with the idea. I like the fact that the exception handler wouldn't be needed anymore, but in the same time, all the write handlers should also take care of throwing the exception themselves.

@nielsdos
Copy link
Member

nielsdos commented Jun 5, 2025

I'm still thinking about Tim's suggestion above (#18672 (comment)). I'm currently playing with the idea. I like the fact that the exception handler wouldn't be needed anymore, but in the same time, all the write handlers should also take care of throwing the exception themselves.

I don't have strong preference for either design decision.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay for me.
Please add an entry to UPGRADING.INTERNALS for the zend_exceptions.c zend_update_exception_properties API addition

@kocsismate kocsismate force-pushed the ext-url5 branch 2 times, most recently from 22fef8b to 42e60da Compare June 7, 2025 21:25
@TimWolla TimWolla self-requested a review June 7, 2025 21:36
@TimWolla

This comment was marked as outdated.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still some bits that look like unnecessary indirection / premature abstraction to me, but I'm good with merging this and then following up with clean-ups once the RFC 3986 changes are also included (perhaps those prove me wrong). Perhaps it would make sense to document the internal API as “not yet stable” for the PHP 8.5 cycle so that external URI handler implementations are aware that further clean-up might happen for 8.6?

I did not look into the tests, I trust that they are reasonable. I'd like to see the suggested cleanup after the RFC 3986 changes are included as well, though, since that will simplify future maintenance for bug-fixes: #18672 (review)

@kocsismate kocsismate merged commit 3399235 into php:master Jun 10, 2025
1 check passed
@kocsismate kocsismate deleted the ext-url5 branch June 10, 2025 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants