Hi Hammed, thank you for taking the time to read through this and share your thoughts.
> On Sep 19, 2024, at 1:41 PM, Hammed Ajao <[email protected]> wrote:
>
> On Tue, Sep 17, 2024 at 8:30 PM Dennis Snell <[email protected]> wrote:
>
>>
>>> On Sep 17, 2024, at 2:03 PM, Rob Landers <[email protected]> wrote:
>>>
>>> On Tue, Sep 17, 2024, at 14:57, Adam Zielinski wrote:
>>>
>>>>> To summarize, I think PHP would benefit from:
>>>>>
>>>>> 1. Adding WASM for simple low-level extensibility that could run on
>>>>> shared hosts for things that are just not possible in PHP as described a
>>>>> few paragraphs prior, and where we could enhance functionality over time,
>>>>>
>>>>> 2. Constantly improving PHP the language, which is what you are solely
>>>>> advocating for over extensibility,
>>>>
>>>> Hi Mike,
>>>>
>>>> I’m Adam, I'm building WordPress Playground [1] – it's WordPress
>>>> running in the browser via a WebAssembly PHP build [2]. I'm excited to see this discussion and
>>>> wanted to offer my perspective.
>>>>
>>>> WebAssembly support in PHP core would be a huge security and productivity
>>>> improvement for the PHP and WordPress communities.
>>>>
>>>>> To summarize, I think PHP would benefit from:
>>>>>
>>>>> 1. Adding WASM for simple low-level extensibility that could run on
>>>>> shared hosts for things that are just not possible in PHP as described a
>>>>> few paragraphs prior, and where we could enhance functionality over time,
>>>>
>>>> Exactly this! With WASM, WordPress would get access to fast, safe, and
>>>> battle-tested libraries.
>>>>
>>>> Today, we're recreating a lot of existing libraries just to be able to use
>>>> them in PHP, e.g. parsers for HTML [3], XML [4], Zip [5], MySQL [6], or an HTTP client [7]. There
>>>> are just no viable alternatives. Viable, as in working on all webhosts, having stellar compliance
>>>> with each format's specification, supporting stream parsing, and having a low footprint. For
>>>> example, the curl PHP extension is brilliant, but it's unavailable on many webhosts.
>>>>
>>>> With WebAssembly support, we could stop rewriting and start leaning on the popular
>>>> C, Rust, etc. libraries instead. Who knows, maybe we could even polyfill the missing PHP extensions?
>>>>
>>>>> 2. Constantly improving PHP the language, which is what you are solely
>>>>> advocating for over extensibility,
>>>>
>>>> Just to add to that: I think WASM support is important for PHP to stay relevant.
>>>> There's an exponential advantage to building a library once and reusing it across language
>>>> boundaries. A lot of companies are invested in PHP and that won't change in a day. However,
>>>> without access to the WASM ecosystem, I can easily imagine the ecosystem slowly gravitating towards
>>>> JavaScript, Python, Go, Rust, and other WASM-enabled languages.
>>>>
>>>> Security-wise, WebAssembly is sandboxed and would enable safe processing of
>>>> untrusted files. Vulnerabilities like Zip slip [8] wouldn't affect a sandboxed filesystem.
>>>> Perhaps we could even create a secure enclave for running Composer packages and WordPress plugins
>>>> without having to fully trust them.
>>>>
>>>> Another use-case is code reuse between JavaScript and PHP. I'm sceptical this
>>>> could work with reasonable speed and resource consumption, but let's assume for a moment there
>>>> is an ultra-low-overhead JavaScript runtime in WebAssembly. WordPress could have a consistent
>>>> templating language. The PHP backend would render the website markup using the same templates and
>>>> libraries as the JavaScript frontend. Half the code would achieve the same task.
>>>>
>>>> Also, here are a few interesting "WASM in PHP" projects I found;
>>>> maybe they would be helpful:
>>>>
>>>> - WebAssembly runtime built in PHP (!): https://github.com/jasperweyne/unwasm
>>>> - WebAssembly runtime as a PHP language extension: https://github.com/veewee/ext-wasm
>>>> - WebAssembly runtime as a PHP language extension: https://github.com/extism/php-sdk
>>>>
>>>> [1] https://github.com/WordPress/wordpress-playground/
>>>> [2] https://github.com/WordPress/wordpress-playground/tree/trunk/packages/php-wasm/compile
>>>> [3] https://developer.wordpress.org/reference/classes/wp_html_processor/
>>>> [4] https://github.com/WordPress/wordpress-develop/pull/6713
>>>> [5] https://github.com/WordPress/blueprints-library/blob/87afea1f9a244062a14aeff3949aae054bf74b70/src/WordPress/Zip/ZipStreamReader.php
>>>> [6] https://github.com/WordPress/sqlite-database-integration/pull/157
>>>> [7] https://github.com/WordPress/blueprints-library/blob/trunk/src/WordPress/AsyncHttp/Client.php
>>>> [8] https://security.snyk.io/research/zip-slip-vulnerability
>>>>
>>>> -Adam
>>>
>>> Hey Adam,
>>>
>>> I actually went down something like this road for a bit when working at Automattic. My
>>> repo probably still exists in the graveyard repository… but over a couple of weeks I had plugins
>>> running in C# and Java. This was long before wasm was a thing, and while it was cool, nobody really
>>> could think of a use for it.
>>>
>>> It seems like you have a use for it, though, and I’m reasonably certain you could get
>>> it working over FFI in a few weeks; yet you mention hosts not even having the curl extension
>>> installed, so I doubt that even if wasm came to be, it would be available on those hosts.
>>>
>>
>> There are two major areas I have found that would benefit from having a WASM runtime in
>> PHP:
>>
>> Obviously, being able to run the same algorithms on the frontend and backend is a huge win
>> for consistency in applications.
>>
>
> I'm not convinced. That's what they said about Node.js (same algos and same language on
> FE and BE). Except it's not really that consistent, because there are several discrepancies
> between the browser and Node runtimes. I'll believe it when I see it.
>
There’s one note on this point worth calling out, though you probably already know it: JavaScript
runtimes provide a standard library, while a WASM runtime is mostly just a virtual machine. Nothing
in core WASM that I’m aware of offers filesystem access or network access, which are the major areas
where in-browser JavaScript and Node.js backends differ (because browser and server environments are
fundamentally shaped by different needs).
As things stand, projects are compiled into WebAssembly and literally run identically in the
different runtimes, because it’s the bytecode that’s specified, not specific functions or
libraries. With JavaScript we ship source code that interacts with very different systems; WASM
bundles are a few steps removed from that and have no DOM or system access to interact with.
I think it’s uncontroversial to say that folks routinely run identical algorithms across different
WASM runtimes in different environments. As you mentioned elsewhere, it’s very much akin to how
Java, Clojure, and Scala all run on the JVM just fine together despite being different languages,
except in this case the runtime is an isolated sandbox by default with no external system access.
WASM is a lovely little VM, successful in ways many before it haven’t been.
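A minimal sketch of what “no standard library, no system access” looks like in practice. The C function below is a hypothetical example, not code from any project in this thread: it reads only the bytes it is handed, calls no libc functions, and touches no globals. Compiled for a wasm32 target, it should need no imports at all, so its result depends only on its input in whichever runtime executes it. It assumes the input is already valid UTF-8 and does no validation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: count the code points in a UTF-8 byte string by
 * counting every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
 * Assumes valid UTF-8; invalid input is not detected.
 * Uses only freestanding headers, no I/O, and no global state.
 */
size_t utf8_count_code_points(const uint8_t *bytes, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes have their top two bits set to 10. */
        if ((bytes[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For valid UTF-8 this should agree with a code point count such as `mb_strlen($s, 'UTF-8')`; the point is that one WASM build of it gives that same answer in the browser and on the server.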
>
>> Particularly with text-related algorithms, it’s really easy for inconsistencies to develop
>> due to the features available in each language’s standard library, as well as due to differences in
>> how each language handles and processes strings.
>>
>
> I can see the appeal of that though.
>
>>
>> The other major area is similar, and we’ve seen this with the HTML and XML parsing work
>> recently undertaken in WordPress.
>>
>
> Yeah, you could talk about HTML parsing before 8.4, but with 8.4 we get Lexbor (thanks to Niels),
> and that's as good as it gets. PHP already has beautiful support for XML, though, so I'm not
> sure why you would implement a parser yourself.
>
It’s wonderful that PHP is getting a spec-compliant HTML DOM parser for the first time in its
history, but \Dom\HTMLDocument is not the right interface for every server need. It remains
ill-suited for the kind of work typical in a WordPress site, which needs to run on low memory
budgets, perform as fast as possible, and exceed the safety of what a generic DOM parser produces.
There are cases where \Dom\HTMLDocument will still introduce vulnerabilities into an HTML document,
because it can create DOM trees that cannot be represented by HTML upon serialization, and because
it implements the HTML spec, it cannot prevent creating those trees. There are also a number of
steps every developer needs to take to properly set up the parser and get the right results back,
and the parser has to load the entire DOM tree into memory before any reads or manipulations can be
performed on it.
WordPress’ HTML API is a streaming HTML parser with near-zero memory overhead, designed around
safe-by-default reading and writing of HTML, and it requires no configuration or manual steps to get
“the right thing.” It’s also significantly slower in user-space PHP than it needs to be. I hope that
one day PHP has its own performant copy of this streaming parser design, available in every copy of
PHP (availability being another issue with code that lives only in extensions). Even if that never
happens, running C or Rust code compiled to WebAssembly would provide almost the same value as having
that design implemented in the language.
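To illustrate the difference between building a tree and streaming, here is a deliberately tiny sketch. It is not WordPress’ HTML API and is nowhere near spec-compliant: it ignores comments, `<script>` and other raw-text elements, a `>` inside a quoted attribute value, and every other tokenizer state the HTML spec defines. It only shows the shape of the design: one forward pass over the input, each tag reported in place as a slice of the original string, and no allocation, so memory use does not grow with document size. The `tag_cursor` struct and the `next_tag` and `tag_is` functions are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cursor over an HTML string; it never copies the input. */
typedef struct {
    const char *html;
    size_t len;
    size_t pos;
    const char *tag_name; /* slice of the input, not NUL-terminated */
    size_t tag_len;
    bool is_closer;       /* true for </tag> */
} tag_cursor;

static bool is_ascii_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_name_char(char c) {
    return is_ascii_alpha(c) || (c >= '0' && c <= '9') || c == '-';
}

/* Advance to the next start or end tag. Returns false when none remain. */
bool next_tag(tag_cursor *c) {
    while (c->pos < c->len) {
        if (c->html[c->pos] != '<') { c->pos++; continue; }
        size_t p = c->pos + 1;
        bool closer = false;
        if (p < c->len && c->html[p] == '/') { closer = true; p++; }
        /* A tag name must start with a letter; otherwise this '<' is just text. */
        if (p >= c->len || !is_ascii_alpha(c->html[p])) { c->pos++; continue; }
        size_t start = p;
        while (p < c->len && is_name_char(c->html[p])) p++;
        c->tag_name = c->html + start;
        c->tag_len = p - start;
        c->is_closer = closer;
        /* Naively skip attributes up to the next '>' (a quoted '>' would end the tag early). */
        while (p < c->len && c->html[p] != '>') p++;
        c->pos = (p < c->len) ? p + 1 : p;
        return true;
    }
    return false;
}

/* ASCII case-insensitive comparison of the current tag name with a lowercase name. */
bool tag_is(const tag_cursor *c, const char *name) {
    size_t i = 0;
    for (; i < c->tag_len; i++) {
        char ch = c->tag_name[i];
        if (ch >= 'A' && ch <= 'Z') ch = (char)(ch - 'A' + 'a');
        if (name[i] == '\0' || name[i] != ch) return false;
    }
    return name[i] == '\0';
}
```

A caller loops with `while (next_tag(&c))` and inspects or rewrites each tag where it sits. Only the cursor struct lives in memory, which is the property that matters on low-memory hosts.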
>
>> There are plenty of cases where efficient and spec-compliant operations are valuable, but
>> these kinds of things tend to cost significantly more in user-space PHP.
>>
>
>> Being able to jump into WASM, even with the overhead of exchanging that data and running
>> another VM, would very likely result in a noticeable net improvement in runtime performance.
>>
>
> What exactly do you mean by jump into wasm? Like, hand-write it? Or do you mean jump into a
> language that can be compiled to wasm? How about debugging at runtime? And if you mean better
> performance than PHP, while that is likely, it isn't guaranteed. PHP is pretty fast and will be
> faster for some routines that are optimized by the engine. Wasm will never be as fast as extensions,
> though, because with extensions all you're doing is extending the engine, same as any internal
> extension. With wasm you're interoperating with an entirely separate VM.
>
By jumping into WASM I’m talking about the second thing you mention: calling functions written in
languages compiled to WebAssembly. Even after the overhead of marshaling data, this should come out
ahead, because the things WebAssembly is good at are exactly the things PHP is slow at: raw numeric
computation, string manipulation, and parsing. I write a lot of parsing code and am frequently
surprised at the overhead cost of string processing and array operations in PHP. There are a number
of straightforward operations available in C that just can’t be done in PHP. I don’t see this as
a failing of PHP, just an aspect of how it is.
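As a rough picture of what “marshaling” involves: a WASM module can only address its own linear memory, so the host copies a string’s bytes in, calls an exported function with an offset and a length, and reads the result back. The sketch below simulates that round trip in plain C, with a fixed byte array standing in for linear memory and a bump allocator standing in for the module’s allocator. Every name here is hypothetical and none comes from a real runtime’s embedding API; real runtimes differ in the details.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a module's linear memory: the "guest" can only address this array. */
enum { MEMORY_SIZE = 64 * 1024 };
static uint8_t linear_memory[MEMORY_SIZE];
static uint32_t heap_top = 0;

/* Hypothetical guest export: bump-allocate `size` bytes, returning an offset or UINT32_MAX. */
uint32_t guest_alloc(uint32_t size) {
    if (size > MEMORY_SIZE - heap_top) return UINT32_MAX;
    uint32_t offset = heap_top;
    heap_top += size;
    return offset;
}

/* Hypothetical guest export: count occurrences of `needle` in memory[ptr, ptr + len). */
uint32_t guest_count_byte(uint32_t ptr, uint32_t len, uint8_t needle) {
    uint32_t count = 0;
    for (uint32_t i = 0; i < len; i++) {
        if (linear_memory[ptr + i] == needle) count++;
    }
    return count;
}

/* Host side: copy the caller's bytes into guest memory, call the export, then reset the heap. */
long host_count_byte(const char *data, size_t len, char needle) {
    if (len > UINT32_MAX) return -1;
    uint32_t ptr = guest_alloc((uint32_t)len);
    if (ptr == UINT32_MAX) return -1;        /* input does not fit in guest memory */
    memcpy(linear_memory + ptr, data, len);  /* this copy is the marshaling cost */
    uint32_t result = guest_count_byte(ptr, (uint32_t)len, (uint8_t)needle);
    heap_top = 0;                            /* release everything for the next call */
    return (long)result;
}
```

In this sketch the copy is linear in the input size, the same order as the scan itself, which is why the overhead can be tolerable for string-heavy work.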
For runtime debugging I don’t have any particular thoughts. I’m not aware of anyone who has ever
tried to runtime-debug cURL calls or things like mb_convert_encoding(). Functions invoked in the
WASM runtime would more or less be library functions, like ffmpeg. Debugging would most likely happen
while developing the library, which would then be dropped into the PHP application with no
expectation of debugging it from there.
Effectively these are user-space PHP extensions, and are very convenient because they can be updated
without recompiling PHP or begging web hosts to update their PHP version, or to do that every other
Tuesday, or whenever another security exploit is fixed in some image processor. On that note, the
ability to sandbox image processing code (and any other user-provided content) is a huge perk. Many
of the exploits of past PHP extensions could be contained inside the VM, which has limited ability
to reach out into the system. Fixing vulnerabilities and bugs becomes something any auto-updater can
accomplish, requiring no effort or interaction on the part of the host.
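The containment argument rests on WASM’s memory model: every load and store a module performs is checked against the bounds of its own linear memory, and an out-of-bounds access traps instead of touching host memory. The sketch below models that check in C. The `sandbox` struct and its functions are hypothetical, and a real runtime may implement the check quite differently (for example with guard pages), but the guarantee it models is the same: a buggy or malicious image decoder can only read and write inside its own memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one module's linear memory. */
typedef struct {
    uint8_t *base;
    uint32_t size;  /* current memory size in bytes */
    bool trapped;   /* set when the guest attempted an out-of-bounds access */
} sandbox;

/* True when [addr, addr + n) lies entirely inside the sandbox's memory.
 * The sum is computed in 64 bits so a huge addr cannot wrap around to a small value. */
static bool in_bounds(const sandbox *sb, uint32_t addr, uint32_t n) {
    return (uint64_t)addr + n <= sb->size;
}

/* Copy n bytes from guest address addr into out, or trap on any out-of-bounds byte. */
bool sandbox_load(sandbox *sb, uint32_t addr, uint32_t n, uint8_t *out) {
    if (!in_bounds(sb, addr, n)) { sb->trapped = true; return false; }
    for (uint32_t i = 0; i < n; i++) out[i] = sb->base[addr + i];
    return true;
}

/* Stores are checked the same way, so the guest never writes outside base[0, size). */
bool sandbox_store(sandbox *sb, uint32_t addr, uint32_t n, const uint8_t *in) {
    if (!in_bounds(sb, addr, n)) { sb->trapped = true; return false; }
    for (uint32_t i = 0; i < n; i++) sb->base[addr + i] = in[i];
    return true;
}
```

In a real runtime a trap aborts the guest call and surfaces as an error to the host, which is what would turn many past extension exploits into failed function calls instead of memory corruption in the PHP process.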
>
>> Additionally, it’s a perk being able to write such algorithms in languages that aid that
>> development through more powerful type systems.
>>
>
> We can agree on that. But I use C++ for my extensions so there's also that.
>
>> There’s additional value in a number of other separate tasks. Converting images or
>> generating thumbnails is a good example where raw performance is less of a concern than being able
>> to ensure that the image library is available and not exposing the host system to risk.
>>
>
> IMO this is where FFI should shine, but I'll admit that the current implementation is
> lacking in both security and functionality.
>
>> I imagine plenty of “PHP lite-extensions” appearing in this space because it would give
>> people the opportunity to experiment with features that are impractical in user-space PHP before
>> fully committing the language itself to that interface or library. It would extend the reach of
>> PHP’s usability because it would make it possible for folks who happen to be running on cheap shared
>> hosts to run more complicated processing tasks than are practical today. While big software shops
>> and SaaS vendors can and do run their own custom PHP extensions, there’s no great way to share
>> those generally with people who lack the same full control over their stack.
>>
>
> Shared hosting for php gets you the worst possible version of php.
>
Couldn’t have said it better myself!
> Can't recompile to enable any bundled extension, can't install any new extensions, so
> how exactly would you approach this? Wasm bundled with the engine by default? Or some kind of opt-in
> mechanism that shared hosters won't even be able to use?
>
As with many of the things I’ve been writing on this list lately, to me an embedded WASM runtime
makes the most sense as a central language feature, available everywhere PHP is deployed. There are
a few core subsystems that are either foundational to the environment PHP operates in (for example,
web-related technologies like HTTP, HTML, and URLs) or bring so much value to the language that they
open up brand-new paradigms or potentially remove major maintenance burdens.
If we could ship ImageMagick as a WASM extension, there would be no need for the ImageMagick
extension. The security environment out of the box is so much better that it outweighs the lost
potential for performance that a native extension offers. Someone may not agree with this, and
that’s fine, because they can always install a native extension or use the FFI on infrastructure
they control.
I think at times WordPress sees a very different picture of the world than many great PHP projects
see. Our reality is that we’re writing code that runs on hardware we don’t control or even know
about. We cannot in any way install or force certain extensions to be present. The worst possible
version of PHP is literally the constraint at which we are allowed to code. Anything beyond that and
we can’t ship it because a large fraction of the internet will start crashing. It’s frustrating,
but also an honor to be able to ensure that people who can’t afford high end servers can still
build their own place on the world wide web.
Over the past several years, though, WordPress has also been a positive influence in persuading
hosts to update their PHP versions, because PHP has improved enough that the argument is easy:
upgrade to PHP 7 and your data center costs will drop X%. It’s not too hard to imagine winning
similarly on the security argument.
WASM code on memory-constrained, oversubscribed, CPU-poor hosts is still considerably better for
certain kinds of computation than user-space PHP code on memory-constrained, oversubscribed,
CPU-poor hosts.
>
>>
>>>
>>> However, plugins basically work via hooks/filters. So as long as you register the right
>>> listeners and handle serialization properly, you can simply run a separate process for the plugin,
>>> or call a socket for “remote” plugins.
>>>
>>> I don’t see anything stopping anyone from implementing that today.
>>>
>>> — Rob
>>>
>>
>> I’m excited to see this conversation. I’ve wanted to propose it a number of times
>> myself.
>>
>> Warmly,
>> Dennis Snell
>>
>
> I actually love wasm. I'm currently in the process of compiling my mini PHP runtime to
> wasm (basically a browser-only version of 3v4l). I'm not against this for any personal reasons;
> I'm simply not sure it's the right approach.
>
That sounds awesome. The WordPress Playground ships a copy of PHP compiled to WASM, and it’s been
an incredible journey realizing just what’s possible with this technology. It’s really boosted
the developer experience of working on WordPress itself, and also that of people building their own
projects using WordPress. Some are already bringing in libraries like ffmpeg to convert images and
media on the frontend, though it’s sad that this can’t also be done on the server yet.
>
> Cheers,
> Hammed
>
Hope you have a nice weekend. Cheers.