Tuesday, March 17, 2020

Leysin 2020 Sprint Report

At the end of February ten of us gathered in Leysin, Switzerland to work on
a variety of topics including HPy, PyPy Python 3.7 support and the PyPy
migration to Heptapod.

We had a fun and productive week. The snow was beautiful. There was skiing
and lunch at the top of Berneuse, cooking together, some late nights at
the pub next door, some even later nights coding, and of course the
obligatory cheese fondue outing.

There were a few of us participating in a PyPy sprint for the first time
and a few familiar faces who had attended many sprints. Many different
projects were represented including PyPy, HPy, GraalPython,
Heptapod, and rust-cpython. The atmosphere was relaxed and welcoming, so if
you're thinking of attending the next one -- please do!

Topics worked on:

HPy

HPy is a new project to design and implement a better API for extending
Python in C. If you're unfamiliar with it you can read more about it at
HPy.

A lot of attention was devoted to the Big HPy Design Discussion which
took up two full mornings. So much was decided that this will likely
get its own detailed write-up, but bigger topics included:

the HPy GetAttr, SetAttr, GetItem and SetItem methods,
HPy_FromVoidP and HPy_AsVoidP for passing HPy handles to C functions
that pass void* pointers to callbacks,
avoiding having va_args as part of the ABI,
exception handling,
support for creating custom types.

Quite a few things got worked on too:

implemented support for writing methods that take keyword arguments with
HPy_METH_KEYWORDS,
implemented HPy_GetAttr, HPy_SetAttr, HPy_GetItem, and HPy_SetItem,
started implementing support for adding custom types,
started implementing dumping JSON objects in ultrajson-hpy,
refactored the PyPy GIL to improve the interaction between HPy and
PyPy's cpyext,
experimented with adding HPy support to rust-cpython.

And there was some discussion of the next steps of the HPy initiative
including writing documentation, setting up websites and funding, and
possibly organising another HPy gathering later in the year.

PyPy

Georges gave a presentation on the Heptapod topic and branch workflows
and showed everyone how to use hg-evolve.
Work was done on improving the PyPy CI buildbot after the move to
Heptapod, including a lightweight pre-merge CI and restricting
the full CI so that it runs only on branch commits.
A lot of work was done improving the -D tests.

Miscellaneous

Armin demoed VRSketch and NaN Industries in VR, including an implementation
of the Game of Life within NaN Industries!
Skiing!

Aftermath

Immediately after the sprint large parts of Europe and the world were
hit by the COVID-19 epidemic. It was good to spend time together before
travelling ceased to be a sensible idea and many gatherings were cancelled.

Keep safe out there everyone.

The HPy & PyPy Team & Friends

In joke for those who attended the sprint: Please don't replace this blog post
with its Swedish translation (or indeed a translation to any other language :).

Posted by hodgestar at 22:57 2 Comments

Wednesday, October 18, 2017

(Cape of) Good Hope for PyPy

Hello from the other side of the world (for most of you)!

With the excuse of coming to PyCon ZA, Armin, Ronan, Antonio and sometimes Maciek spent the last two weeks on a very nice and productive sprint in Cape Town, as the pictures show :). We would like to say a big thank you to Kiwi.com, which sponsored part of the travel costs via its awesome Sourcelift program to help Open Source projects.

Armin, Anto and Ronan at Cape Point

Armin, Ronan and Anto spent most of the time hacking at cpyext, our CPython C-API compatibility layer: over the last few years, the focus was on making it work and be compatible with CPython, in order to run existing libraries such as numpy and pandas. However, we never paid much attention to performance, so the net result is that with the latest released version of PyPy, C extensions generally work but their speed ranges from "slow" to "horribly slow".

For example, these very simple microbenchmarks measure the speed of calling (empty) C functions, i.e. the time you spend to "cross the border" between RPython and C. (Note: this includes the time spent doing the loop in regular Python code.) These are the results on CPython, on PyPy 5.8, and on our newest in-progress version:

$ python bench.py     # CPython
noargs      : 0.41 secs
onearg(None): 0.44 secs
onearg(i)   : 0.44 secs
varargs     : 0.58 secs

$ pypy-5.8 bench.py   # PyPy 5.8
noargs      : 1.01 secs
onearg(None): 1.31 secs
onearg(i)   : 2.57 secs
varargs     : 2.79 secs

$ pypy bench.py       # cpyext-refactor-methodobject branch
noargs      : 0.17 secs
onearg(None): 0.21 secs
onearg(i)   : 0.22 secs
varargs     : 0.47 secs
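
The bench.py script itself is not reproduced in this post. A minimal sketch of what such a harness could look like is given here, under stated assumptions: the three stand-in functions are plain Python placeholders for the empty C functions (the real benchmark would import them from a compiled extension module), and the iteration count is an arbitrary choice. As in the measurements above, the time spent in the Python loop is included.

```python
import time


def bench(label, func, args=(), n=1_000_000):
    """Time n calls of func(*args), including the Python loop overhead."""
    start = time.perf_counter()
    for _ in range(n):
        func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label:<12}: {elapsed:.2f} secs")
    return elapsed


# Hypothetical stand-ins: the real benchmark would call empty C functions
# declared with METH_NOARGS, METH_O and METH_VARARGS in an extension module.
def noargs():
    return None


def onearg(x):
    return None


def varargs(*args):
    return None


def run_all(n=1_000_000):
    """Run the four cases listed in the results and return their timings."""
    return {
        "noargs": bench("noargs", noargs, (), n),
        "onearg(None)": bench("onearg(None)", onearg, (None,), n),
        "onearg(i)": bench("onearg(i)", onearg, (42,), n),
        "varargs": bench("varargs", varargs, (1, 2, 3), n),
    }


if __name__ == "__main__":
    run_all()
```

With pure-Python stand-ins the numbers say nothing about cpyext; the sketch only shows the shape of the measurement.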

So yes: before the sprint, we were ~2-6x slower than CPython. Now, we are faster than it! To reach this result, we made various improvements, such as:

teach the JIT how to look (a bit) inside the cpyext module;

write specialized code for calling METH_NOARGS, METH_O and METH_VARARGS functions; previously, we always used a very general and slow logic;

implement freelists to allocate the cpyext versions of int and tuple objects, as CPython does;

the cpyext-avoid-roundtrip branch: crossing the RPython/C border is slowish, but the real problem was (and still is for many cases) we often cross it many times for no good reason. So, depending on the actual API call, you might end up in the C land, which calls back into the RPython land, which goes to C, etc. etc. (ad libitum).

The branch tries to fix such nonsense: so far, we fixed only some cases, which are enough to speed up the benchmarks shown above. But most importantly, we now have a clear path and an actual plan to improve cpyext more and more. Ideally, we would like to reach a point in which cpyext-intensive programs run at worst at the same speed as CPython.
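
The "~2-6x slower" and "faster than CPython" claims can be checked directly against the timings listed earlier. The short script below only redoes that arithmetic; the dictionary keys are labels chosen for this sketch.

```python
# Timings in seconds, copied from the benchmark output in this post.
timings = {
    "noargs": {"cpython": 0.41, "pypy58": 1.01, "branch": 0.17},
    "onearg(None)": {"cpython": 0.44, "pypy58": 1.31, "branch": 0.21},
    "onearg(i)": {"cpython": 0.44, "pypy58": 2.57, "branch": 0.22},
    "varargs": {"cpython": 0.58, "pypy58": 2.79, "branch": 0.47},
}


def ratios(table):
    """Return (PyPy 5.8 slowdown vs CPython, branch speedup vs CPython) per case."""
    result = {}
    for name, t in table.items():
        slowdown_before = t["pypy58"] / t["cpython"]
        speedup_after = t["cpython"] / t["branch"]
        result[name] = (slowdown_before, speedup_after)
    return result


for name, (before, after) in ratios(timings).items():
    print(f"{name:<12}: {before:.1f}x slower before, {after:.1f}x faster after")
```

On these numbers PyPy 5.8 was between roughly 2.5x and 5.8x slower than CPython, and the branch is between roughly 1.2x and 2.4x faster.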

The other big topic of the sprint was Armin and Maciej doing a lot of work on the unicode-utf8 branch: the goal of the branch is to always use UTF-8 as the internal representation of unicode strings. The advantages are various:

decoding a UTF-8 stream is super fast, as you just need to check that the stream is valid;

encoding to UTF-8 is almost a no-op;

UTF-8 is always at least as compact as the currently used UCS-4, and usually much more so. It's also almost always more compact than CPython 3.5's latin1/UCS2/UCS4 combo;

smaller representation means everything becomes quite a bit faster due to lower cache pressure.
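
The size argument is easy to see by comparing encoded lengths. The sketch below uses Python's own codecs, with UTF-32 as a stand-in for a fixed 4-bytes-per-character UCS-4 buffer; the sample strings are arbitrary.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-32 (4 bytes per char)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-32": len(text.encode("utf-32-le")),  # "-le" avoids the 4-byte BOM
    }


samples = [
    "hello world",                 # ASCII only
    "na\u00efve caf\u00e9",        # Latin text with two accented letters
    "\u65e5\u672c\u8a9e",          # three CJK characters
    "\U0001F600\U0001F600",        # two emoji outside the BMP
]

for s in samples:
    sizes = encoded_sizes(s)
    print(f"{len(s):>2} chars: utf-8={sizes['utf-8']:>2} bytes, "
          f"utf-32={sizes['utf-32']:>2} bytes")
```

ASCII text takes a quarter of the space and CJK text three quarters. Only text made up entirely of 4-byte code points, such as the emoji sample, comes out the same size.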

Before you ask: yes, this branch contains special logic to ensure that random access of single unicode chars is still O(1), as it is on both CPython and the current PyPy.
We also plan to improve the speed of decoding even more by using modern processor features, like SSE and AVX. Preliminary results show that decoding can be done 100x faster than the current setup.
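
This post does not describe the logic that keeps random access O(1). One common way to do it, which may differ from what the branch actually does, is to store the byte offset of every K-th character, so that an index lookup jumps to the nearest checkpoint and then steps over at most K-1 characters. The class name, the helper and the value of STEP below are all made up for this sketch.

```python
STEP = 4  # characters between checkpoints; an arbitrary choice for this sketch


def _char_len(lead_byte):
    """Length in bytes of a UTF-8 sequence, given its first byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte < 0xE0:
        return 2
    if lead_byte < 0xF0:
        return 3
    return 4


class Utf8Text:
    """Hypothetical UTF-8 backed string with bounded-cost indexing."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.length = len(text)
        self.checkpoints = []  # byte offset of characters 0, STEP, 2*STEP, ...
        pos = 0
        for i in range(self.length):
            if i % STEP == 0:
                self.checkpoints.append(pos)
            pos += _char_len(self.data[pos])

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Jump to the checkpoint, then skip at most STEP - 1 characters.
        pos = self.checkpoints[i // STEP]
        for _ in range(i % STEP):
            pos += _char_len(self.data[pos])
        end = pos + _char_len(self.data[pos])
        return self.data[pos:end].decode("utf-8")
```

Because the number of characters skipped is bounded by the constant STEP, each lookup costs O(1), at the price of one stored offset per STEP characters.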

In summary, this was a long and profitable sprint, in which we achieved lots of interesting results. However, what we liked even more was the privilege of doing commits from awesome places such as the top of Table Mountain:

Our sprint venue today #pypy pic.twitter.com/o38IfTYmAV
— Ronan Lamy (@ronanlamy) October 4, 2017

The panorama we looked at instead of staring at cpyext code

Posted by Antonio Cuni at 14:31 6 Comments

Friday, April 9, 2010

Using CPython extension modules with PyPy natively, or: PyPy can load .pyd files with CPyExt!

PyPy is now able to load and run CPython extension modules (i.e. .pyd and .so files) natively by using the new CPyExt subsystem. Unlike the solution presented in another blog post (where extension modules like numpy etc. were run on CPython and proxied through TCP), this solution does not require a running CPython anymore. We do not achieve full binary compatibility yet (like Ironclad), but recompiling the extension is generally enough.

The only prerequisite is that the necessary functions of the C API of CPython are already implemented in PyPy. If you are a user or an author of a module and miss certain functions in PyPy, we invite you to implement them. Up until now, a lot of people (including a lot of new committers) have stepped up and implemented a few functions to get their favorite module running. See the end of this post for a list of names.

Regarding speed, we tried the following: even though there is a bit of overhead when running these modules, we could run the regular expression engine of CPython (_sre.so) and execute the spambayes benchmark of the Unladen Swallow benchmark suite (cf. speed.pypy.org) and experience a speedup: it ran twice as fast on pypy-c as it did with the built-in regular expression engine of PyPy. From Amdahl's Law it follows that _sre.so must run several times faster than the built-in engine.
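
The Amdahl's Law step can be made explicit. If a fraction p of the original runtime is spent in the regular expression engine and that part becomes s times faster, the overall speedup is 1 / ((1 - p) + p / s). Solving for s with an overall speedup of 2 gives the sketch below. The values of p are made up for illustration, since the post does not say how much of the spambayes run is regex matching.

```python
def required_component_speedup(overall_speedup, fraction):
    """Solve Amdahl's Law for the speedup of the accelerated part.

    overall_speedup: measured whole-program speedup (2.0 in the post)
    fraction: share of the original runtime spent in the accelerated part
              (not given in the post; any value used here is an assumption)
    """
    remaining = 1.0 / overall_speedup - (1.0 - fraction)
    if remaining <= 0:
        raise ValueError("this overall speedup is impossible for that fraction")
    return fraction / remaining


for p in (0.6, 0.75, 0.9):
    s = required_component_speedup(2.0, p)
    print(f"regex share {p:.0%}: _sre.so must be about {s:.2f}x faster")
```

For example, if regex matching accounted for 60% of the original runtime, the engine itself would have to be about 6x faster to double the overall speed, and at 75% it would have to be 3x faster. A share of 50% or less could never produce a 2x overall speedup.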

Currently pursued modules include PIL and others. Distutils support is nearly ready. If you would like to participate or want information on how to use this new feature, come and join our IRC channel #pypy on freenode.

Amaury Forgeot d'Arc and Alexander Schremmer

Further CPyExt Contributors:

Alex Gaynor
Benjamin Peterson
Jean-Paul Calderone
Maciej Fijalkowski
Jan de Mooij
Lucian Branescu Mihaila
Andreas Stührk
Zooko Wilcox-O'Hearn

Posted by Alexander Schremmer at 23:56 18 Comments