Python Performance: A Recommendation
Python performance and multi-threading have a bad reputation (outside of the Python community). We show when these are indeed issues (not always), search for solutions, and give some recommendations on how the community could focus on solutions. And you can contribute to the solutions!
Python is a dynamic language. The main implementation, CPython, is interpreted (more precisely, source code is compiled to bytecode, which is then interpreted by a Python virtual machine). The Python interpreter uses a Global Interpreter Lock (GIL) to protect its structures from concurrent access.
Outside the Python community, you often hear that Python is slow and that the GIL sucks. Inside the community, the view is more positive, and indeed there is a wealth of approaches for High Performance Python. There are different approaches for I/O-bound problems and CPU-bound problems:
- I/O-bound problems can make good use of multi-threading (where the GIL is released during I/O) or asynchronous programming.
- CPU-bound problems can be addressed by better algorithms (nothing beats an algorithm with lower computational complexity), by array-based programming (NumPy), by various problem-specific packages written in a compiled language, or by Cython, a mix of C and Python.
- Application-level caches are also helpful, because no computation is always faster than the fastest possible computation.
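As a minimal sketch of the I/O-bound case from the list above: the function `fetch` and its `delay` parameter are hypothetical, and `time.sleep` merely stands in for a real network or disk wait. Because CPython releases the GIL while a thread blocks like this, the four calls overlap instead of running one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(resource_id, delay=0.2):
    """Hypothetical I/O call; time.sleep stands in for a network or disk wait."""
    time.sleep(delay)  # a blocking wait like this releases the GIL in CPython
    return resource_id * 2


def fetch_all(resource_ids, max_workers=4):
    """Run the blocking calls concurrently in a small thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of the inputs
        return list(pool.map(fetch, resource_ids))


start = time.perf_counter()
results = fetch_all([1, 2, 3, 4])
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

With four workers, the elapsed time should be close to a single delay rather than four of them. The same pool would not speed up a CPU-bound `fetch`, since only one thread at a time can execute Python bytecode under the GIL.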
Some approaches are quite complex, and unless confined to a small hotspot, the advantage of Python over less dynamic languages might get lost. That’s why one author calls it Python’s Hardest Problem.
The GIL constraint is removed when multiple processes are used, each with its own Python interpreter and GIL. This works nicely for problems that don’t require massive interaction between data or even massive amounts of read-only data.
In my own work with Quantax, the Swisscom Market Risk System, which is written in Python, we always face demand for increased speed. Using a lot of NumPy and many levels of application caches, we achieve about 25,000 valuations of financial instruments per second on one core of a laptop CPU.
However, the price for this is complexity of cache invalidations, and complicated code to map the problem to NumPy.
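To illustrate where the invalidation complexity comes from, here is a deliberately small sketch. The class `ValuationCache`, its methods, and the "notional times rate" formula are all hypothetical stand-ins, not Quantax code. Even in this toy version, the cache has to know which entries depend on which input.

```python
class ValuationCache:
    """Caches valuations and drops them when the rate they depend on changes."""

    def __init__(self, rates):
        self._rates = dict(rates)
        self._cache = {}
        self.computations = 0  # counts real (non-cached) valuations

    def set_rate(self, currency, rate):
        self._rates[currency] = rate
        # Invalidate only the entries that depend on the changed currency.
        stale = [key for key in self._cache if key[1] == currency]
        for key in stale:
            del self._cache[key]

    def value(self, notional, currency):
        key = (notional, currency)
        if key not in self._cache:
            self.computations += 1
            # Stand-in for an expensive valuation.
            self._cache[key] = notional * self._rates[currency]
        return self._cache[key]


cache = ValuationCache({"USD": 0.5, "EUR": 2.0})
cache.value(100, "USD")
cache.value(100, "USD")       # served from the cache
cache.set_rate("USD", 0.25)   # invalidates USD entries only
cache.value(100, "USD")       # recomputed with the new rate
print(cache.computations)
```

In a real system the dependency is rarely a single key component. A valuation may depend on several curves, transactions, and dates, and every level of caching needs its own rule for what a change makes stale.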
We use processes at a relatively coarse-grained level, as worker processes to calculate reports. The main issue with processes is the massive amount of common data the financial calculations require, leading to rather large memory consumption per process. However, there is rarely more than one logical process that modifies the objects (by changing transactions or rates).
If Python wants to be a viable programming environment for medium size applications with those characteristics, we need two things:
- Speed achievable by (just in time) compilation
- An efficient way to use dozens of cores while sharing memory.
With a large community, solutions are bound to come up:
PyPy shows impressive results with its JIT compiler (PyPy’s Architecture; Overview article). The Python programming ecosystem is large, with many packages coded in other languages. PyPy is slow in providing the whole ecosystem, and is still working on NumPy support.
PyPy also made breakthrough progress in replacing the GIL with Software Transactional Memory (STM), and in getting that up to speed. This allows multi-threaded Python programs to use all cores, with acceptable overhead. It also provides an alternative to locking that may make multi-threaded programming less error-prone.
Jython and IronPython are implementations that target a particular language runtime: the Java VM for Jython and .NET for IronPython. They run without the GIL, because they map Python structures to the thread-safe structures provided by their platforms. They do not provide JIT compilation. Also, they do not uniformly provide all libraries of the CPython ecosystem.
PyParallel (long talk, slides, summary) is not a solution yet, but an experiment designed to circumvent the GIL in the standard CPython interpreter, under the very specific condition that threads running without the GIL do not modify any Python objects except for those created by that thread.
Other compilers: Pyston, Nuitka
- Nuitka is a full compiler, which uses the Python runtime to execute code. While admirably compatible with CPython, the achievable speedups seem to be limited to elimination of the parsing overhead and some limited static analysis.
- Pyston is a project to build a method-at-a-time JIT using LLVM as its code generator.
I admire these projects for their courage and stamina. Many of them are starving for resources (including money). I believe that PyPy is the most promising, most mature, and most complete of these efforts.
Let’s see what the community can do to foster it.
Python is mature and has a large following, with a wide range of uses: short-lived glue scripts, ad-hoc analysis, Raspberry Pi applications, full-blown applications such as Quantax, stacks (OpenStack), and major services such as YouTube and Dropbox.
The community is nice and helpful. There are strong opinions about what is “pythonic” (follows the Python way of doing things). This helps to keep Python conceptually simple (low conceptual overhead). There is central control over language features (by Guido van Rossum, the creator of Python) and over the reference implementation, CPython.
Switching to a new language infrastructure such as PyPy is even slower, for several reasons:
- For many purposes, CPython is fast enough.
- Many usage scenarios (especially Python as a glue language) use scripts with a short execution time, for which the warmup of a JIT can be prohibitive.
- Many libraries of CPython call C code; they’re either not available in PyPy or calling them is slow.
- CPython is available on exotic and not-so-exotic platforms (such as 64-bit Windows) that are not currently supported by PyPy.
All these are reasons why PyPy hasn’t replaced CPython, and will not do so in the near future. Therefore, I recommend a two-pronged approach, with focus on PyPy as the most promising new technology.
For Python to retain its current mainstream acceptance, protect its application code base, and defend its positions against newcomers such as Julia and conceptually complex languages such as Scala, I personally believe efforts should be focused on:
- CPython as the all-purpose interpreter, compatible with all legacy code and running short scripts efficiently.
- It is questionable why CPython needs to provide ongoing Python 3.x support – these folks have little incentive to upgrade anyway!
- PyParallel should become part of CPython, to provide immediate relief for those who need it and can live with its constraints.
- PyPy as the high-performance, STM-enabled reference implementation, where performance and modern techniques make a difference.
- Platform-centric implementations will remain a niche (or dead end when resources run out), or should be re-targeted to run on top of PyPy’s infrastructure.
If you are a Python user, you are part of the community, and might want to get involved. This is very welcome! Like many other open source projects, PyPy is looking for financial contributions and for volunteer work:
- You may donate to general development, or to a specific topic such as STM.
- You may also become a developer – the best occasion for Swiss people (and ski fans) will be the upcoming Leysin Winter Sprint (20-28th February 2015).
- Windows system experts may also be interested in helping to finally port PyPy to 64-bit Windows.