Indeed. Looking closer at PyLong's constituent datatypes, it does use some ambiguous unsigned long / long types in there:
https://github.com/python/cpython/blob/ ... r.h#L44-56 . But apparently that #elif clause is limited to PYLONG_BITS_IN_DIGIT == 15, which I've confirmed only gets set that way in 32-bit builds.
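This is also easy to confirm from the interpreter itself: sys.int_info reports which digit configuration the running build was compiled with (a quick sanity check, not a substitute for reading the header):

```python
import sys

# CPython exposes its PyLong digit configuration at runtime.
# 64-bit builds normally use 30-bit digits stored in a uint32_t;
# the 15-bit layout with its unsigned short / unsigned long
# typedefs shows up on 32-bit builds.
info = sys.int_info
print(info.bits_per_digit)   # 30 on a typical 64-bit build, 15 on 32-bit
print(info.sizeof_digit)     # bytes per digit: 4 or 2 respectively
```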
The unnecessary uses of long that first raised my suspicions are calls like these: https://github.com/python/cpython/searc ... g_fromlong
The code will even do return PyLong_FromLong(-1); because there is no PyLong_FromInt() function. I don't think the compiler can inline these cross-translation-unit calls unless link-time code generation is now a thing on Linux.
But that amounts to a few wider registers in a limited number of places, and it cannot explain the double-digit performance losses.
Having seen the codebase now, I'm realizing that every tiny object is allocated on the heap and handled through a PyObject *. It makes sense that doubling the pointer width adds overhead even in programs that appeared compute-bound on the surface. It probably hurts most in programs doing many little char or integer math operations, since each one touches several pointers per object.
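The per-object overhead is visible from sys.getsizeof (a rough illustration; exact numbers vary by build, but on a 64-bit CPython even a small int carries a refcount, a type pointer, and a size field before its digit payload):

```python
import sys

# Every Python int is a full heap-allocated PyObject: refcount +
# type pointer + size field + digit array. On a 64-bit build that
# header alone (3 x 8 bytes) is larger than the 4-byte digit
# payload of a small int.
print(sys.getsizeof(1))        # ~28 bytes on a 64-bit build
print(sys.getsizeof(2**100))   # grows with the number of digits
```

Halving the pointer width on a 32-bit build shrinks that fixed header, which is one plausible source of the gap in allocation-heavy integer workloads.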