To learn how to create C extensions, I decided to copy a built-in .c file (in this case itertoolsmodule.c) and place it in my package. I only changed the names inside the module from itertools to mypkg.

Then I compiled it (Windows 10, MSVC Community 14) as a setuptools.Extension:

from setuptools import setup, Extension

itertools_module = Extension('mypkg.itertoolscopy',
                              sources=['src/itertoolsmodulecopy.c'])

setup(...
      ext_modules=[itertools_module])

By default this uses the compiler flags /c /nologo /Ox /W3 /GL /DNDEBUG /MD, and I read somewhere that these defaults match the settings Python itself was compiled with. However, I use conda (64-bit setup), so this might not necessarily be true.

It all went well, but a benchmark for filterfalse showed that it's almost a factor of 2 slower than the built-in:
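A quick way to see which compiler and word size the running interpreter reports is sketched below. It uses only the standard library (`platform.python_compiler()` and `struct.calcsize`); the helper name `describe_interpreter` is made up for this example. Note that the compiler string does not list the optimization flags that were used, so this only narrows things down.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect basic build facts about the interpreter running this code."""
    return {
        # Compiler string baked into the binary (an MSC or GCC version, etc.).
        "compiler": platform.python_compiler(),
        # Size of a C pointer in bits: 64 on a 64-bit build, 32 on a 32-bit one.
        "pointer_bits": struct.calcsize("P") * 8,
        "version": sys.version.split()[0],
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("{}: {}".format(key, value))
```

Running this with both the conda interpreter and a stock interpreter would show whether they were built with the same compiler version.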

import mypkg
import itertools

import random

a = [random.random() for _ in range(500000)]
func = None

%timeit list(filter(func, a))
100 loops, best of 3: 3.42 ms per loop
%timeit list(itertools.filterfalse(func, a))
100 loops, best of 3: 3.41 ms per loop
%timeit list(mypkg.filterfalse(func, a))
100 loops, best of 3: 6.77 ms per loop

However, for smaller iterables the discrepancy also becomes smaller:

a = [random.random() for _ in range(500)]  # 1 / 1000 of the elements

%timeit list(filter(func, a))
100000 loops, best of 3: 9.66 µs per loop
%timeit list(itertools.filterfalse(func, a))
100000 loops, best of 3: 10.8 µs per loop
%timeit list(mypkg.filterfalse(func, a))
100000 loops, best of 3: 14.4 µs per loop
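To put the two measurements on a common footing, the timings above can be converted into an extra cost per element. This is only arithmetic on the numbers already shown; the helper name `per_element_overhead_ns` is made up for the example.

```python
def per_element_overhead_ns(slow_seconds, fast_seconds, n_elements):
    """Extra time per element of the slower run, in nanoseconds."""
    return (slow_seconds - fast_seconds) / n_elements * 1e9


# mypkg.filterfalse vs itertools.filterfalse, 500000 elements (ms timings).
large = per_element_overhead_ns(6.77e-3, 3.41e-3, 500000)
# Same comparison for 500 elements (µs timings).
small = per_element_overhead_ns(14.4e-6, 10.8e-6, 500)

print("{:.1f} ns per element (large), {:.1f} ns per element (small)".format(large, small))
# prints: 6.7 ns per element (large), 7.2 ns per element (small)
```

Both list sizes work out to roughly 7 ns of extra time per element, which would be consistent with a roughly constant per-item overhead in the copied module rather than a one-off startup cost.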

I wasn't able to explain this difference in speed, but I have to admit that I'm not too familiar with compiling C code. I'm at a loss as to what actually makes it slower.

The results are the same on Python 2.7 with ifilter and ifilterfalse and the 2.7 version of the itertoolsmodule.c file.

Does anyone know what makes the code perform worse than the built-ins and how one could speed it up?

MSeifert
  • Since I'm trying to reproduce your results, what version of python are you targeting and on what platform (x86 or x86_64)? – anthony sottile Dec 17 '16 at 02:21
  • The timings were done on 64bit py35 and 64bit py27 (both conda). – MSeifert Dec 17 '16 at 02:28
  • In python2.7 did you do `list(itertools.ifilterfalse(...))`? There is no `itertools.filterfalse` in python2, and `ifilterfalse` returns an iterator – anthony sottile Dec 17 '16 at 02:29
  • @AnthonySottile Yes, as stated in the question "_The results are the same on python 2.7 with ifilter and ifilterfalse and the 2.7 version of the itertoolsmodule.c file._". – MSeifert Dec 17 '16 at 02:32
  • @MSeifert Did you happen to find out any more about this? I compared the built-in `min` function with a function from an extension module that uses identical code. `731 µs` for the built-in `min` and `1.07 ms` for the extension module `min` (for some input iterable). This is quite concerning for me. – OTheDev May 06 '22 at 09:23
  • @MSeifert I found a fix. Rebuilding Python (via `make`) as in [here](https://docs.python.org/3/extending/extending.html#compilation-and-linkage) led to identical (even slightly better) timings for the extension module. Very pleased. My config options were `--enable-optimizations --with-lto`. I wonder if `python setup.py install` uses its own type of options. – OTheDev May 06 '22 at 10:30
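To check what a given interpreter records about its own build, something like the following sketch can help. It relies on `sysconfig.get_config_var`, which exists in the standard library; the helper names are made up for this example. As far as I know, distutils/setuptools on Unix-like systems take their default compiler and flags from these recorded values, while on Windows most of them are simply not recorded and come back as `None`.

```python
import sysconfig

# Build-time variables recorded when the interpreter was configured.
# On Windows most of these are not available and are returned as None.
BUILD_VARS = ("CC", "OPT", "CFLAGS", "LDSHARED", "CONFIG_ARGS")


def build_settings():
    """Map each build variable name to the value the interpreter reports."""
    return {name: sysconfig.get_config_var(name) for name in BUILD_VARS}


def mentions_pgo_or_lto(config_args):
    """True if the configure arguments mention the PGO or LTO switches."""
    if not config_args:
        return False
    return ("--enable-optimizations" in config_args
            or "--with-lto" in config_args)


if __name__ == "__main__":
    settings = build_settings()
    for name in BUILD_VARS:
        print("{} = {!r}".format(name, settings[name]))
    print("PGO/LTO requested:", mentions_pgo_or_lto(settings["CONFIG_ARGS"]))
```

Comparing this output between a rebuilt interpreter and a packaged one shows whether the two were configured with the same options.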

1 Answer

Curious about this problem myself, I set out to reproduce the findings. Though the OP is on Windows, it was slightly easier for me to attempt this on Linux. I did eventually try it on Windows, but I'll walk you through what I did!

Setup

I made a little test harness. It's a shell script, which makes it easier for someone else to try what I'm trying :D

test.sh

#!/usr/bin/env bash
set -euxo pipefail
rm -rf itertoolsmodule.c setup.py venv

PYTHON=3.5
FUNCTION=filterfalse
INIT=PyInit_
#PYTHON=2.7
#FUNCTION=ifilterfalse
#INIT=init

wget "https://raw.githubusercontent.com/python/cpython/$PYTHON/Modules/itertoolsmodule.c"
sed -i "s/${INIT}itertools/${INIT}_myitertools/" itertoolsmodule.c
sed -i 's/"itertools"/"_myitertools"/' itertoolsmodule.c

cat > setup.py << EOF
from setuptools import setup, Extension
mod = Extension('_myitertools', ['itertoolsmodule.c'])
setup(name='foo', ext_modules=[mod])
EOF

virtualenv venv -ppython"$PYTHON"
venv/bin/pip install . -v

cat > test.py << EOF
import _myitertools
import itertools
import random
import time


a = [random.random() for _ in range(500000)]
iterations = range(10)
seconds = 50


def builtins_filter():
    for _ in iterations:
        list(filter(None, a))

_itertools_filterfalse = itertools.$FUNCTION
def itertools_filterfalse():
    for _ in iterations:
        list(_itertools_filterfalse(None, a))

_myitertools_filterfalse = _myitertools.$FUNCTION
def myitertools_filterfalse():
    for _ in iterations:
        list(_myitertools_filterfalse(None, a))


def runbench(func):
    start = time.time()
    end = start + seconds
    iterations = 0
    while time.time() < end:
        func()
        iterations += 1
    return iterations


for func in (builtins_filter, itertools_filterfalse, myitertools_filterfalse):
    print('*' * 79)
    print(func.__name__)
    print('{} iterations in {} seconds'.format(runbench(func), seconds))
EOF

venv/bin/python test.py

ubuntu16.04 x86_64 python3.5.2 (stock, apt)

(I cut out the (imo) unimportant parts):

$ ./test.sh
+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=3.5
+ FUNCTION=filterfalse
+ INIT=PyInit_

...

+ venv/bin/pip install . -v

...

    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/tmp/foo/venv/include/python3.5m -c itertoolsmodule.c -o build/temp.linux-x86_64-3.5/itertoolsmodule.o
    creating build/lib.linux-x86_64-3.5
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/itertoolsmodule.o -o build/lib.linux-x86_64-3.5/_myitertools.cpython-35m-x86_64-linux-gnu.so

...

+ venv/bin/python test.py
*******************************************************************************
builtins_filter
1401 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
1977 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
1981 iterations in 50 seconds

ubuntu16.04 x86_64 python2.7.12 (stock, apt)

+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=2.7
+ FUNCTION=ifilterfalse
+ INIT=init

...

+ venv/bin/pip install . -v

...

    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c itertoolsmodule.c -o build/temp.linux-x86_64-2.7/itertoolsmodule.o
    creating build/lib.linux-x86_64-2.7
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-Bsymbolic-functions -Wl,-z,relro -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/itertoolsmodule.o -o build/lib.linux-x86_64-2.7/_myitertools.so

...

+ venv/bin/python test.py
*******************************************************************************
builtins_filter
871 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
1918 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
1863 iterations in 50 seconds

Windows!

For Windows, I changed the script slightly so it built virtualenvs using C:\Python##\python.exe (using msysgit so I have some amount of a Unix toolset: bash, etc.), changing things from bin to Scripts (for virtualenv), and so on. I don't have/use conda, so these are just stock Python on Windows 10.

windows 10 python 2.7.9 (stock, msi installer)

+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=2.7
+ FUNCTION=ifilterfalse
+ INIT=init

...

+ venv/Scripts/pip install . -v

...

    C:\Users\Anthony\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python27\include -Ic:\users\anthony\appdata\local\temp\foo\venv\PC /Tcitertoolsmodule.c /Fobuild\temp.win32-2.7\Release\itertoolsmodule.obj
itertoolsmodule.c
    creating build\lib.win32-2.7
    C:\Users\Anthony\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\Python27\Libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\PCbuild /EXPORT:init_myitertools build\temp.win32-2.7\Release\itertoolsmodule.obj /OUT:build\lib.win32-2.7\_myitertools.pyd /IMPLIB:build\temp.win32-2.7\Release\_myitertools.lib /MANIFESTFILE:build\temp.win32-2.7\Release\_myitertools.pyd.manifest

...

+ venv/Scripts/python test.py
*******************************************************************************
builtins_filter
914 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
2352 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
2266 iterations in 50 seconds

windows 10 python3.5.1 (stock, msi installer)

+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=3.5
+ FUNCTION=filterfalse
+ INIT=PyInit_

...

+ venv/Scripts/pip install . -v

...

    D:\Programs\VS2015\VC\BIN\amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Python35\include -IC:\Python35\include -ID:\Programs\VS2015\VC\INCLUDE -ID:\Programs\VS2015\VC\ATLMFC\INCLUDE "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\winrt" /Tcitertoolsmodule.c /Fobuild\temp.win-amd64-3.5\Release\itertoolsmodule.obj
itertoolsmodule.c
    creating C:\Temp\pip-1fnf27jo-build\build\lib.win-amd64-3.5
    D:\Programs\VS2015\VC\BIN\amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Python35\Libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\PCbuild\amd64 /LIBPATH:D:\Programs\VS2015\VC\LIB\amd64 /LIBPATH:D:\Programs\VS2015\VC\ATLMFC\LIB\amd64 "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" /EXPORT:PyInit__myitertools build\temp.win-amd64-3.5\Release\itertoolsmodule.obj /OUT:build\lib.win-amd64-3.5\_myitertools.cp35-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.5\Release\_myitertools.cp35-win_amd64.lib

...

+ venv/Scripts/python test.py
*******************************************************************************
builtins_filter
658 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
2601 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
2715 iterations in 50 seconds

Conclusion

At the very least, my tests with stock python show that the extension module does not exhibit different performance characteristics.
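As a small sanity check on that statement, the iteration counts from the four runs above can be turned into ratios. The numbers are copied from the output shown earlier; nothing else is assumed.

```python
# (itertools, _myitertools) iteration counts copied from the four runs above.
runs = {
    "ubuntu16.04 python3.5.2": (1977, 1981),
    "ubuntu16.04 python2.7.12": (1918, 1863),
    "windows 10 python2.7.9": (2352, 2266),
    "windows 10 python3.5.1": (2601, 2715),
}

# Ratio > 1 means the copied extension completed more iterations.
ratios = {name: mine / float(builtin) for name, (builtin, mine) in runs.items()}

for name in sorted(ratios):
    print("{}: {:.3f}".format(name, ratios[name]))
```

Every ratio lands within about 5% of 1.0, in both directions, which is nowhere near the factor of 2 reported in the question.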

Welp, I spent half an hour on this and couldn't reproduce the slowdown. Hopefully this is helpful for the next poor soul who attempts this. I can only guess that conda is doing some additional optimization and then shipping a pyconfig.h file which lies about the flags used to compile. Though to be honest, I haven't yet ventured into the conda space, so I don't know how their ecosystem works.
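One more thing that could be compared between environments is where each module actually comes from: a module compiled into the interpreter itself versus one loaded from a separate shared library. Whether that matters for speed here is an untested assumption; the sketch below only reports the origin. The helper name `module_origin` is made up for this example, and `_myitertools` is the extension built by test.sh, so it is only importable inside that virtualenv.

```python
import importlib
import sys


def module_origin(name):
    """Return 'built-in' for modules compiled into the interpreter,
    otherwise the file the module was loaded from."""
    module = importlib.import_module(name)
    if name in sys.builtin_module_names:
        return "built-in"
    return getattr(module, "__file__", None) or "unknown"


if __name__ == "__main__":
    for name in ("itertools", "_myitertools"):
        try:
            print(name, "->", module_origin(name))
        except ImportError:
            # Expected outside the virtualenv created by test.sh.
            print(name, "-> not importable in this environment")
```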

anthony sottile
  • One slight comment that might help others reproducing my findings. My versions of python on windows were too old to handle the source on the 2.7 / 3.5 branches and I had to choose versions of the source before `Py_SETREF` was introduced. – anthony sottile Dec 17 '16 at 03:22
  • Thank you for taking the time to dig through this. If the issue is not reproducible that would be very good news indeed! My [appveyor tests](https://ci.appveyor.com/project/MSeifert04/testpkg/build/1.0.21) seem to indicate that you're right. I also created a 32bit conda environment locally and I can't see any timeit differences there either. However, it still bugs me why it's slower on my 64bit conda environment. – MSeifert Dec 17 '16 at 05:27