
I am working with scikit-learn and would like to override the build method of the TreeBuilder class for regression trees, which is implemented in Cython. To do so, I figured I would need access to the Cython source, so I have added scikit-learn as a git submodule (in a directory named scikitlearn).

My project structure thus looks as follows:

.
|-- setup.py
|-- newtree
|   |-- __init__.py
|   |-- MyNewTree.pyx
|   `-- scikitlearn
|       `-- sklearn
|           `-- tree
|               |-- __init__.py
|               |-- _tree.pxd
|               |-- _tree.pyx
|               |-- setup.py
|               `-- tree.py

In my setup.py I am doing the following:

from setuptools import setup, find_packages
from setuptools.extension import Extension
from Cython.Build import cythonize
import numpy

extensions = [
    Extension(
        "newtree.MyNewTree",
        ["newtree/MyNewTree.pyx"],
        include_dirs=['modulenetwork/scikitlearn/sklearn/tree', numpy.get_include()]
    )
]

setup(
    name = 'MyNewTree',
    version = '0.0.1',
    packages = find_packages(),
    ext_modules = cythonize(extensions)
)

Finally, MyNewTree.pyx:

# cython: cdivision=True
# cython: boundscheck=False
# cython: wraparound=False

import numpy as np
cimport numpy as np
np.import_array()

from .scikitlearn.sklearn.tree._tree cimport BestFirstTreeBuilder

cdef class TreeBuilder(BestFirstTreeBuilder):
    cpdef build(self):
        print('This is an overridden build method!')

What I would like this to produce is a TreeBuilder class whose build method differs from the original scikit-learn implementation, but which is otherwise identical.

To compile I run python setup.py build_ext --inplace

However I get the following error:

Error compiling Cython file:
------------------------------------------------------------
...

import numpy as np
cimport numpy as np
np.import_array()

from .scikitlearn.sklearn.tree._tree cimport BestFirstTreeBuilder
^
------------------------------------------------------------

newtree/MyNewTree.pyx:9:0: 'newtree/scikitlearn/sklearn/tree/_tree.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...

import numpy as np
cimport numpy as np
np.import_array()

from .scikitlearn.sklearn.tree._tree cimport BestFirstTreeBuilder
^
------------------------------------------------------------

newtree/MyNewTree.pyx:9:0: 'newtree/scikitlearn/sklearn/tree/_tree/BestFirstTreeBuilder.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
cimport numpy as np
np.import_array()

from .scikitlearn.sklearn.tree._tree cimport BestFirstTreeBuilder

cdef class TreeBuilder(BestFirstTreeBuilder):
    ^
------------------------------------------------------------

newtree/MyNewTree.pyx:11:5: 'BestFirstTreeBuilder' is not a type name
Traceback (most recent call last):
  File "setup.py", line 18, in <module>
    ext_modules = cythonize(extensions)
  File "/Users/__/miniconda3/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 1039, in cythonize
    cythonize_one(*args)
  File "/Users/__/miniconda3/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 1161, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: newtree/MyNewTree.pyx

Clearly, the files reported as not found do actually exist. Is this a problem with my setup script? How do I properly cimport scikit-learn classes into my code?
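
One thing that may be worth ruling out before blaming the setup script: a file can exist on disk and still be unreachable through a dotted cimport if some directory along the way is not recognised as a package. The sketch below is only a diagnostic, not a fix. It assumes (this is an assumption, not something the error output confirms) that each directory in the cimport path needs an `__init__` file, and the helper name `check_cimport_path` is hypothetical. It uses only the standard library and walks the path component by component.

```python
from pathlib import Path

# Files whose presence is taken here to mark a directory as a package.
# Assumption: this mirrors how the compiler decides what counts as a package.
PACKAGE_MARKERS = ("__init__.py", "__init__.pyx", "__init__.pxd")


def check_cimport_path(package_dir, relative_module):
    """Check every component of a relative cimport.

    package_dir     -- directory containing the .pyx that does the cimport
    relative_module -- dotted module name after the leading dot,
                       e.g. "scikitlearn.sklearn.tree._tree"

    Returns a list of (path, ok, reason) tuples: one per directory on the
    way down, followed by one for the .pxd file itself.
    """
    current = Path(package_dir)
    *packages, module = relative_module.split(".")

    # Collect the starting package and every intermediate directory.
    directories = [current]
    for name in packages:
        current = current / name
        directories.append(current)

    report = []
    for directory in directories:
        if not directory.is_dir():
            report.append((directory, False, "directory missing"))
        elif not any((directory / m).is_file() for m in PACKAGE_MARKERS):
            report.append((directory, False, "no __init__ file"))
        else:
            report.append((directory, True, "package"))

    # Finally, look for the declaration file of the target module.
    pxd = current / (module + ".pxd")
    found = pxd.is_file()
    report.append((pxd, found, "pxd found" if found else "pxd missing"))
    return report


if __name__ == "__main__":
    # Run from the project root (the directory holding setup.py).
    rows = check_cimport_path("newtree", "scikitlearn.sklearn.tree._tree")
    for path, ok, reason in rows:
        print("ok  " if ok else "FAIL", path, "-", reason)
```

Any directory flagged with "no __init__ file" is a candidate explanation for a "not found" error on a file that is in fact present. In the layout shown above, only the tree directory is listed with an `__init__.py`, so the intermediate scikitlearn directory (the repository root of the submodule) would be the first place to look.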

holmrenser
    I believe you need to expand the `["newtree/MyNewTree.pyx"]` list to include the sklearn pyx; AFAIK, `include_dirs` is only for C headers. – chrisb Dec 11 '17 at 21:58
