I've come across an issue I can't seem to diagnose, where a Python function doesn't return until a long time after it actually completes. A minimum example (edited to be simpler):
from sklearn.neighbors import KDTree
import time
def mwe(n):
start = time.time()
a = KDTree([[1]]).query_radius([[1]]*int(n), 1)
print(f'Function completed in: {time.time()-start:.2f} seconds')
start = time.time()
mwe(1e6)
print(f'Function returned after {time.time()-start:.1f} seconds')
start = time.time()
mwe(1e7)
print(f'Function returned after {time.time()-start:.1f} seconds')
Output:
Function completed in: 0.36 seconds
Function returned after 4.4 seconds
Function completed in: 3.75 seconds
Function returned after 44.0 seconds
At first I thought this could be a garbage collection issue, but adding gc.disable()
doesn't have any effect. As well, if it were garbage collection, I would expect it to appear prominently in the profile output—but cProfile doesn't even recognize the problem exists:
645 function calls (636 primitive calls) in 2.701 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
2 1.069 0.534 1.076 0.538 {method 'query_radius' of 'sklearn.neighbors._ball_tree.BinaryTree' objects}
32/23 0.919 0.029 1.059 0.046 {built-in method numpy.core._multiarray_umath.implement_array_function}
1 0.570 0.570 2.701 2.701 test_return_delay.py:6(test_loop)
4 0.135 0.034 0.135 0.034 {method 'repeat' of 'numpy.ndarray' objects}
7 0.004 0.001 0.004 0.001 {method 'reduce' of 'numpy.ufunc' objects}
5 0.002 0.000 0.002 0.000 {built-in method numpy.arange}
2 0.000 0.000 0.000 0.000 {built-in method builtins.print}
2 0.000 0.000 0.246 0.123 index_tricks.py:322(__getitem__)
4 0.000 0.000 0.005 0.001 validation.py:96(_assert_all_finite)
3 0.000 0.000 0.000 0.000 function_base.py:23(linspace)
4 0.000 0.000 0.005 0.001 validation.py:629(check_array)
4 0.000 0.000 0.000 0.000 _array_api.py:168(_asarray_with_order)
4 0.000 0.000 0.000 0.000 numerictypes.py:573(_can_coerce_all)
4 0.000 0.000 0.000 0.000 validation.py:320(_num_samples)
47 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
4 0.000 0.000 0.135 0.034 <__array_function__ internals>:177(repeat)
...
If I instead run cProfile as a module over the entire script (i.e., python -m cProfile -s time test_return_delay.py
), it does at least show there's an issue, but it doesn't seem to give any indication where it's coming from as far as I can tell:
788549 function calls (774939 primitive calls) in 12.372 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 11.257 11.257 11.257 11.257 {method 'enable' of '_lsprof.Profiler' objects}
808 0.173 0.000 0.173 0.000 {built-in method io.open_code}
3514 0.124 0.000 0.124 0.000 {built-in method nt.stat}
157 0.094 0.001 0.097 0.001 {built-in method _imp.create_dynamic}
808 0.076 0.000 0.076 0.000 {built-in method marshal.loads}
808 0.045 0.000 0.045 0.000 {method 'read' of '_io.BufferedReader' objects}
6273 0.033 0.000 0.051 0.000 <frozen importlib._bootstrap_external>:96(_path_join)
1066/1 0.025 0.000 12.373 12.373 {built-in method builtins.exec}
391 0.025 0.000 0.067 0.000 doccer.py:13(docformat)
1701/1692 0.020 0.000 0.086 0.000 {built-in method builtins.__build_class__}
157/126 0.016 0.000 0.084 0.001 {built-in method _imp.exec_dynamic}
399 0.011 0.000 0.019 0.000 doccer.py:180(indentcount_lines)
1359 0.011 0.000 0.161 0.000 <frozen importlib._bootstrap_external>:1536(find_spec)
9043 0.009 0.000 0.009 0.000 {method 'splitlines' of 'str' objects}
499/100 0.009 0.000 0.025 0.000 sre_parse.py:494(_parse)
809 0.009 0.000 0.009 0.000 {method '__exit__' of '_io._IOBase' objects}
17871 0.009 0.000 0.009 0.000 {built-in method builtins.getattr}
808 0.008 0.000 0.367 0.000 <frozen importlib._bootstrap_external>:950(get_code)
100236 0.008 0.000 0.008 0.000 {method 'append' of 'list' objects}
1013/842 0.007 0.000 0.337 0.000 <frozen importlib._bootstrap>:921(_find_spec)
1040/2 0.007 0.000 1.116 0.558 <frozen importlib._bootstrap>:1022(_find_and_load)
9075 0.007 0.000 0.007 0.000 {method 'expandtabs' of 'str' objects}
671 0.007 0.000 0.020 0.000 inspect.py:2280(_signature_from_function)
1 0.006 0.006 0.006 0.006 memory.py:371(MemorizedFunc)
...
Where might the issue lie, or how can I further debug?
I've simplified the example and raised the issue on scikit-learn's repository. The fix seems to be to have KDTree (and the other nearest neighbor classes) generate nested lists like scipy.spatial.KDTree
does, rather than the current method of using arrays as the nested object.