14

I discovered that numpy.sin behaves differently when the argument size is <= 8192 and when it is > 8192. The difference is in both performance and values returned. Can someone explain this effect?

For example, let's calculate sin(pi/4):

x = np.pi*0.25
for n in range(8191, 8195):
    xx = np.repeat(x, n)
    %timeit np.sin(xx)
    print(n, np.sin(xx)[0])
64.7 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
8191 0.7071067811865476
64.6 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
8192 0.7071067811865476
20.1 µs ± 189 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8193 0.7071067811865475
21.8 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
8194 0.7071067811865475

After crossing the 8192 elements limit the calculations become more than 3 times faster and give a different result: the last digit becomes 5 instead of 6.

When I tried to calculate the same value in other ways I obtained:

  • C++ std::sin (Visual Studio 2017, Win32 platform) gives 0.7071067811865475;
  • C++ std::sin (Visual Studio 2017, x64 platform) gives 0.70710678118654756;
  • math.sin gives 0.7071067811865476, which is logical because I used 64-bit Python.

I couldn't find any explanation in the NumPy documentation, nor in its code.

Update #2: It is hard to believe, but replacing sin by sqrt gives this:

44.2 µs ± 751 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
8191 0.8862269254527579
44.1 µs ± 543 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
8192 0.8862269254527579
10.3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8193 0.886226925452758
10.4 µs ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8194 0.886226925452758

Update: np.show_config() output:

mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/GNU/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', 'C:/GNU/Anaconda3\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/GNU/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', 'C:/GNU/Anaconda3\\Library\\include']
blas_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/GNU/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', 'C:/GNU/Anaconda3\\Library\\include']
lapack_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/GNU/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', 'C:/GNU/Anaconda3\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/GNU/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2019.0.117\\windows\\mkl\\lib', 'C:/GNU/Anaconda3\\Library\\include']
aparpara
  • 2,171
  • 8
  • 23
  • 4
    For what it's worth, in all of the above cases I'm seeing `0.7071067811865475` on x86_64 Linux with Python 3.7.3rc1 and 2.7.16rc1. – gspr Mar 25 '19 at 15:30
  • I had a short look in numpy source, looks like numpy sine just wraps up a usual sine from C. – sooobus Mar 25 '19 at 15:33
  • @gspr, what about timing on your system? – aparpara Mar 25 '19 at 15:47
  • 1
    Can reproduce with CPython 3.6.7 and NumPy 1.15.4 Anaconda (w/ MKL) in Windows 10 64-bit. – jdehesa Mar 25 '19 at 16:03
  • 1
    @aparpara: This is embarassing, but I actually don't know how to run the timeit stuff. I'll give you the numbers if you give me a hint :-) – gspr Mar 25 '19 at 16:04
  • I imagine this has something to do with 8192 being the value of the constant [`NPY_BUFSIZE`](https://github.com/numpy/numpy/blob/v1.16.2/numpy/core/include/numpy/ndarraytypes.h#L930) in the source code (something like "if everything fits into one buffer go down this path, else do this"), but I don't really know how. – jdehesa Mar 25 '19 at 16:12
  • 1
    @gspr, %timeit is a IPython magic. If you don't use it, you can use ```python from timeit import timeit for n in range(8191, 8195): print(timeit("np.sin(np.repeat(x, n))", setup="from __main__ import np, x, n", number=10000)) print(n, np.sin(np.repeat(x, n))[0]) ``` – aparpara Mar 25 '19 at 16:27
  • @jdehesa I'd actually think it's something to do with the MSVC implementation of `std::sin` based on what @aparpara said at the end of their post and the fact that this behavior doesn't happen according to others that have tried replicating this on Linux. – Chi Mar 25 '19 at 16:30
  • Can you add the output of `np.show_config()` to the question? – Warren Weckesser Mar 25 '19 at 16:54
  • @warren-weckesser Added. – aparpara Mar 25 '19 at 17:15
  • I can not reproduce with Python 2.7.15 and Numpy 1.15.3 on MacOS. – Keldorn Mar 25 '19 at 17:37
  • You are timing the `repeat` and the `sin `. Separate the calls to ensure you are only timing the `sin` call. – Jan Christoph Terasa Mar 25 '19 at 18:06
  • @JanChristophTerasa, thank you, good point, now the difference is more than 3 times :) – aparpara Mar 25 '19 at 18:46
  • And what is your Numpy/Python version? Cannot reproduce it on windows with numpy 1.14.2 and Python3.6. – ead Mar 25 '19 at 20:24
  • 5
    It's almost certainly an Anaconda & Intel MKL issue; cf. https://github.com/numpy/numpy/issues/11448 and https://github.com/ContinuumIO/anaconda-issues/issues/9129. – Warren Weckesser Mar 26 '19 at 03:40
  • @aparpara: I did the timing now too, and consistently get 57-58 µs per loop. So in summary, I'm not able to reproduce: I consistently see the same output (`…475`) and the same timing for each `n` on x86-64 Linux with Python 3.7.3rc1 and NumPy 1.16.1. – gspr Mar 26 '19 at 08:13
  • @WarrenWeckesser seems to be right. The bug report he linked says that for transcendental functions like `sin` and `sqrt`, Anaconda will use normal math functions under 8192 elements, and MKL functions over 8192. – interjay Mar 26 '19 at 09:20
  • @interjay: Yes, looks like the problem is in MKL. But `sqrt` is not a transcendental function, it is algebraic, and IEEE 754-2008 requires it to be correctly rounded. So MKL does not conform to the standard here. And the only way to solve the issue on Windows is to uninstall Anaconda: https://stackoverflow.com/questions/46656367/how-to-create-an-environment-in-anaconda-with-numpy-nomkl – aparpara Mar 26 '19 at 10:04
  • A comment in the bug report says "sqrt is under the transcendental group", so it looks like Anaconda considers it to be one even if it technically isn't. – interjay Mar 26 '19 at 10:19
  • @aparpara It is quite normal that different implementations of functions like sin,sqrt,.. leads to different results. If `xx.shape[0]>=8193` Anaconda-numpy on windows not only uses another sin implementation (maybe the Intel SVML 1UP version), but also uses a multithreaded approach. – max9111 Mar 27 '19 at 16:31
  • I tried 3.7.2 and a machine-specific 1.16.2 numpy build (using the most recent icc), no big difference in timings, same result everywhere. @max9111 if you follow the standards you should have exactly the same answer, regardless of implementation. – Bracula Mar 28 '19 at 18:32

1 Answers1

3

As @WarrenWeckesser wrote, "it's almost certainly an Anaconda & Intel MKL issue; cf. https://github.com/numpy/numpy/issues/11448 and https://github.com/ContinuumIO/anaconda-issues/issues/9129".

And unfortunately, the only way to solve the issue under Windows is to uninstall Anaconda and use another distribution with MKL-free numpy. I used python-3.6.6-amd64 from https://www.python.org/ and installed everything else via pip, including numpy 1.14.5. I even managed to make Spyder work (had to downgrade PyQt5 to 5.11.3, it refused to launch on >= 5.12).

Now np.sin(xx) is consistently 0.7071067811865476 (67.1 µs at n = 8192) and np.sqrt(xx) 0.8862269254527579 (16.4 µs). A bit slower, but perfectly reproducible.

Beware Anaconda

aparpara
  • 2,171
  • 8
  • 23