28

Are parts of NumPy and/or SciPy programmed in C/C++?

And how does the overhead of calling C from Python compare to the overhead of calling C from Java and/or C#?

I'm just wondering if Python is a better option than Java or C# for scientific apps.

If I look at the shootouts, Python loses by a huge margin. But I guess this is because they don't use 3rd-party libraries in those benchmarks.

Alex Riley
  • 5
    The shootout python code (e.g. http://shootout.alioth.debian.org/u32/benchmark.php?test=regexdna&lang=python&id=1) does not use numpy/scipy. – unutbu Dec 01 '09 at 12:47
  • 2
    Don't forget about Fortran. Python plays nicely with Fortran too – John La Rooy Dec 01 '09 at 12:48
  • @~unutbu It's kind-of puzzling that you would expect the regex-dna program to use numpy. – igouy Dec 01 '09 at 15:55
  • If you look closer you'll find an "interesting alternative" Python program that does use numpy http://shootout.alioth.debian.org/u32/benchmark.php?test=spectralnorm&lang=python&id=2 – igouy Dec 01 '09 at 15:56

5 Answers

19
  1. I would question any benchmark which doesn't show the source for each implementation (or did I miss something?). It's entirely possible that either or both of those solutions are coded badly, which would result in an unfair appraisal of either or both languages' performance. [Edit] Oops, now I see the source. As others have pointed out though, it's not using the NumPy/SciPy libraries, so those benchmarks are not going to help you make a decision.
  2. I believe the vast majority of NumPy and SciPy is written in C and wrapped in Python for ease of use.
  3. How much overhead there is for a particular application probably depends on what you're doing in each of those languages.

I've used Python for data processing and analysis for a couple of years now so I would say it's certainly fit for purpose.

What are you trying to achieve at the end of the day? If you want a fast way to develop readable code, Python is an excellent option and certainly fast enough for a first stab at whatever it is you're trying to solve.

Why not have a bash at each for a small subset of your problem and benchmark the results in terms of development time and run time? Then you can make an objective decision based on some relevant data ...or at least that's what I'd do :-)
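A minimal sketch of what such a run-time comparison could look like, using only the standard library. The function names and the toy workload (a sum of squares) are made up for illustration; substitute a small piece of your real problem. It times a hand-written loop against the same computation pushed into built-ins that are implemented in C in CPython:

```python
import operator
import timeit

def sum_squares_loop(values):
    """Pure-Python loop: every iteration runs in the interpreter."""
    total = 0
    for v in values:
        total += v * v
    return total

def sum_squares_builtin(values):
    """Same result, but map, operator.mul and sum run as C code in CPython."""
    return sum(map(operator.mul, values, values))

def best_time(func, values, repeat=3, number=5):
    """Return the best wall-clock time in seconds for `number` calls."""
    return min(timeit.repeat(lambda: func(values), repeat=repeat, number=number))

values = list(range(100_000))
loop_seconds = best_time(sum_squares_loop, values)
builtin_seconds = best_time(sum_squares_builtin, values)
print(f"loop: {loop_seconds:.4f}s  builtin: {builtin_seconds:.4f}s")
```

The absolute numbers depend on your machine and Python version, so treat them only as a relative comparison between the candidates you are weighing.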

Jon Cage
  • The source code is available by navigating to a specific program. Scroll down to the bottom and click on one of the "Python CPython" links. An example: http://shootout.alioth.debian.org/u32/benchmark.php?test=mandelbrot&lang=python&id=5 –  Dec 01 '09 at 12:45
  • 1
    +1 for now. After downloading the NumPy source code I can confirm it is mostly C wrapped in Python. –  Dec 01 '09 at 13:43
  • By "for now" I mean it's an excellent answer, and I'll accept it if no one produces a good comparison of the different costs of C interop in Python, Java and C#. Also, I'll follow your advice and prototype a part of the app in all 3 languages. –  Dec 01 '09 at 13:46
  • "or did I miss something" Put your [Edit] at the top where everyone will read your mistake. Out of curiosity, did you look at more than that one page you were referred to? – igouy Dec 01 '09 at 15:58
8

There is a better comparison here (not a benchmark, but it shows ways of speeding up Python). NumPy is mostly written in C. The main advantage of Python is that there are a number of ways to extend your code very easily with C (ctypes, swig, f2py), C++ (boost.python, weave.inline, weave.blitz) or Fortran (f2py), or even just by adding type annotations to Python so it can be translated to C (cython). I don't think there are many things comparably easy for C# or Java, at least none that so seamlessly handle passing numerical arrays of different types (although I guess proponents would argue that, since they don't have the performance penalty of Python, there is less need to).
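To give a feel for the ctypes route, here is a small sketch that calls `cos` from the system C math library. It assumes a Unix-like system where that library can be located (or is already loaded into the Python process); the wrapper name `c_cos` is made up for this example.

```python
import ctypes
import ctypes.util

# Locate the C math library; if the lookup fails, fall back to the symbols
# already loaded into the running process (assumption: Unix-like system).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

def c_cos(x):
    """Call the C library's cos() on a Python float."""
    return libm.cos(x)

print(c_cos(0.0))
```

Declaring `argtypes` and `restype` matters: without them ctypes would assume an `int` return value and the result would be wrong.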

robince
5

A lot of it is written in C or Fortran. You can rewrite the hot loops in C (or use one of the gazillion ways to speed Python up; boost/weave is my favorite), but does it really matter?

Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker on Python.

wisty
  • 1
    Really, you should just try it: use Python Numeric from a Python interactive console to create some matrices, and perform some operations on them "live". It gives you an ease of use and flexibility that goes unsurpassed in other tools, which speeds up any development, as new ideas and usage patterns can be tried right away. The SciPy interactive prompt is often used as an alternative to MatLab and other expensive (and somehow limited) scientific tools. – jsbueno Dec 01 '09 at 14:22
  • 2
    "Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker on Python." -- Normally I'd agree. But this app could run for days or even weeks, so cutting back just a little bit on processing time will save a lot of real time. It will be run more than once. –  Dec 03 '09 at 08:08
5

Most of NumPy is in C, but a large portion of the C code is "boilerplate" to handle all the dirty details of the Python/C interface. I think the ratio C vs. Python is around 50/50 ATM for NumPy.

I am not too familiar with the low-level details of VM-based languages, but I believe the interface cost would be higher because of the restrictions imposed by the JVM and the CLR. One of the reasons why NumPy is often faster than similar environments is its memory representation and how arrays are shared/passed between functions. Whereas most environments (Matlab and R as well, I believe) use copy-on-write to pass arrays between functions, NumPy uses references. But doing so in e.g. the JVM would be hard (because of restrictions on how pointers can be used, etc.). It is doable (an early port of NumPy for Jython exists), but I don't know how they solve this issue. Maybe C++/CLI would make this easier, but I have zero experience with that environment.
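A short sketch of the reference behaviour described above, assuming NumPy is installed. The helper `scale_in_place` is a made-up name for illustration; the point is only that the callee works on the caller's buffer and that a slice is a view onto the same memory.

```python
import numpy as np

def scale_in_place(arr, factor):
    # Operates directly on the caller's buffer; passing arr makes no copy.
    arr *= factor

data = np.arange(6, dtype=np.float64)
scale_in_place(data, 2.0)        # the caller's array itself is modified

view = data[::2]                 # a strided view onto data, not a copy
view[0] = -1.0                   # writes through to data[0]
shares = np.shares_memory(data, view)

print(data, shares)
```

If you do want an independent array, you have to ask for it explicitly, for example with `data.copy()`.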

David Cournapeau
  • @DavidCournapeaud passing an array from C# to a native dll is as easy as passing a pointer. In fact, no copy (like for Java) of the array is made. The array _is_ passed as (pinned) reference with very little overhead. – user492238 Feb 06 '12 at 18:00
  • @DavidCournapeaud Please provide references about the JVM and CLR having restrictions. I have developed scientific software in Python and Java (and others) and saw no such problem, nor were they slower than NumPy. Actually, it seems pretty much the opposite, since e.g. linear algebra routines are natively optimized code in any decent library, and the handling of data outside native numerical calculations (loops, conditionals, etc.) boils down to C performance in most compiled languages (or many languages other than Python, for that matter). – dawid Dec 04 '20 at 16:11
0

It always depends on your own ability to handle the language, so that the language is able to generate fast code. In my experience, NumPy is several times slower than good .NET implementations, and I expect Java to be similarly fast. Their optimizing JIT compilers have improved significantly over the years and produce very efficient instructions.

NumPy, on the other hand, comes with a syntax which is easier to use for those who are attuned to scripting languages. But when it comes to application development, those advantages often turn into obstacles and you will yearn for type safety and enterprise IDEs. Also, the syntactic gap is already closing with C#. A growing number of scientific libraries exist for Java and .NET. Personally I tend towards C#, because it provides better syntax for multidimensional arrays and somehow feels more 'modern'. But of course, this is only my personal experience.

user492238
  • including even a simple benchmark would justify this position more precisely – vwvan Apr 16 '18 at 23:24
  • 1
    @vwvan exactly which part / statement would you like to see backed by a benchmark? "several times slower"? "feels more modern"? I have emphasized the subjective nature of my answer. Justifying a downvote to a subjective answer on a question which cannot be answered objectively puts your motivation in a questionable light, at least. IMO. – user492238 May 16 '18 at 10:35
  • 1
    IMO too. Back to a neutral value you go, because it is a good answer. – ElDoRado1239 Nov 18 '21 at 14:18