
I am currently working on some Python code, and to gain some speed I used f2py to port some existing Fortran code. Everything works well and the speedup is amazing. However, I found that the code now seems to run on multiple threads (according to htop), which is something I did not specify anywhere (maybe this is done intrinsically by f2py?).

Here's the command I use to create the module:

f2py --f90exec="gfortran" --f90flags="" --noopt \
$(ACMLLIB) $(FFTLIB) $(ACMLINC) $(FFTINC) -c -m fmod myCode.f90

where the variables $(ACMLLIB) $(FFTLIB) $(ACMLINC) and $(FFTINC) are paths to the libraries.

When I run the script, it appears to use every core it can find. I don't mind that, but I want to at least be able to control it. How can I do this, e.g. by setting the number of threads?

I suspect this has something to do with the -pthread option here:

....

compiling C sources

C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC

....

This is a piece of the massive output after I compile the Fortran module. I have no idea how to handle this.

rammelmueller
  • What produces the threads? The Python part? The Fortran part? What exactly is run in parallel? It should not happen on its own; please show some code. – Vladimir F Героям слава Mar 22 '17 at 07:26
  • I don't know what exactly produces the parallelization and/or what is run in parallel. When I open htop in the terminal and then run the code, it shows that all 4 threads of my machine are occupied. How can I check which part runs in parallel? – rammelmueller Mar 22 '17 at 07:32
  • By investigating. Sorry, can't say more without seeing anything. – Vladimir F Героям слава Mar 22 '17 at 07:34
  • All the code won't fit here. Which part would you need to see? What I do is, I take the existing pure Python version (which only shows activity on one thread) and replace one function with a Fortran version. I get great speedup and the results are exactly the same, but I see activity on 4 threads. No openMP was used (and no flags either). – rammelmueller Mar 22 '17 at 07:44
  • From what I see, it **must** be the fortran part. – rammelmueller Mar 22 '17 at 08:19
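One way to narrow down which part spawns the threads is to count the OS-level threads of the Python process before and after the Fortran module is loaded and called. The following is only a minimal sketch, assuming Linux (it reads /proc/self/task, which has one entry per thread); the helper name `count_os_threads` and the commented-out call into `fmod` are illustrative and not taken from the original code.

```python
import os


def count_os_threads():
    """Return the number of OS-level threads in the current process.

    Linux-only assumption: /proc/self/task contains one entry per thread.
    """
    return len(os.listdir("/proc/self/task"))


before = count_os_threads()

# Hypothetical: import the f2py-built module and call the routine under
# suspicion here, e.g.
#   import fmod
#   fmod.some_routine(...)   # placeholder name, not from the question

after = count_os_threads()
print("threads before:", before, "threads after:", after)
```

If the count only goes up after a particular call, that call (or a library it links against) is the likely source. Some libraries create their worker threads lazily on the first threaded routine and keep them alive afterwards, so it can help to check after each call rather than only after the import.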

2 Answers


ACML, AMD's (now end-of-life) math library, can use multiple cores; see http://developer.amd.com/tools-and-sdks/archive/compute/amd-core-math-library-acml/acml-product-features/

This is most probably what you are seeing. There is a copy of the docs here: https://engineering.ucsb.edu/~stefan/acml.pdf, which mentions the environment variable OMP_NUM_THREADS for controlling the number of cores/threads to use. That is the standard OpenMP environment variable.
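If setting the variable in the shell is inconvenient, it can also be set from within the Python script, as long as that happens before the extension module (and the threaded library it links against) is loaded. This is a minimal sketch under that assumption; the helper name `limit_openmp_threads` is made up for illustration, and whether a given ACML build honours the variable depends on which variant was linked.

```python
import os


def limit_openmp_threads(n):
    """Ask OpenMP-based libraries to use at most n threads.

    Assumption: this runs before the threaded library is loaded, because
    OpenMP runtimes typically read OMP_NUM_THREADS once at start-up.
    """
    os.environ["OMP_NUM_THREADS"] = str(n)
    return os.environ["OMP_NUM_THREADS"]


limit_openmp_threads(1)

# Only now import the f2py-built module from the question:
#   import fmod
```

The order matters: setting the variable after the module has been imported will usually have no effect.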

Pierre de Buyl
  • yeah, I use ACML (plan to change that in the future, but this will do for now..) but I didn't specify any OMP stuff explicitly in my code. Is this done intrinsically by the ACML routines? – rammelmueller Mar 22 '17 at 08:52
  • Yes, it can do it on its own. Very plausible explanation. – Vladimir F Героям слава Mar 22 '17 at 08:59
  • Setting the OMP_NUM_THREADS and compiling with that doesn't change anything though. – rammelmueller Mar 22 '17 at 09:02
  • Also, when I take out all the ACML routines and compile without linking the library, it still seems to be running on multiple cores. – rammelmueller Mar 22 '17 at 09:08
  • To my knowledge, gfortran will not use multithreaded operations unless being asked. Python will definitely not. At this point, it is possible that the issue is related to information that we cannot guess from your full SciPy/NumPy/build system/platform information. – Pierre de Buyl Mar 22 '17 at 09:20
  • BTW, `OMP_NUM_THREADS` needs to be set for the execution, not the compilation. Either `export OMP_NUM_THREADS=1` then execute or `OMP_NUM_THREADS=1 python my_code.py` – Pierre de Buyl Mar 22 '17 at 09:21
  • Yeah, I set everything at compilation and also runtime. Doesn't seem to change anything. I work on Ubuntu 16.04 (4.4.0-66-generic), use numpy v1.11.0, f2py v2, Python 2.7.12 and used gfortran v5.4.0 20160609 to compile the Fortran module. Is there more you need to know? – rammelmueller Mar 22 '17 at 09:29
  • What install of NumPy? MKL-based NumPy can do multithreading for the linear algebra. – Pierre de Buyl Mar 22 '17 at 09:30
  • Also, what are the FFT libs? What is your code spending most of its time on? – Pierre de Buyl Mar 22 '17 at 09:30
  • The FFT library is fftw3. As far as I know I don't use an MKL-based NumPy; if I did, I would see the same behavior in the pure Python version, right? Also, I tried running it without the FFT calls, and multiple cores still show activity. – rammelmueller Mar 22 '17 at 09:34
  • I am out of guesses, sorry. Without seeing code I won't be able to help further. – Pierre de Buyl Mar 22 '17 at 09:36
  • Fair enough. I kind of don't want to post the code in public, the Fortran part is done by someone else, don't know how they would feel about it. Thanks for the effort though! I'll (hopefully) be back with an answer. – rammelmueller Mar 22 '17 at 09:38
  • As a matter of fact, the ACML seems to be the problem. I switched to just LAPACK and it seems to be alright now. Thanks again. – rammelmueller Mar 22 '17 at 09:52
  • FFTW3 can use multiple threads on its own as well. Depends on how you are initializing it. – Vladimir F Героям слава Mar 22 '17 at 10:26

It would be nice to be able to set the number of f2py threads via an environment variable or something. I searched around a bit, but could not find any info about doing that.

However, if you're running on Linux, say, you can use the taskset command-line utility, which pins a process (any process) to a particular CPU core or set of CPU cores. This is a bit crude, but I think it will accomplish what you need.

For more info, look here, for instance: http://xmodulo.com/run-program-process-specific-cpu-cores-linux.html
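The same kind of pinning can also be done from inside the script. The sketch below assumes Linux and Python 3.3 or later, where `os.sched_getaffinity` and `os.sched_setaffinity` are available (they are not in Python 2.7); the helper name `pin_to_cores` is hypothetical.

```python
import os


def pin_to_cores(cores):
    """Restrict the calling process to the given CPU cores (Linux only).

    Returns the affinity mask that is in effect afterwards.
    """
    allowed = os.sched_getaffinity(0)   # cores we may use right now
    target = set(cores) & allowed       # never request a core we cannot use
    if not target:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(0, target)     # pid 0 refers to the caller
    return os.sched_getaffinity(0)
```

Calling `pin_to_cores({0})` early in the script, before the Fortran module is imported, should keep any threads the library creates later on that core, since new threads inherit the affinity mask. As with taskset, this limits where the threads run rather than how many are created.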

Alex L
  • OK, I'll have a look at that.. Thanks! It is annoying though, that this can't be controlled.. – rammelmueller Mar 22 '17 at 06:26
  • I'm not saying that it can't for sure--just that I wasn't able to find how to do it. If there's an f2py mailing list or something, that might be another good place to ask. – Alex L Mar 22 '17 at 16:42