
I am working on some simulation code for my research and have run into a snag importing Fortran into my Python scripts. As background, I have worked with Python for some years now, and have only toyed around with Fortran when the need arose.

In the past I have implemented some simple OpenMP functionality in Fortran. I am no expert in this, but I have gotten the basics going before.

I am now using f2py to create a library I can call from my Python script. When I compile with OpenMP enabled, the code compiles correctly and runs to completion, but there is zero improvement in speed, and looking at top I see CPU usage indicating that only one thread is running.

I've scoured the f2py documentation (sparse as it is) and done the usual web sleuthing for answers. Below I've included the Fortran code I am compiling, a simple Python script that calls it, and the compile command I am using.

Currently I cut the simulation down to 10^4 particles as a nice benchmark; on my system it takes 3 seconds to run. Ultimately I need to run a number of 10^6-particle simulations, though, so I need to bring the time down a bit.

If anyone can point me in the direction of getting my code working, it would be super appreciated. I can also include any detailed information about the system as needed.

Cheers, Rylkan


1) Compile

f2py -c --f90flags='-fopenmp' -lgomp -m calc_accel_jerk calc_accel_jerk.f90
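One quick way to rule out a build problem is to confirm that the compiled extension actually links against the OpenMP runtime. A sketch, assuming a Linux system and the module name from the compile line above (the exact `.so` filename varies with the Python version):

```shell
f2py -c --f90flags='-fopenmp' -lgomp -m calc_accel_jerk calc_accel_jerk.f90
# If -fopenmp/-lgomp were dropped somewhere in the build chain,
# libgomp will be missing here and the !$OMP directives were
# compiled as plain comments (no-ops).
ldd calc_accel_jerk*.so | grep gomp
```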

2) Python script to call

import numpy as N
import calc_accel_jerk

# a is a (1e5,7) array with M,r,v information
a        = N.load('../test.npy')
a        = a[:10000]  # first 1e4 rows; slice indices must be integers

out = calc_accel_jerk.calc(a,a.shape[0])
print(out[:10])
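One thing worth checking on the Python side: f2py wrappers expect Fortran-ordered (column-major) arrays, and a C-ordered NumPy array (the default, including what `N.load` returns) is silently copied on every call. That copy costs time but does not affect threading; converting once up front avoids it. A minimal sketch with synthetic data standing in for `../test.npy` (`calc_accel_jerk` itself is not imported here):

```python
import numpy as np

# Synthetic stand-in for the (1e4, 7) M,r,v array loaded from ../test.npy
a = np.random.rand(10000, 7)

# NumPy arrays are C-ordered by default; f2py copies them to
# Fortran order at every call unless they are already column-major.
print(a.flags.f_contiguous)      # False: would be copied at the call

a_f = np.asfortranarray(a)       # one explicit copy, reusable across calls
print(a_f.flags.f_contiguous)    # True: passed to the wrapper without copying

# out = calc_accel_jerk.calc(a_f, a_f.shape[0])   # as in the script above
```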

3) Fortran code

subroutine calc (input_array, nrow, output_array)
implicit none
!f2py threadsafe
include "omp_lib.h"

integer, intent(in) :: nrow
double precision, dimension(nrow,7), intent(in) :: input_array
double precision, dimension(nrow,2), intent(out) :: output_array

! Calculation parameters with set values
double precision,parameter :: psr_M=1.55d0*1.3267297d20
double precision,parameter :: G_Msun=1.3267297d20
double precision,parameter :: pc_to_m=3.08d16

! Vector declarations
integer :: irow
double precision :: vfac
double precision, dimension(nrow) :: drx,dry,drz,dvx,dvy,dvz,rmag,jfac,az,jz

! Break up the input array for faster access
double precision,dimension(nrow) :: input_M
double precision,dimension(nrow) :: input_rx
double precision,dimension(nrow) :: input_ry
double precision,dimension(nrow) :: input_rz
double precision,dimension(nrow) :: input_vx
double precision,dimension(nrow) :: input_vy
double precision,dimension(nrow) :: input_vz

input_M(:)  = input_array(:,1)*G_Msun
input_rx(:) = input_array(:,2)*pc_to_m
input_ry(:) = input_array(:,3)*pc_to_m
input_rz(:) = input_array(:,4)*pc_to_m
input_vx(:) = input_array(:,5)*1000
input_vy(:) = input_array(:,6)*1000
input_vz(:) = input_array(:,7)*1000

!$OMP PARALLEL DO private(vfac,drx,dry,drz,dvx,dvy,dvz,rmag,jfac,az,jz) shared(output_array) NUM_THREADS(2)
DO irow = 1,nrow
    ! Get the i-th iteration
    vfac  = sqrt(input_M(irow)/psr_M)
    drx   = (input_rx-input_rx(irow))
    dry   = (input_ry-input_ry(irow))
    drz   = (input_rz-input_rz(irow))
    dvx   = (input_vx-input_vx(irow)*vfac)
    dvy   = (input_vy-input_vy(irow)*vfac)
    dvz   = (input_vz-input_vz(irow)*vfac)
    rmag  = sqrt(drx**2+dry**2+drz**2)
    jfac  = -3*drz/(drx**2+dry**2+drz**2)

    ! Calculate the acceleration and jerk
    az = input_M*(drz/rmag**3)
    jz = (input_M/rmag**3)*((dvx*drx*jfac)+(dvy*dry*jfac)+(dvz+dvz*drz*jfac))

    ! Remove bad index
    az(irow) = 0
    jz(irow) = 0

    output_array(irow,1) = sum(az)
    output_array(irow,2) = sum(jz)
END DO
!$OMP END PARALLEL DO

END subroutine calc
  • You can control the number of threads by the environment variable OMP_NUM_THREADS, and inside your code you can check how many threads are available with omp_get_max_threads. – haraldkl Sep 23 '15 at 20:25
  • You should be able to write `use omp_lib` instead of `include "omp_lib.h"`, ideally use `!$ use omp_lib`, to allow compilation without OpenMP support aswell. – haraldkl Sep 23 '15 at 20:31
  • @haraldkl So I tested for this early on and the code does report that I am using 2 threads (in the case of the code posted. I've tried running the code with varied numbers of threads to see what changes would occur. Nothing happened.) Also, attempting to use !$use omp_lib like you mentioned doesn't work on my setup for some reason (whereas the include does work). I've run openmp on Fortran scripts without any include statement before, and added the library now in hopes it might be something oddly compiler/wrapper specific. – Rylkan Tiwaz Sep 23 '15 at 23:54

1 Answer


Here is a simple check to see whether the OpenMP threads are indeed visible within the Fortran code:

module OTmod
  !$ use omp_lib
  implicit none

  public :: get_threads

contains

  function get_threads() result(nt)
    integer :: nt

    nt = 0
    !$ nt = omp_get_max_threads()

  end function get_threads

end module OTmod

Compilation:

> f2py -m OTfor --fcompiler=gfortran --f90flags='-fopenmp' -lgomp -c OTmod.f90

Execution:

> python
>>> from OTfor import otmod
>>> otmod.get_threads()
12
haraldkl
  • When debugging my code I used omp_get_num_threads() and had it print the number during execution of the Fortran script. (Technically C I guess after wrapping.) It reports back the correct number despite showing no evidence of actual threading during execution. – Rylkan Tiwaz Sep 23 '15 at 23:59
  • 1
    @RylkanTiwaz Hm, then it might be a Pinning issue. Your code looks like it should benefit from multi threading, but if both threads are run on the same core, it doesn't help. Could you run this in pure Fortran, just to check, wether it work? Or on another machine? – haraldkl Sep 24 '15 at 04:49
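On Linux, the pinning hypothesis from the last comment can be checked directly from Python: if the process's CPU affinity mask contains only one CPU, every OpenMP thread is confined to that core and no speedup is possible. A sketch (`os.sched_getaffinity` is Linux-only):

```python
import os

# CPU affinity mask of the current process (pid 0 = ourselves).
# If this set has a single element, all OpenMP threads are pinned
# to one core and parallel execution cannot help.
cpus = os.sched_getaffinity(0)
print(f"process may run on {len(cpus)} CPU(s): {sorted(cpus)}")
```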