I want to use the Intel VML functions from the Intel MKL library in existing software with more than 200 subroutines, so I ran a timing comparison first. The subroutines are written in Fortran 90 and normally operate on arrays of size 10^6. My test code multiplies two arrays element by element, once with a do loop and once with the VML function vsmul(), and I measured the timings for both. The result is that the VML function is slower than the do loop. I am not sure whether my approach is correct.

So I would like some comments on it. I read posts from other members, but they did not contain enough information, so I am asking again here. I have read that the Intel MKL libraries are faster, but I need to be sure before changing the approach in 200 subroutines.

My code is as follows:

  PROGRAM TIME 
  IMPLICIT NONE
  INCLUDE 'mkl.fi'

  INTEGER                      :: i = 1000000, L
  REAL, DIMENSION (1000000)    :: z, y, O, a
  INTEGER                      :: t1, t2, t3, t4

  call system_clock(t1)

  call rand_ms(z,i)
  call rand_ms(y,i)

  call system_clock(t2)

  DO L=1,i
     a(L)=z(L)*y(L)
  ENDDO

  ! Here I am using the do loop to calculate the timing for the
  ! multiplication of 2 arrays

  call system_clock(t3)

  call vsmul(i,z,y,O)

  call system_clock(t4)

  PRINT *, t2-t1, t3-t2, t4-t3
  END PROGRAM TIME

! The following subroutine is meant to generate the Random numbers
! and those random numbers will be stored in an array.

subroutine rand_ms(vec,vecsiz)

  INTEGER                  :: L, vecsiz
  REAL, DIMENSION (vecsiz) :: z, vec
  INTEGER, DIMENSION (2)   :: seed = (/1,2/), k=1
  REAL                     :: num

  CALL RANDOM_SEED (PUT=seed)
  DO L = 1, vecsiz
    CALL RANDOM_NUMBER(num)
    vec(L) = num
  END DO
end subroutine

The output is as follows:

t3-t2 (Do loop) = 8 sec
t4-t3 (VML Function) = 49 sec

  • Read the documentation for `random_number` more carefully and figure out why `call random_number(vec)` might be better than looping through the elements of the array. Then learn why you don't need to pass an array's size to a subroutine, the array 'knows' its own size and if you really need to know it can be got by executing `size(vex)`. But in this case your entire subroutine can be replaced by `call random_seed (put=[1,2]); call random_number(vec)`. As they say, Read The Fortran Manual. – High Performance Mark Oct 24 '14 at 16:25
  • In my previous comment for `vex` read `vec` and curse SO's auto-mis-correct. – High Performance Mark Oct 24 '14 at 17:28
  • VML allows the mode to be adapted, high or low accuracy. Beyond that, it might be that your arrays are too small, but I don't know the overhead of the VML functions. You can try increasing their size and see what happens. There could be cache effects, try running vsmul at the beginning of your program and then later when you time it. In general, I would recommend if you want to do any change to your code, you only do it to performance-critical parts and maybe put a wrapper routine, so you can easily switch between implementations. – steabert Oct 24 '14 at 17:56
  • You may also be interested in timing `a = z*y` and letting the compiler take care of generating the loops. It's probably a bit slower than the loop version (it was last time I measured it about a year ago with the Intel compiler but these things change as compilers are 'upgraded'). I'd also try using the Fortran 90 interface to `vsmul` (ie `use mkl_vml` (if that's the module name)) just in case the extra information available to the compiler allows it to generate faster code. It probably won't make a blind bit of difference, but you'd better check. – High Performance Mark Oct 24 '14 at 19:37
  • @Vladimir F : My question is why the VML function takes more time than the do loop. – user3759985 Oct 28 '14 at 15:53
  • @HighPerformanceMark : I tried that too, but it still does not work. – user3759985 Nov 07 '14 at 13:36
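
The suggestions in the comments above (fill the arrays with a single `random_number` call, run a warm-up pass before timing, and also time `a = z*y`) can be combined into a small harness. What follows is only a sketch under stated assumptions, not a definitive benchmark: the program name, the repeat count `nrep`, and the seed values are hypothetical, and it deliberately avoids MKL so that it builds with any Fortran 2008 compiler. Note also that `system_clock` returns clock counts, so a difference of two counts becomes seconds only after dividing by the count rate.

```fortran
program time_multiply
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  integer, parameter   :: n = 1000000   ! array size used in the question
  integer, parameter   :: nrep = 100    ! hypothetical repeat count, averages out noise
  real, allocatable    :: z(:), y(:), a(:), b(:)
  integer, allocatable :: seed(:)
  integer(int64)       :: c0, c1, c2, rate
  integer              :: k, l, rep, nseed

  allocate(z(n), y(n), a(n), b(n))

  ! Seed once, using the seed length this compiler expects, then fill whole arrays.
  call random_seed(size=nseed)
  allocate(seed(nseed))
  seed = [(k, k = 1, nseed)]            ! arbitrary illustrative seed values
  call random_seed(put=seed)
  call random_number(z)
  call random_number(y)

  ! Warm-up pass so that neither timed variant pays for first-touch page faults.
  a = z * y
  b = z * y

  call system_clock(c0, rate)           ! rate = clock counts per second
  do rep = 1, nrep
    do l = 1, n
      a(l) = z(l) * y(l)
    end do
  end do
  call system_clock(c1)
  do rep = 1, nrep
    b = z * y
  end do
  call system_clock(c2)

  ! Using the results keeps the compiler from discarding the work entirely.
  if (any(a /= b)) error stop 'loop and array expression disagree'

  print '(a,es10.3,a)', 'do loop : ', real(c1 - c0) / real(rate) / nrep, ' s per pass'
  print '(a,es10.3,a)', 'a = z*y : ', real(c2 - c1) / real(rate) / nrep, ' s per pass'
  print *, 'checksum', sum(a)
end program time_multiply
```

When linking against MKL, a third timed block containing the `call vsmul(n, z, y, o)` from the question could be added in the same way, after its own warm-up call. An optimizing compiler may still simplify the repeated passes, so the per-pass figures are best treated as rough indications.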
