I was comparing the performance of doing a sum followed by an assignment of two arrays, in the form c = a + b, between a native Fortran type, real, and a derived data type that only contains one array of real. The class is very simple: it contains operators for addition and assignment and a destructor, as follows:
module type_mod
  use iso_fortran_env

  type :: class_t
    real(8), dimension(:,:), allocatable :: a
  contains
    procedure :: assign_type
    generic, public :: assignment(=) => assign_type
    procedure :: sum_type
    generic :: operator(+) => sum_type
    final :: destroy
  end type class_t

contains

  subroutine assign_type(lhs, rhs)
    class(class_t), intent(inout) :: lhs
    type(class_t), intent(in) :: rhs
    lhs % a = rhs % a
  end subroutine assign_type

  subroutine destroy(this)
    type(class_t), intent(inout) :: this
    if (allocated(this % a)) deallocate(this % a)
  end subroutine destroy

  function sum_type(lhs, rhs) result(res)
    class(class_t), intent(in) :: lhs
    type(class_t), intent(in) :: rhs
    type(class_t) :: res
    res % a = lhs % a + rhs % a
  end function sum_type

end module type_mod
The assign subroutine contains different modes of operation, just for the sake of benchmarking.
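The mode-switching version of the assignment is not reproduced in the listing above. Purely as an illustration of what a size-checking assignment can look like, here is a minimal sketch. The names box_t and assign_checked are hypothetical, and this is not necessarily how the benchmarked code does it: it only shows the idea of reusing the left-hand side's storage when the shapes already match.

```fortran
module checked_assign_mod
  use iso_fortran_env, only: real64
  implicit none
  private
  public :: box_t

  ! Hypothetical stand-in for class_t; only the assignment is shown.
  type :: box_t
    real(real64), allocatable :: a(:,:)
  contains
    procedure :: assign_checked
    generic :: assignment(=) => assign_checked
  end type box_t

contains

  subroutine assign_checked(lhs, rhs)
    class(box_t), intent(inout) :: lhs
    type(box_t),  intent(in)    :: rhs

    ! Unallocated right-hand side: leave lhs unallocated as well.
    if (.not. allocated(rhs%a)) then
      if (allocated(lhs%a)) deallocate(lhs%a)
      return
    end if

    if (allocated(lhs%a)) then
      if (all(shape(lhs%a) == shape(rhs%a))) then
        ! Same shape: copy into the existing storage, no reallocation.
        lhs%a(:,:) = rhs%a
        return
      end if
    end if

    ! Not allocated, or different shape: rely on automatic (re)allocation
    ! of the allocatable left-hand side (Fortran 2003 semantics).
    lhs%a = rhs%a
  end subroutine assign_checked

end module checked_assign_mod
```

With this sketch, y = x for two box_t variables of equal shape goes through the array-section copy and leaves the allocation of y%a untouched.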
To test it against performing the same operations on a real array, I created the following module:
module subroutine_mod
  use type_mod, only: class_t

contains

  subroutine sum_real(a, b, c)
    real(8), dimension(:,:), intent(inout) :: a, b, c
    c = a + b
  end subroutine sum_real

  subroutine sum_type(a, b, c)
    type(class_t), intent(inout) :: a, b, c
    c = a + b
  end subroutine sum_type

end module subroutine_mod
Everything is executed in the program below, considering arrays of size (10000,10000) and repeating the operation 100 times:
program test
  use subroutine_mod

  integer :: i
  integer :: N = 100    ! Number of times to repeat the assign
  integer :: M = 10000  ! Size of the arrays
  real(8) :: tf, ts
  real(8), dimension(:,:), allocatable :: a, b, c
  type(class_t) :: a2, b2, c2

  allocate(a2%a(M,M), b2%a(M,M), c2%a(M,M))
  a2%a = 1.0d0
  b2%a = 2.0d0
  c2%a = 3.0d0

  allocate(a(M,M), b(M,M), c(M,M))
  a = 1.0d0
  b = 2.0d0
  c = 3.0d0

  ! Benchmark timing with cpu_time
  call cpu_time(ts)
  do i = 1, N
    call sum_type(a2, b2, c2)
  end do
  call cpu_time(tf)
  write(*,*) "Type : ", tf-ts

  call cpu_time(ts)
  do i = 1, N
    call sum_real(a, b, c)
  end do
  call cpu_time(tf)
  write(*,*) "Real : ", tf-ts
end program test
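The program above times with cpu_time, which reports processor time and can differ from elapsed time. As a cross-check, wall-clock time can be read with the system_clock intrinsic. The helper below is a minimal sketch: the module and function names (wall_timer_mod, wall_time) are hypothetical and not part of the benchmark above.

```fortran
module wall_timer_mod
  use iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_time

contains

  ! Returns wall-clock seconds measured from an arbitrary reference point,
  ! or -1 if the processor reports no usable clock.
  function wall_time() result(t)
    real(real64)   :: t
    integer(int64) :: ticks, rate

    call system_clock(ticks, rate)
    if (rate > 0) then
      t = real(ticks, real64) / real(rate, real64)
    else
      t = -1.0_real64
    end if
  end function wall_time

end module wall_timer_mod
```

It would be used in the same way as the cpu_time calls: ts = wall_time() before the loop, tf = wall_time() after it, and tf - ts as the elapsed time.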
To my surprise, the operation with my derived data type consistently underperformed the operation with the Fortran arrays, by a factor of roughly 2 with gfortran and roughly 10 with ifort. For instance, using the CHECK_SIZE mode, which saves allocation time, I got the following timings compiling with the -O2 flag:
gfortran
- Data type: 33 s
- Real : 13 s
ifort
- Data type: 30 s
- Real : 3 s
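One variant that may be worth timing alongside the two cases above is a subroutine that writes the sum straight into the result's component, so that neither a function result of the derived type nor a defined assignment is involved. The sketch below assumes hypothetical names (box_t, add_into) and a stripped-down type without type-bound procedures. No timings are claimed for it here, and whether it narrows the gap will depend on the compiler.

```fortran
module add_into_mod
  use iso_fortran_env, only: real64
  implicit none
  private
  public :: box_t, add_into

  ! Hypothetical stand-in for class_t, reduced to its single component.
  type :: box_t
    real(real64), allocatable :: a(:,:)
  end type box_t

contains

  ! Computes c%a = a%a + b%a directly in c's storage.
  ! Assumes a%a and b%a are allocated with the same shape,
  ! and that c is a different variable from a and b.
  subroutine add_into(a, b, c)
    type(box_t), intent(in)    :: a, b
    type(box_t), intent(inout) :: c

    ! Reallocate the result only when its shape does not match.
    if (allocated(c%a)) then
      if (any(shape(c%a) /= shape(a%a))) deallocate(c%a)
    end if
    if (.not. allocated(c%a)) allocate(c%a(size(a%a, 1), size(a%a, 2)))

    ! Array-section assignment: no automatic reallocation of c%a.
    c%a(:,:) = a%a + b%a
  end subroutine add_into

end module add_into_mod
```

In the benchmark loop this would replace c = a + b with call add_into(a2, b2, c2), at the cost of giving up the operator syntax.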
Question
Is this normal behaviour? If so, are there any recommendations to achieve better performance?
Context
To provide some context, the type with a single array will be very useful for a code refactoring task, where we need to keep similar interfaces to a previous type.
Compiler versions
gfortran 9.4.0
ifort 2021.6.0 20220226