I can confirm the findings using ifort (IFORT) 19.1.2.254 20200623 and gcc 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.04) on an Intel(R) Xeon(R) CPU E5-2687W 0:
maws01 ➜ gfortran -Ofast -march=native -fopenmp *.f90
maws01 ➜ ./a.out
Calling parallel marbles with 1 threads.
Loop time = 3.174000 seconds.
Speedup = 1.000000x.
------------------------------------------------------
Calling parallel marbles with 4 threads.
Loop time = 0.818000 seconds.
Speedup = 3.880196x.
------------------------------------------------------
maws01 ➜ ifort -fast -xHost -qopenmp *.f90
ld: /opt/intel/compilers_and_libraries_2020/linux/lib/intel64/libiomp5.a(ompt-general.o): in function `ompt_pre_init':
(.text+0x2281): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
maws01 ➜ ./a.out
Calling parallel marbles with 1 threads.
Loop time = 4.291700 seconds.
Speedup = 1.000000x.
------------------------------------------------------
Calling parallel marbles with 4 threads.
Loop time = 41.886398 seconds.
Speedup = 0.102460x.
------------------------------------------------------
The poor performance is directly related to the line

call parser%evaluate(marble(1:3), marble(4:6))

Replacing this with

marble(1:3) = evaluate(marble(1:3), marble(4:6))

together with the elemental function
elemental function evaluate(a, b) result(c)
  use iso_fortran_env, only: real64
  real(real64), intent(in) :: a, b
  real(real64) :: c
  c = (a*b)**2
end function evaluate
gives a near-optimal speedup with both gfortran and ifort.
I must admit that I did not look into parser%evaluate in detail, but it appears quite complex, with many branches, checks on the allocation status of arrays, etc. Such work should be kept out of a hot loop.
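For illustration, here is a minimal sketch of how the elemental replacement might be driven from the parallel loop. The program name, array shapes, iteration count, and loop structure are all assumptions on my part, since the original benchmark source is not shown here; only the `evaluate` function itself is taken from above.

```fortran
program marbles_sketch
  use iso_fortran_env, only: real64
  use omp_lib, only: omp_get_max_threads
  implicit none
  integer, parameter :: n = 10000000      ! assumed iteration count
  real(real64) :: marble(6)               ! assumed shape, as in marble(1:3), marble(4:6)
  integer :: i

  marble = 0.5_real64
  print '(a,i0,a)', 'Calling parallel marbles with ', &
        omp_get_max_threads(), ' threads.'

  !$omp parallel do firstprivate(marble)
  do i = 1, n
     ! Elemental call: no branching or allocation-status checks
     ! inside the hot loop, so the compiler can vectorize it.
     marble(1:3) = evaluate(marble(1:3), marble(4:6))
  end do
  !$omp end parallel do

contains

  elemental function evaluate(a, b) result(c)
    real(real64), intent(in) :: a, b
    real(real64) :: c
    c = (a*b)**2
  end function evaluate

end program marbles_sketch
```

Because `evaluate` is `elemental` and `pure` by implication, each loop iteration is free of side effects and hidden state, which is what lets both compilers parallelize and vectorize it cleanly.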