Poor OpenMP scaling with ifort but not gfortran

For a few years now, I have been trying to understand OpenMP performance, especially when working with large arrays.
Two frequent problems with testing OpenMP implementations are:

  1. Insufficient workload for the OpenMP DO loop: a trivial calculation in the loop will not overcome the overhead of initiating the !$OMP region. It takes about 5 microseconds to initiate a region, roughly 20,000 processor cycles, which looks huge to me. There is also a slight overhead for SCHEDULE(DYNAMIC) vs SCHEDULE(STATIC). DYNAMIC can be preferred where the thread workloads are variable; its overhead is usually a minor issue, but it does highlight the problem of balancing workload between threads (see the first sketch after this list).
  2. Increased thread counts can mean increased memory demand. When the combined memory demand of the threads exceeds the cache size, memory traffic can quickly saturate the available bandwidth, stalling the gains from extra threads. This appears to be a black art that I am yet to master. A simple OpenMP example is a dot_product: it looks good, but once the arrays are large enough to overcome the startup delay, it will always fail on memory bandwidth (see the second sketch below). There might be a sweet spot for array size, but my real problems never have that characteristic.
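
As a minimal sketch of the first point: a deliberately trivial loop timed under both schedules, so the region startup and scheduling cost dominate. The array size, chunk size, and loop body are placeholders, only the relative overheads are of interest.

```fortran
program schedule_demo
   use omp_lib
   implicit none
   integer, parameter :: n = 1000000   ! placeholder size
   real(8), allocatable :: a(:)
   real(8) :: t0, t1
   integer :: i

   allocate (a(n))

   t0 = omp_get_wtime()
!$omp parallel do schedule(static)
   do i = 1, n
      a(i) = sqrt(real(i, 8))   ! trivial work: overhead dominates
   end do
!$omp end parallel do
   t1 = omp_get_wtime()
   print '(a,f10.6,a)', 'schedule(static)  : ', t1 - t0, ' s'

   t0 = omp_get_wtime()
!$omp parallel do schedule(dynamic,1000)
   do i = 1, n
      a(i) = sqrt(real(i, 8))
   end do
!$omp end parallel do
   t1 = omp_get_wtime()
   print '(a,f10.6,a)', 'schedule(dynamic) : ', t1 - t0, ' s'

   print *, a(n)   ! keep the optimiser from discarding the loops
end program schedule_demo
```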

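And for the second point, a bandwidth-bound dot_product along the lines described above. The value of n is an assumption, chosen to be far larger than any cache so the loop streams from main memory.

```fortran
program dot_demo
   use omp_lib
   implicit none
   integer, parameter :: n = 50000000   ! assumed: well beyond cache size
   real(8), allocatable :: x(:), y(:)
   real(8) :: s, t0, t1
   integer :: i

   allocate (x(n), y(n))
   call random_number(x)
   call random_number(y)

   s = 0.0d0
   t0 = omp_get_wtime()
!$omp parallel do reduction(+:s)
   do i = 1, n
      s = s + x(i)*y(i)   ! two loads per multiply-add: bandwidth-bound
   end do
!$omp end parallel do
   t1 = omp_get_wtime()
   print '(a,f8.4,a,es13.5)', 'time = ', t1 - t0, ' s   sum = ', s
end program dot_demo
```

Build with `gfortran -O2 -fopenmp dot_demo.f90` or `ifort -O2 -qopenmp dot_demo.f90`; past a few threads the wall time typically stops improving because the memory system, not the cores, is the limit.
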
Minor speed differences between gfortran and ifort may come down to optimisation strategies, especially in how IF branches are handled, or possibly in data placement for effective use of the L1 cache.
The more important question should be whether OpenMP is providing a significant improvement over the single-thread case; hopefully that gain is larger than the gap between gfortran and ifort. Where OpenMP is not providing a gain, there is a more challenging problem.
As I am using gfortran for OpenMP, it is good to know that each compiler comes out ahead on different calculations.
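
A minimal way to check that single-thread-to-multi-thread gain is to time the same kernel at each thread count; the kernel and array size below are placeholder assumptions for a real workload.

```fortran
program speedup_demo
   use omp_lib
   implicit none
   integer, parameter :: n = 20000000   ! placeholder size
   real(8), allocatable :: x(:)
   real(8) :: s, t0, t1
   integer :: i, nt

   allocate (x(n))
   call random_number(x)

   ! time the kernel at 1, 2, ... threads up to the machine maximum
   do nt = 1, omp_get_max_threads()
      call omp_set_num_threads(nt)
      s = 0.0d0
      t0 = omp_get_wtime()
!$omp parallel do reduction(+:s)
      do i = 1, n
         s = s + sqrt(x(i))   ! some work per element, beyond a pure load
      end do
!$omp end parallel do
      t1 = omp_get_wtime()
      print '(a,i3,a,f8.4,a)', 'threads =', nt, '   time =', t1 - t0, ' s'
   end do

   print *, s   ! keep the optimiser from discarding the loops
end program speedup_demo
```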
