OpenMP and `do concurrent` loop = crash at runtime

Over the last couple of days, I’ve been playing with OpenMP (compiler = gfortran 12.2.0). While doing so on a rather simple piece of code, I realized that once a given array exceed a certain size, the program would fail at runtime.

Initially, I though the issue was caused by the $omp directives, but I finally realized that even if I removed all $omp directives, the problem persisted. Ultimately, I was able to pin down the problem to an unexpected (for me) clash between OpenMP and do concurrent loops.

In my computer (windows with gcc), the simple program below runs well for “any” array size if compiled without the flag -fopenmp. In contrast, with -fopenmp it fails for N>=360 raising the following cryptic message:

<ERROR> Execution failed for object " filename.exe "
<ERROR>*cmd_run*:stopping due to failed executions
STOP 1 
program test_do_concurrent_omp_bug
 use, intrinsic :: iso_fortran_env, only: real64
 implicit none

 integer, parameter :: rk = real64
 integer, parameter :: N = 370 ! raise me until it breaks!
 real(rk) :: x(N**2)

 print *, "Running..."
 call foo(x)
 print *, "Done!"

contains

 subroutine foo(u)
    real(rk), intent(out) :: u(:)
    real(rk) :: a(0:N, N)
    integer :: i, j
    
    a = 0._rk

    do concurrent(i=1:N, j=1:N)
       u((j - 1)*N + i) =  a(i, j)
    end do

 end subroutine foo

end program test_do_concurrent_omp_bug

Has anyone observed something similar? Is this a bug or a feature? If this is a general problem, it would appear that do concurrent loops effectively preclude using OpenMP…

1 Like

I can compile and run this piece of code with N=700 and gfortran 12.2, with -O3 -fopenmp:

With N=800 or more it fails, but probably because godbolt is limiting the amount of memory used by the programs.

Note also that with the declaration real(rk) :: a(0:N, N), the allocation is possibly on the stack, which is limited. Try replacing it with an allocatable

real(rk), allocatable :: a(:,:)
allocate( a(0:N, N) )

And indeed, I wouldn’t use do concurrent together with OpenMP

This may have nothing to do with the actual code problem, but there seems to be a dimension mismatch in the toy program that was posted.

Is this a problem with OpenMP, or with do concurrent, or what?

I don’t know, and maybe there’s no problem at all. It’s just that I don’t know what the compilers are supposed to do when a do concurrent construct has a !$OMP DO directive.

but there seems to be a dimension mismatch in the toy program that was posted.

There is no dimension mismatch. The array a intentionally starts at index zero and that should not represent any issue, because the indexes never go out of bounds. The shape of the this test derives from a real program where this pattern arises. I basically stripped out as much code as I could until I was left with the bare minimum that still displays the issue.

Note that the code has no $omp directive, so for certain no $omp is being incorrectly used. Furthermore, to the best of my knowledge, there is no rule that forbids using do concurrent in code where $omp directives are used. I could imagine that nesting $omp and do concurrent could lead to trouble, but again that is not the case here.

According to Offloading - GCC Wiki (gnu.org), it seems gcc 12 does some sort of offloading. Could that be what is breaking the do concurrent?

Do concurrent tells the compiler that the iterations within the loop construct are independent. That means they can be done in any order, including simultaneously. It is the programmer’s responsibility to ensure that that is the case (along with some other constraints, such a no loop exits, no return statements, and so on). Something would be very wrong with OpenMP if the programmer were not allowed to use that construct.

OpenMP parallel do and do concurrent have similar goals but nothing ensures that they use the same underlying model. As a matter of fact, only the very latest (5.2) version of OpenMP handles the do concurrent loops, are still with restrictions:

This OpenMP API specification refers to ISO/IEC 1539-1:2018 as Fortran 2018. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– …
– Locality of variables in a DO CONCURRENT construct

EDIT: oops, this is the 5.1 spec ! Looks like there’s no more restriction in 5.2: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf

I am a bit uncertain on how to proceed from here. do concurrent is a flag feature of modern fortran. OpenMP is also an important specification. It is somehow odd that they don’t get along in a robust manner…

What is the community advice on this?

  • Should I just forget about do concurrent, so that I am sure that my code can be used with OpenMP?
  • Should I report this bug? Where/how? I am not even sure this is a bug, because I can’t find any information on how gcc + do concurrent + -fopenmp is supposed to behave.

To me this looks like a memory/stack size problem. Already at N = 370, you get a warning about the potential problems with concurrent accesses (although it doesn’t break it as long as you execute serially):

$ gfortran -Wall test_do_concurrent_omp_bug.f90 
test_do_concurrent_omp_bug.f90:17:25:

   17 |     real(rk) :: a(0:N, N)
      |                         1
Warning: Array 'a' at (1) is larger than limit set by '-fmax-stack-var-size=', 
moved from stack to static storage. This makes the procedure unsafe when 
called recursively, or concurrently from multiple threads. Consider increasing 
the '-fmax-stack-var-size=' limit (or use '-frecursive', which implies unlimited 
'-fmax-stack-var-size') - or change the code to use an ALLOCATABLE array. 
If the variable is never accessed concurrently, this warning can be ignored, 
and the variable could also be declared with the SAVE attribute. [-Wsurprising]

(I have folded the message over multiple lines).

As you can see, the compiler warns you about concurrent access and just like @PierU has suggested, tells you to use dynamically allocated memory instead. The issue is also discussed here: fortran - Why Segmentation fault is happening in this openmp code? - Stack Overflow

Anyways, with the -fopenmp version, you can make the routine work by increasing the stack size. Here’s an example from my system:

$ ulimit -s
8192
$ gfortran -Wall -fcheck=all test_do_concurrent_omp_bug.f90 -fopenmp
$ ./a.out
 Running...
Segmentation fault: 11
$ ulimit -s 16384
$ ./a.out
 Running...
 Done!

If you look at the output produce compiling with -fdump-tree-original, you will notice one important change:

  1. serial, without -fopenmp
    __attribute__((fn spec (". w ")))
    void foo (struct array01_real(kind=8) & restrict u)
    {
      static logical(kind=4) is_recursive.4 = 0;
      static real(kind=8) a[640800];
      // ... truncated
    
  1. with -fopenmp

    __attribute__((fn spec (". w ")))
    void foo (struct array01_real(kind=8) & restrict u)
    {
      real(kind=8) a[640800];
      integer(kind=4) i;
    

The difference is the static attribute. By the looks of it, in the -fopenmp version, the array a is an automatic variable, since each thread might need it’s own copy (the procedure should be re-entrant).

For an explanation of the stack, I would suggest reading this post by @dwwork: Why stack is faster than heap and what exactly is stack? - #10 by dwwork

What do you mean by “handles”?

The 5.2 spec defines a Fortran canonical loop nest form which must match the grammar:

DO [ label ] var = lb , ub [ , incr ]
[intervening-code]
loop-body
[intervening-code]
[ label ] END DO

Such a loop-nest can then appear in a work-sharing loop construct:

!$omp do [clause[ [,] clause] ... ] 
   loop-nest 
[!$omp end do [nowait]]

It doesn’t appear possible to combine the two:

!$omp do               ! not allowed
do concurrent(i=1:n) 
   ! loop body
end do

The two concepts can be used independently, according to the following rules/restrictions

  • An OpenMP directive may not appear in a do concurrent construct
  • In an OpenMP data environment, do concurrent loop indices are private
  • The effect of an access to a threadprivate variable in a DO CONCURRENT construct is unspecified.
  • OpenMP runtime library routines may not be called in DO CONCURRENT constructs.

I mean that an !$OMP DO directive can be set upfront a do concurrent construct. You say that it’s not possible even in OpenMP 5.2, but I’m not sure I agree. While the OpenMP specification was clearly stating the “[the use to DO CONCURRENT] may result in unspecified behavior” until the 5.0 version, this warning has completely disappeared in the 5.2 version.

However, assuming it’s possible, it’s really not clear how they interact.

Anyway, I tend to not use do concurrent at all, as I find OpenMP extremely simple, with generally efficient implementations, and much more control on the locality of variables, on the scheduling, etc…

Thanks, the explanation and the links were very useful.

I feel somewhat embarrassed for not having seen the warning. I got into the habit of compiling the code with fpm build, which by default has the option --profile debug, which in turn includes the flag -Wall. So, I assumed I would see warnings if any existed… But actually, the warning message is only displayed with the additional option --verbose.

If it were possible, I would share the solution between you and @PierU. That not being the case, I’ll assign it to you, since you were the one who ultimately showed me how to diagnose the issue.

OpenMP 6.0 now adds some sentences to specify this. It is listed among the Version 5.2 to 6.0 Differences:

(pg. 901, L25) The loop construct was extended to allow a DO CONCURRENT loop as the collapsed loop (see Section 13.8).

The normative text that allows this can be found on

(pg. 153, L24) The associated structured block of a block-associated directive can be a DO CONCURRENT loop where it is explicitly allowed.

The words explicitly allowed imply it would be listed under the text of the loop-associated directive. Indeed, under the text of the loop directive, it says

(pg 424, L1) The collapsed loop may be a DO CONCURRENT loop.

It also says,

(pg. 424, L12) If the collapsed loop is a DO CONCURRENT loop, neither the data-sharing attribute clauses nor the collapse clause may be specified.

I’m not sure how to interpret this. It seems to imply the collapse clause is not needed:

!$omp target loop
do concurrent(i = 1:N, j = 1:M)
   ! loop body
end do

This would make sense as the concurrency is already indicated in the DC loop. OpenACC follows similar behavior, see OpenACC 3.3 (pg. 57-58):

When do-loop is a do concurrent, the OpenACC loop construct applies to the loop for each
index in the concurrent-header. The loop construct can describe what type of parallelism to use
to execute all the loops, and declares all indices appearing in the concurrent-header to be implicitly
private.
[…]
A tile and collapse clause may not appear on loop that is associated with do concurrent.

W.r.t. to the “data sharing attribute clauses” mentioned in the OpenMP spec, the only ones allowed with loop (Section 7.5, page 255) are private and lastprivate. It makes sense private is forbidden, because instead the programmer should use the locality specifiers of the do concurrent loop instead, i.e. local(...), local_init(...), shared(...). For lastprivate, the restriction only affects loop iteration variables, since OpenMP requires that

A list item must not appear in a lastprivate clause unless it is the loop-iteration variable of an affected loop.

I’m not sure if this conflicts with the base Fortran rules or not. In J3/18-007r1 (page 184), it says,

For a DO CONCURRENT construct, the loop terminates, and the DO construct becomes inactive when all of the iterations have completed execution.
When a DO construct becomes inactive, the DO variable, if any, of the DO construct retains its last defined value.

Since a DO CONCURRENT construct is also a DO construct and for a DO CONCURRENT construct “the executions may occur in any order”, the values of the of the loop iteration variable could also be anything. So it makes sense that OpenMP would also disallow the lastprivate data-sharing attribute.

1 Like

Is that the case ?
I thought the loop itteration variables in DO CONCURRENT (DO-C) were local unique (ie private) variables, so not available outside the construct. In your above example, if variables i and j are defined prior to the DO-C, then they would retain their value after the DO-C

I am not a fan of DO-C for multi-threading, as OpenMP has a lot more useful controls.
OpenMP could provide a simpler interface for GPU off-loading, which DO-C appears to be providing (by default?) but is not standard conforming and differs between compilers.

Worrying, as in 70’s, all compilers appear to be different and lack documentation !

Sorry if my wording was confusing. I was merely referring to the fact that the definitions of DO, DO WHILE and DO CONCURRENT are “merged” in the standard and share some parts of the normative text:

Upon more accurate reading, I think you are right and my interpretation above is wrong. Looking at J3/24-007, 19.4, paragraphs 6 and 8, are

The name of a variable that appears as an index-name in a DO CONCURRENT construct, FORALL statement, or FORALL construct has a scope of the statement or construct. It is a scalar variable. If integer-type-spec appears in concurrent-header it has the specified type and type parameters; otherwise it has the type and type parameters that it would have if it were the name of a variable in the innermost executable construct or scoping unit that includes the DO CONCURRENT or FORALL, and this type shall be integer type. It has no other attributes. The appearance of a name as an index-name in a DO CONCURRENT construct, FORALL statement, or FORALL construct is not an implicit declaration of a variable whose scope is the scoping unit that contains the statement or construct.

If integer-type-spec does not appear in a concurrent-header, an index-name shall not be the same as an accessible global identifier, local identifier, or identifier of an outer construct entity, except for a common block name or a scalar variable name. An index-name of a contained DO CONCURRENT construct, FORALL statement, or FORALL construct shall not be the same as an index-name of any of its containing DO CONCURRENT or FORALL constructs.

I’m still left confused by what value, if any, are the indexes supposed to retain after a DO CONCURRENT loop. Can anyone provide a more definitive answer?

With OpenMP offloading, the easiest way is to use the descriptive loop construct:

!$omp target [...]
!$omp loop [...]
do i = 1, n
   ! ...
end do
!$omp end target

That’s one of the easiest ways to start using GPU offloading.

I agree. DO-C can be used to imply either SIMD vectorization, multi-threading or GPU parallelism. Control over which of these is used is exerted by compiler flags meaning one has to carefully consider how the code is packed into translation units.