I don’t know if this is the actual source code you are compiling, but the $ is missing from all the OpenMP directives. Assuming that was intentional (just to run the code sequentially), restoring the correct !$omp sentinel was not enough, since you are opening a workshare section without closing the previous one (a compilation error with both gfortran and ifort).
Anyway, I got this compilable code:
!$omp parallel
!$omp workshare
pderiv = diffw * pwest + &
         diffe * peast + &
         diffn * pnorth + &
         diffs * psouth &
         - (diffw + diffe + diffn + diffs) * pcentre + pforce
u = u + deltt * du
!$omp end workshare
!$omp end parallel
But the execution fails (segmentation violation) with both gfortran and ifort (on Linux). This is probably because each thread tries to allocate a big temporary array in its own stack space. Assigning the result to an allocatable tmp(:,:) array instead of the pointer pderiv(:,:) (and then copying it to pderiv) fixes the problem. The very same problem actually happens with gfortran even without any OpenMP in the code. So in the end, what I’m testing is rather:
real, allocatable :: tmp(:,:)
...
!$omp parallel
!$omp workshare
tmp = diffw * pwest + &
      diffe * peast + &
      diffn * pnorth + &
      diffs * psouth &
      - (diffw + diffe + diffn + diffs) * pcentre + pforce
pderiv = tmp
u = u + deltt * du
!$omp end workshare
!$omp end parallel
gfortran, no OpenMP: 10.0" (1000 iterations max)
ifort, no OpenMP: 10.3"
gfortran, OpenMP workshare (1 thread): 15.6"
ifort, OpenMP workshare (1 thread): 10.8"
gfortran, OpenMP workshare (4 threads): 15.4"
ifort, OpenMP workshare (4 threads): 9.6"
And finally, with classical !$omp parallel do loops:
gfortran, OpenMP do loops (4 threads): 5.8"
ifort, OpenMP do loops (4 threads): 4.2"
So yes, workshare doesn’t speed up anything here.
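The loop version behind the last two timings is not shown above. As a rough illustration only, here is a minimal stand-alone sketch of what an explicit !$omp parallel do formulation of the same stencil might look like. The array sizes, the initial values, and the way the neighbours are addressed (direct indexing into u with a one-cell halo, rather than the pwest/peast/pnorth/psouth pointers, and an arbitrary choice of which index is "west" or "north") are all assumptions made for the example, not the code that was actually timed. The time-stepping update of u is left out.

```fortran
program stencil_parallel_do
  ! Hypothetical stand-alone sketch: the names diffw, diffe, diffn, diffs,
  ! pderiv, pforce and u mirror the snippets above, but the sizes, the
  ! initial values and the neighbour indexing are assumptions made for
  ! illustration only.
  implicit none
  integer, parameter :: n = 512
  real, allocatable :: u(:,:), pderiv(:,:), pforce(:,:)
  real :: diffw, diffe, diffn, diffs
  integer :: i, j

  ! u carries a one-cell halo so that every interior point has 4 neighbours.
  allocate(u(0:n+1, 0:n+1), pderiv(n, n), pforce(n, n))
  u = 1.0
  pforce = 0.0
  diffw = 0.25; diffe = 0.25; diffn = 0.25; diffs = 0.25

  ! Explicit loops: no whole-array temporary is needed. The outer loop
  ! runs over j (columns) because Fortran arrays are column-major, so each
  ! thread works on contiguous memory.
  !$omp parallel do private(i)
  do j = 1, n
     do i = 1, n
        pderiv(i, j) = diffw * u(i-1, j) + diffe * u(i+1, j) &
                     + diffn * u(i, j+1) + diffs * u(i, j-1) &
                     - (diffw + diffe + diffn + diffs) * u(i, j) &
                     + pforce(i, j)
     end do
  end do
  !$omp end parallel do

  ! Sanity check: with a uniform field and zero forcing the terms cancel.
  if (maxval(abs(pderiv)) > 1.0e-6) error stop "unexpected stencil result"
  print *, "max |pderiv| =", maxval(abs(pderiv))
end program stencil_parallel_do
```

This should build with something like gfortran -fopenmp (or the equivalent OpenMP flag for ifort). Without the OpenMP flag the directives are treated as plain comments and the same program runs sequentially, which makes it easy to compare the two cases.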