OpenMP: efficiency in reduction in large array

JohnCampbell · December 13, 2022, 1:03pm

I run my compute-intensive multi-threaded tasks, defined with OpenMP, on Windows with a multi-core Intel i7 or AMD Ryzen processor. They are very different to the multi-threaded tasks that the operating system appears to run (as shown in Task Manager).
If I am running multiple threads, the CPU_TIME clock accumulates the “CPU” time for all associated threads.
Typically they run at 100% usage (doubled for hyper-threading?), but if CPU usage is not the bottleneck, they can run at a reduced (per-core) usage. The most common cause of this reduction is a memory access bottleneck, which indicates that the threads are demanding data from memory faster than it can be transferred into cache; this is the inefficiency I referred to in my previous post.
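
To make the CPU-time versus elapsed-time distinction concrete, here is a minimal sketch in C rather than Fortran. The names `wall_now`, `timed_sum` and `per_thread_usage` are hypothetical, chosen only for illustration. It assumes the C library's `clock()` sums processor time over all threads, as glibc does (Microsoft's C runtime reports elapsed time instead), which mirrors the CPU_TIME behaviour described above. A per-thread usage well below 1 would suggest the threads are waiting, often on memory.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Elapsed (wall-clock) seconds from an arbitrary reference point. */
static double wall_now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);   /* C11 fallback when OpenMP is off */
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
#endif
}

/* Sum a[0..n-1] with an OpenMP reduction, reporting CPU and wall seconds.
   Assumption: clock() returns CPU time summed over all threads (glibc). */
double timed_sum(const double *a, long n,
                 double *cpu_seconds, double *wall_seconds)
{
    double sum = 0.0;
    clock_t c0 = clock();
    double w0 = wall_now();

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];

    *wall_seconds = wall_now() - w0;
    *cpu_seconds = (double)(clock() - c0) / CLOCKS_PER_SEC;
    return sum;
}

/* Rough per-thread usage: accumulated CPU seconds divided by
   (elapsed seconds * threads). Returns 0 when the inputs are unusable. */
double per_thread_usage(double cpu_seconds, double wall_seconds,
                        int num_threads)
{
    if (wall_seconds <= 0.0 || num_threads <= 0)
        return 0.0;
    return cpu_seconds / (wall_seconds * num_threads);
}
```

For example, with 4 threads, 8.0 s of accumulated CPU time over 2.0 s elapsed gives `per_thread_usage(8.0, 2.0, 4)` equal to 1.0 (every thread fully busy), while 2.0 s of CPU time over the same interval gives 0.25.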

This is a very different resource allocation problem to the days of time-sharing, when there were very few cores available (typically one!) and many processes running. (I shut down our last Pr1me in 1992!)
A Windows PC actually has this environment (see Task Manager > Performance for the Processes and Threads counts), but these are very different to the compute-intensive threads generated with OpenMP. It is a good idea to reduce your process's thread count and leave 1 or 2 cores for those other, mostly suspended, processes.
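
One possible way to leave a core or two free is to set the thread count from the number of logical processors. This is a sketch in C under stated assumptions: `worker_thread_count` and `limit_openmp_threads` are hypothetical helper names, while `omp_get_num_procs` and `omp_set_num_threads` are standard OpenMP runtime routines that are also available from Fortran.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Threads to request: logical cores minus the reserved ones, never below 1. */
int worker_thread_count(int logical_cores, int reserved_cores)
{
    int n = logical_cores - reserved_cores;
    return n < 1 ? 1 : n;
}

/* Apply the count to subsequent parallel regions and return the count chosen.
   Without OpenMP the code runs serially, so the count is 1. */
int limit_openmp_threads(int reserved_cores)
{
#ifdef _OPENMP
    int n = worker_thread_count(omp_get_num_procs(), reserved_cores);
    omp_set_num_threads(n);
    return n;
#else
    (void)reserved_cores;
    return 1;
#endif
}
```

Calling `limit_openmp_threads(2)` before the first parallel region would, on a machine reporting 8 logical processors, request 6 threads. Setting the `OMP_NUM_THREADS` environment variable is an alternative that needs no code change.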

My comments were limited to OpenMP-style tasks and how CPU_TIME reports their usage. How complex a description do you want?
