CPU utilisation drop when increasing number of threads with threading

@colesbury

Bug report

Bug description:

Hi :)
This issue is essentially a re-open of #118153, but with some more experiments added.

I'm testing the free-threaded Python build. I'm running a simple test (code below) that runs a computationally heavy function on several threads using the threading module, so the work is spread across CPU cores. The measured wall-clock times of the script are:

num_threads = 2  -> 0.68 s
num_threads = 8  -> 5.48 s
num_threads = 18 -> 12.1 s
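
Since every thread does the same fixed amount of work, the slowdown relative to the 2-thread run can be computed directly from the three reported times. A small worked calculation that uses only those numbers:

```python
# Wall-clock times reported above, in seconds, keyed by num_threads.
measured = {2: 0.68, 8: 5.48, 18: 12.1}

# Each thread runs the same loop, and the machine has 36 logical CPUs,
# so with perfect scaling every run would take about as long as the
# 2-thread run. The ratio is therefore the slowdown versus that ideal.
baseline = measured[2]
slowdown = {n: t / baseline for n, t in measured.items()}

for n, s in slowdown.items():
    print(f"num_threads={n:2d}: {s:5.2f}x the 2-thread time")
```

This gives factors of about 1.00, 8.06 and 17.79, i.e. the wall time grows almost in proportion to the thread count instead of staying flat.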

In an ideal world, I would expect the three numbers above to be the same (or at least comparable): each thread does the same amount of work, and the machine has more cores than the largest thread count used. I also gathered profiles of each run:

num_threads = 2

num_threads = 8

num_threads = 18 (showing only some threads, but the picture illustrates the issue)

As we can see, CPU utilisation decreases as the number of threads grows (almost 99% for nt=2, about 75% for nt=8 and ~40% for nt=18). Threads are also switched between CPU cores more frequently. My guess is that the decreased CPU utilisation is caused by overhead in the threading module.
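
The utilisation figures above are read from the profiler timelines. As a rough cross-check without a profiler, one can compare the CPU time consumed by the whole process with the wall time multiplied by the thread count. This is a minimal sketch, assuming Linux, where time.process_time() sums user and system CPU time over all threads of the process; the helper names run_and_measure and utilisation are hypothetical.

```python
import math
import threading
import time


def computational_heavy(iterations):
    val = 0
    for i in range(1, iterations):
        val += math.sin(i) * math.cos(i)
    return val


def utilisation(cpu_seconds, wall_seconds, num_threads):
    """Fraction of the available thread-time actually spent on a CPU.

    1.0 means every thread was running for the whole wall-clock interval.
    """
    return cpu_seconds / (wall_seconds * num_threads)


def run_and_measure(num_threads, iterations=1_000_000):
    """Hypothetical helper: run the benchmark, return (wall seconds, utilisation)."""
    threads = [
        threading.Thread(target=computational_heavy, args=(iterations,))
        for _ in range(num_threads)
    ]
    # process_time() counts user + system CPU time of the whole process,
    # i.e. summed over all of its threads.
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, utilisation(cpu, wall, num_threads)
```

For example, run_and_measure(18) returns the wall time and the estimated utilisation for 18 threads. Time spent spinning on contended locks also counts as CPU time, so a high value does not by itself mean the threads are doing useful work; on a default (GIL) build the estimate should stay roughly near 1/num_threads for this pure-Python loop.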

In further experiments, I ran the program with a high number of threads (where, based on the previous observation, CPU utilisation should be low), but only 2 threads did the actual sin*cos computation while the other 10 threads either idle-waited or busy-waited. In both scenarios we observe high CPU utilisation on the worker threads:

idle wait (implemented with time.sleep)

busy wait (implemented with while loop)
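
The script for these two runs is not included above. The following is a hypothetical reconstruction, assuming the 10 waiting threads simply wait until the 2 compute threads finish; the names idle_wait, busy_wait and mixed_run are illustrative, not taken from the original experiment.

```python
import math
import threading
import time


def computational_heavy(iterations):
    val = 0
    for i in range(1, iterations):
        val += math.sin(i) * math.cos(i)
    return val


def idle_wait(done):
    # Idle wait: sleep in short slices until the compute threads finish.
    while not done.is_set():
        time.sleep(0.001)


def busy_wait(done):
    # Busy wait: spin on the flag, keeping a core occupied.
    while not done.is_set():
        pass


def mixed_run(waiter, num_compute=2, num_waiters=10, iterations=1_000_000):
    """Run num_compute compute threads alongside num_waiters waiting threads.

    Returns the wall-clock time of the compute threads in seconds.
    """
    done = threading.Event()
    waiters = [threading.Thread(target=waiter, args=(done,)) for _ in range(num_waiters)]
    workers = [
        threading.Thread(target=computational_heavy, args=(iterations,))
        for _ in range(num_compute)
    ]
    for t in waiters:
        t.start()
    start = time.perf_counter()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    elapsed = time.perf_counter() - start
    done.set()  # release the waiting threads
    for t in waiters:
        t.join()
    return elapsed
```

Calling mixed_run(idle_wait) and mixed_run(busy_wait) corresponds to the two scenarios; the waiting threads are released only after the compute threads have joined.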

Interestingly, zooming in on the CPU utilisation profile of the "slow" case, we see parts of the timeline where the CPU is saturated and all threads work in parallel. However, there are also periods where CPU utilisation is scattered:

Lastly, as a sanity check, the same operation implemented in C++:

Could this be a bug in the threading module? I went through PEP 703 but found no mention of this behaviour. If overhead in threading is the root cause of the lower utilisation, could this issue be addressed?

@colesbury, tagging you here since I believe you know the most about the free-threaded Python build. Should this issue be added to the list in #108219?

Testing configuration:

Ubuntu 22.04
Python 3.13 ToT
CPU scaling_governor: performance

CPU(s):                  36
  On-line CPU(s) list:   0-35
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

CPython build command:

./configure --disable-gil --enable-optimizations && make -j && make install
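
Before comparing thread counts, it may be worth confirming that the interpreter produced by this build really runs without the GIL: on the 3.13 free-threaded build the GIL can be re-enabled at runtime, for instance with PYTHON_GIL=1 or by importing a C extension that does not declare free-threading support (whether the nvtx package used in the script does is not established here). A minimal sketch; sys._is_gil_enabled() is a private helper that only exists on 3.13 and later, so the code reports None when it is missing on a free-threaded build.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this interpreter is a free-threaded build and whether
    the GIL is actually enabled at runtime (None if it cannot be told)."""
    # Py_GIL_DISABLED is 1 for builds configured with --disable-gil.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    if is_gil_enabled is not None:
        gil_enabled = is_gil_enabled()
    elif not free_threaded_build:
        gil_enabled = True  # default builds always have a GIL
    else:
        gil_enabled = None  # free-threaded build too old to report this
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


print(gil_status())
```

Running this after the imports of the benchmark script (e.g. after import nvtx) would show whether any of them switched the GIL back on.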

Testing script:

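import math
import time
import nvtx  # NVIDIA NVTX bindings, used to annotate ranges in the profiler timeline
import threading

def computational_heavy(iterations):
    # Pure-Python CPU-bound loop; every thread runs it independently.
    val = 0
    for i in range(1, iterations):
        val += math.sin(i) * math.cos(i)
    return val


def test(thread_id, iterations=1000000):
    # Mark the computation as a named range so it shows up in the profile.
    with nvtx.annotate("Calculation"):
        computational_heavy(iterations)

num_threads = 18  # 2, 8 and 18 were used for the measurements above

threads = [
    threading.Thread(target=test, name=f"Thread{i}", args=(i,))
    for i in range(num_threads)
]
# Time only the start/join of the worker threads.
start = time.perf_counter_ns()
for t in threads:
    t.start()
for t in threads:
    t.join()
stop = time.perf_counter_ns()
print(f"Elapsed time {stop-start} ns")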

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

CPU utilisation drop when increasing number of threads with `threading` #118649