Description
Calling threadpool_limits in a sub-process fails (hangs) on some of my servers, specifically ones running a particular OS version:
$ hostnamectl
Static hostname: n86.my.domain
Icon name: computer-server
Chassis: server
Machine ID: 196b497eccff4526a8e34834c95e3de5
Boot ID: b8318d26cd394a85b706beb2d7324f73
Operating System: AlmaLinux 8.9 (Midnight Oncilla)
CPE OS Name: cpe:/o:almalinux:almalinux:8::baseos
Kernel: Linux 4.18.0-513.18.2.el8_9.x86_64
Architecture: x86-64
The code is:
import os
import sys
from threadpoolctl import threadpool_limits
from multiprocessing import get_context

def eprintln(text):
    print(text, file=sys.stderr, flush=True)

DID_THREADCTL_FOR_PID = None

def invocation(index: int) -> int:
    global DID_THREADCTL_FOR_PID
    if os.getpid() != DID_THREADCTL_FOR_PID:
        DID_THREADCTL_FOR_PID = os.getpid()
        eprintln(f"PID: {os.getpid()} invocation index: {index} Do threadpool_limits...")
        threadpool_limits(limits=1)
        eprintln(f"PID: {os.getpid()} invocation index: {index} Did threadpool_limits.")
    else:
        eprintln(f"PID: {os.getpid()} invocation index: {index} Old threadpool_limits.")
    return index

invocations = 4
processes = 2

threadpool_limits(limits=processes)

results = [None] * invocations
eprintln(f"PID: {os.getpid()} Do imap...")
with get_context("fork").Pool(2) as pool:
    for index in pool.imap_unordered(invocation, range(invocations)):
        results[index] = index
        eprintln(f"PID: {os.getpid()} - Did imap index: {index}")
eprintln(f"PID: {os.getpid()} Did imap results: {results}")
assert results == list(range(len(results)))
When I run it on the above OS, with Python 3.12.3 and threadpoolctl version 3.4.0, I get:
$ python3 bug.py
PID: 1576849 Do imap...
PID: 1576852 invocation index: 0 Do threadpool_limits...
PID: 1576853 invocation index: 1 Do threadpool_limits...
PID: 1576853 invocation index: 1 Did threadpool_limits.
PID: 1576853 invocation index: 2 Old threadpool_limits.
PID: 1576853 invocation index: 3 Old threadpool_limits.
PID: 1576849 - Did imap index: 1
PID: 1576849 - Did imap index: 2
PID: 1576849 - Did imap index: 3
And the process hangs. Poking around, it seems that libc.dl_iterate_phdr does not return (each match_library_callback call does return). I am using Python 3.12.3 compiled from source on this OS, followed by pip installation of numpy, pandas, scipy, etc.
The same code works fine on older versions of the OS. For example, on:
$ hostnamectl
Static hostname: n97.my.domain
Icon name: computer-server
Chassis: server
Machine ID: 5e543d50691943628e8e20441f502406
Boot ID: 0d876250c0ec4a149e8bdb12c99c20eb
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-1160.15.2.el7.x86_64
Architecture: x86-64
With Python version 3.12.2, again with threadpoolctl version 3.4.0, I get the expected output:
$ python3 bug.py
PID: 32872 Do imap...
PID: 32874 invocation index: 0 Do threadpool_limits...
PID: 32875 invocation index: 1 Do threadpool_limits...
PID: 32874 invocation index: 0 Did threadpool_limits.
PID: 32875 invocation index: 1 Did threadpool_limits.
PID: 32874 invocation index: 2 Old threadpool_limits.
PID: 32872 - Did imap index: 0
PID: 32875 invocation index: 3 Old threadpool_limits.
PID: 32872 - Did imap index: 1
PID: 32872 - Did imap index: 2
PID: 32872 - Did imap index: 3
PID: 32872 Did imap results: [0, 1, 2, 3]
Any ideas on what I can do to fix this?
Activity
ogrisel commented on Mar 11, 2025
I cannot reproduce on macOS.
Have you tried with other start methods (e.g. "forkserver" or "spawn" instead of "fork")?
I would also be curious to see if you can reproduce with loky.get_reusable_executor() instead of a multiprocessing Pool instance.
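For reference, a minimal sketch of what switching to the "spawn" start method could look like for the reproduction script above; this is an illustration under those assumptions, not a verified fix:

import os
import sys
from multiprocessing import get_context
from threadpoolctl import threadpool_limits

def invocation(index: int) -> int:
    # With "spawn" each worker is a fresh interpreter that imports this
    # module, so threadpool_limits runs in a clean, single-threaded process.
    threadpool_limits(limits=1)
    return index

if __name__ == "__main__":
    invocations = 4
    with get_context("spawn").Pool(2) as pool:
        results = sorted(pool.imap_unordered(invocation, range(invocations)))
    print(f"PID: {os.getpid()} results: {results}", file=sys.stderr, flush=True)
    assert results == list(range(invocations))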
ogrisel commented on Mar 11, 2025
BTW, calling system calls after a fork is not POSIX-compliant, so it's expected that this can deadlock. I would therefore recommend not using the "fork" start method and using one of the alternatives suggested above.
If you want to try to debug the root cause, you might want to enable faulthandler in your workers:
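For example, a minimal sketch of one way to do that via a Pool initializer; the helper name and the 60-second timeout are arbitrary choices for illustration:

import faulthandler
from multiprocessing import get_context

def enable_faulthandler():
    # Dump the tracebacks of all threads in this worker if it is still
    # running (e.g. stuck) after 60 seconds, and keep re-arming the timer.
    faulthandler.enable()
    faulthandler.dump_traceback_later(60, repeat=True)

if __name__ == "__main__":
    with get_context("fork").Pool(2, initializer=enable_faulthandler) as pool:
        pass  # submit work as in the reproduction script above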
ogrisel commented on Mar 11, 2025
But it's very likely that the deadlock happens in the threadpool management code of one of your native libraries, in which case gdb or similar will be required to dig out where the deadlock happens.
orenbenkiki commented on Mar 11, 2025
What eventually solved this for me was the realization by one of our team that the call to threadpool_limits isn't thread-safe. It took "a certain kind of mind" to even consider that a function with "thread" in its name isn't thread-safe :-) Wrapping it in a global mutex solved the problem. The internal race condition seems to be a hit-or-miss thing depending on the specifics of the OS, the versions of the libraries, and whether Mercury is in retrograde, but once we added the global mutex wrapper we haven't seen any more crashes.
A fix would be to incorporate such a global mutex at the very start of the function - any reason not to?
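For reference, a minimal sketch of the workaround described above; the wrapper name and the module-level lock are illustrative, not part of threadpoolctl:

import threading
from threadpoolctl import threadpool_limits

# One process-wide lock so concurrent threads never run threadpool_limits
# (and its library introspection) at the same time.
_THREADPOOL_LIMITS_LOCK = threading.Lock()

def locked_threadpool_limits(limits=None, user_api=None):
    # Illustrative wrapper: serialize every call to threadpool_limits.
    with _THREADPOOL_LIMITS_LOCK:
        return threadpool_limits(limits=limits, user_api=user_api)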