Description
The limits set by threadpool_limits are global. This makes it difficult to avoid oversubscription when invoking parallel operations (e.g., NumPy functions) from within a parallel divide-and-conquer algorithm.
Ideally, parallel multi-threading frameworks would be fully multi-threading-aware, that is, they would enforce a limit on the total number of threads used, regardless of how many threads are generating requests. This, however, seems too much to ask for :-(
A simpler modification would be to set per-caller-thread limits. This way, a divide-and-conquer algorithm could, at each step, subdivide the total budget of threads. As a secondary upside, an odd budget of (2n+1) threads could be split into (n) threads for one sub-task and (n+1) threads for the other, fully utilizing all threads, rather than setting a global limit of (n) threads for each (leaving one thread unused) or (n+1) for each (oversubscribing by one).
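To make the budget arithmetic concrete, here is a minimal sketch in plain Python. It only computes how a budget would be subdivided; it does not call threadpoolctl, and the names split_budget and leaf_budgets are hypothetical helpers made up for this illustration.

```python
def split_budget(total):
    """Split a thread budget into two parts that sum to `total`.

    An odd budget of 2n+1 yields (n, n+1), so no thread is wasted
    and none is oversubscribed.
    """
    if total < 2:
        raise ValueError("need at least 2 threads to split")
    low = total // 2
    return low, total - low


def leaf_budgets(total, depth):
    """Subdivide the budget at each divide-and-conquer step.

    Returns the per-leaf budgets after at most `depth` splits.
    A budget of 1 cannot be split further and is kept as is.
    """
    if depth == 0 or total < 2:
        return [total]
    left, right = split_budget(total)
    return leaf_budgets(left, depth - 1) + leaf_budgets(right, depth - 1)


if __name__ == "__main__":
    print(split_budget(7))     # (3, 4)
    print(leaf_budgets(7, 2))  # [1, 2, 2, 2]
```

With per-caller-thread limits, each sub-task would apply its own share of the budget before calling into NumPy, so the shares always add up to the original total.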
Is such finer-grained control over thread limits possible? If so, I'd love to see support for it in threadpoolctl.