Description
As far as I can tell, limiting the number of threads in TensorFlow with threadpoolctl currently doesn't work.
For instance, with the following minimal example using TensorFlow 2.5.0,

`example.py`
```python
import tensorflow as tf
import numpy as np
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1):
    X = tf.constant(np.arange(0, 5000**2, dtype=np.int32), shape=(5000, 5000))
    tf.matmul(X, X)
```
running
```
time python example.py
```
on a 64-core CPU produces
```
real 0m3.781s
user 1m8.685s
```
so the user (CPU) time is still much greater than the real run time, meaning that many CPU cores are being used.
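This is consistent with how threadpoolctl works, as far as I understand it: it only controls thread pools that expose a known API (OpenMP runtimes, OpenBLAS, MKL, BLIS), while TensorFlow's internal Eigen thread pool exposes no such hook. A quick way to check which pools threadpoolctl can actually see (a sketch, assuming threadpoolctl and NumPy are installed):

```python
import numpy as np  # loads a BLAS library that threadpoolctl can detect
from threadpoolctl import threadpool_info

# threadpool_info() lists only the pools threadpoolctl knows how to
# control; TensorFlow's Eigen thread pool never appears here, even
# after `import tensorflow`, which is why threadpool_limits is a no-op
# for TF ops.
for pool in threadpool_info():
    print(pool["user_api"], pool["num_threads"])
```

On my understanding, each entry corresponds to a loaded OpenMP/BLAS library, and nothing TensorFlow-related is listed.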
This becomes an issue if people run scikit-learn's `GridSearchCV` or `cross_validate` on a Keras or TensorFlow model, since it then results in CPU over-subscription. I'm surprised there aren't more issues about this at scikit-learn.
TensorFlow also, regrettably, doesn't recognize any environment variable to limit the number of CPU cores. The only way I found around it is to set the CPU affinity mask with `taskset`. But then again, that wouldn't help for cross-validation, since joblib would need to set the affinity mask when creating new worker processes, which is currently not supported.
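For reference, the closest TF-native mechanism I'm aware of is `tf.config.threading`, but it has to be called before TensorFlow runs any op (afterwards it raises a `RuntimeError`), so unlike `threadpool_limits` it can't be toggled dynamically around a block of code. A minimal sketch:

```python
import tensorflow as tf

# Must run before any op executes, i.e. before the TF runtime
# creates its thread pools; it cannot be changed afterwards.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)
```

This still wouldn't let threadpoolctl (or joblib workers) limit an already-initialized TensorFlow, which is the scenario above.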
Has anyone looked into this in the past by any chance?