Description
I am trying to run the example code on a GCP Dataproc Spark cluster (1 master and 2 worker nodes). I've set n_jobs = 1.
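For reference, the script is essentially the joblibspark usage example with n_jobs set to 1 — a minimal sketch of what I'm running, so the exact code may differ slightly:

```python
from sklearn.utils import parallel_backend
from sklearn.model_selection import cross_val_score
from sklearn import datasets, svm
from joblibspark import register_spark

# Register the Spark backend with joblib
register_spark()

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# Run cross-validation on the Spark backend with a single job
with parallel_backend('spark', n_jobs=1):
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)

print(scores)
```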
Here are my versions:
Python: 3.6
Joblib: 0.10.0
joblibspark: 0.14
PySpark: 2.4.5
Command used for running: spark-submit <filename>.py
However, I get the following error:
```
/home/.local/lib/python3.6/site-packages/joblibspark/backend.py:94: UserWarning: limit n_jobs to be maxNumConcurrentTasks in spark: 0
  warnings.warn("limit n_jobs to be maxNumConcurrentTasks in spark: " + str(n_jobs))
Traceback (most recent call last):
  File "<stdin>", line 12, in <module>
  File "/home/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 390, in cross_val_score
    error_score=error_score)
  File "/home/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 236, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/home/.local/lib/python3.6/site-packages/joblib/parallel.py", line 960, in __call__
    raise RuntimeError("%s has no active worker." % backend_name)
RuntimeError: SparkDistributedBackend has no active worker.
```