Hi,
I recently finished setting up a Spark cluster across a couple of separate VMs.
When I tried to train an sklearn model using joblib-spark, I ran into the following problem:
from pyspark.sql import SparkSession
from joblibspark import register_spark
from joblib import parallel_backend
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Connect to the cluster via Spark Connect
spark = SparkSession.builder.remote('sc://<master node ip>').appName("JoblibSparkBackend").getOrCreate()
register_spark()

param_distributions = {
    "n_estimators": list(range(100, 500, 50)),
    "max_depth": list(range(2, 7)),
}
model = RandomForestRegressor()
random_forest = RandomizedSearchCV(model, param_distributions, cv=5, refit=True)

with parallel_backend('spark', n_jobs=4):
    random_forest.fit(X=X_train, y=y_train)
...
NotImplementedError: sparkContext() is not implemented.
Is there a workaround for this issue, or is this something that will be implemented in a future release?
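
For context, the error appears to come from the joblib-spark backend accessing spark.sparkContext, which Spark Connect sessions (created via .remote('sc://...')) do not expose; the Connect session raises exactly this NotImplementedError. A minimal sketch of a possible workaround, assuming the cluster is a standalone deployment with the master listening on the default port 7077 (the URL below is an assumption, adjust to your setup), is to create a classic session with .master() instead of .remote():

from pyspark.sql import SparkSession
from joblibspark import register_spark

# Classic (non-Connect) session: this one does expose spark.sparkContext,
# which the joblib-spark backend relies on.
# "spark://<master node ip>:7077" is an assumed standalone master URL.
spark = (
    SparkSession.builder
    .master("spark://<master node ip>:7077")
    .appName("JoblibSparkBackend")
    .getOrCreate()
)
register_spark()
# ...the RandomizedSearchCV code above should then run unchanged.

Note that with a classic session the driver runs in your local Python process, so pyspark (and matching sklearn/joblibspark versions on the workers) must be installed where the script runs.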