Is there a plan to support Spark Connect in joblib-spark?

Hi,
I recently finished setting up a Spark cluster on a couple of separate VMs.

When I tried to train a scikit-learn model using joblib-spark over a Spark Connect session, I ran into the following problem:

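```
from joblib import parallel_backend
from joblibspark import register_spark
from pyspark.sql import SparkSession
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Connect through Spark Connect (sc://) instead of a classic SparkContext
spark = SparkSession.builder.remote('sc://<master node ip>').appName("JoblibSparkBackend").getOrCreate()
register_spark()  # register the 'spark' joblib backend

param_distributions = {
    "n_estimators": list(range(100, 500, 50)),
    "max_depth": list(range(2, 7)),
}

model = RandomForestRegressor()
random_forest = RandomizedSearchCV(model, param_distributions, cv=5, refit=True)

# X_train and y_train are the training features and targets, defined elsewhere
with parallel_backend('spark', n_jobs=4):
    random_forest.fit(X=X_train, y=y_train)
```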


```
...
NotImplementedError: sparkContext() is not implemented.
```

Is there a workaround for this issue, or is Spark Connect support something that will be implemented in the future?
