Docker Image for playing with DeltaLake #922

Status: Open. Wants to merge 1 commit into master.

@stikkireddy (Contributor) commented Feb 3, 2022

This pull request is an attempt to resolve issue #919.

This PR adds a Dockerfile to the examples directory, along with instructions for using it.

What the image does:
The base image is jupyter/minimal-notebook, on top of which the Dockerfile installs openjdk-8 and r-lang.

It also installs Delta Lake 1.1.0 and uses jupytext to convert all the sample Python scripts into example .ipynb notebooks.
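The conversion step is not spelled out in this description. A minimal sketch of the idea, assuming jupytext's `--to notebook` and `-o` command-line options and a flat `python/` folder of example scripts: each `python/<name>.py` maps to `notebooks/<name>.ipynb`. The helper `jupytext_commands` and the file names used with it are hypothetical, and the sketch only builds the argument lists rather than running them.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def jupytext_commands(example_dir: str, notebook_dir: str, scripts: list[str]) -> list[list[str]]:
    """Build one jupytext invocation per Python example script (hypothetical helper)."""
    commands = []
    for script in sorted(scripts):
        name = PurePosixPath(script)
        if name.suffix != ".py":
            continue  # only Python scripts are converted; READMEs etc. are skipped
        src = PurePosixPath(example_dir) / name.name
        dst = PurePosixPath(notebook_dir) / f"{name.stem}.ipynb"
        # --to notebook converts the script to .ipynb; -o sets the output path.
        commands.append(["jupytext", "--to", "notebook", str(src), "-o", str(dst)])
    return commands
```

Inside the image, where jupytext is installed, each list could be passed to `subprocess.run(cmd, check=True)`.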

The image also configures spark-defaults.conf to load the Delta Lake jars and set the Delta-specific Spark configurations.
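The PR's actual spark-defaults.conf is not shown here. As a sketch, assuming the standard Delta Lake settings from the Delta Lake quickstart, such a file typically pulls in the `delta-core` package and registers Delta's SQL extension and catalog. The helper `render_spark_defaults` is hypothetical, and the `delta-core` artifact name applies to Delta Lake 1.x and 2.x releases.

```python
def render_spark_defaults(delta_version: str, scala_version: str = "2.12") -> str:
    """Render spark-defaults.conf lines for Delta Lake (hypothetical helper)."""
    settings = {
        # Resolve the Delta Lake jar (and its dependencies) from Maven at startup.
        "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
        # Register Delta's SQL extension (Delta-specific SQL commands such as VACUUM).
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        # Make the default session catalog Delta-aware.
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }
    # spark-defaults.conf holds whitespace-separated "key value" pairs, one per line.
    return "\n".join(f"{key} {value}" for key, value in settings.items()) + "\n"
```

Delta Lake 1.1.0 targets Spark 3.2.x, so bumping the version (for example to 2.1.x, which targets Spark 3.3.x) also means pairing it with a compatible Spark release.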

The intent is that, after running the container, you have a playground for trying out Delta Lake, with some starter notebooks.

Go to examples:

cd examples

To build:

docker build -t delta-lake-playground:latest .

To run:

docker run -it -p 8888:8888 delta-lake-playground

Note: This image uses Python 3.9.x, because those are the only Jupyter-provided images that support both Intel and ARM builds.

Access the JupyterLab instance at http://localhost:8888/lab?token=docker

Signed-off-by: Sri Tikkireddy

@stikkireddy changed the title from "Dockerfile for playing with DeltaLake" to "Docker Image for playing with DeltaLake" on Feb 3, 2022

RUN pip install jupytext

COPY --chown=$NB_UID:$NB_UID python python/
A contributor commented:

Are you sure these Python examples work through Jupyter?

Can you update DELTA_LAKE_VERSION to 2.1.x? Thanks!


RUN mkdir -p notebooks

RUN find python/ -name "*.py" -exec basename {} .py ';' | \
A contributor commented:

Please add a comment here explaining what you are trying to do.

* Open Jupyter in your browser `http://localhost:8888/lab?token=docker`

#### Sample Notebooks
* Sample Python notebooks, ported from the Python examples, are in the notebooks folder.
@tdas (Contributor) commented Feb 4, 2022:

Have you tested them?

A contributor commented:
Also, please elaborate on how to run both the Scala and Python notebooks, with precise instructions.

@@ -7,3 +7,23 @@ In this folder there are examples taken from the delta.io quickstart guide and d
### Instructions
* To run an example in Python run `spark-submit --packages io.delta:delta-core_2.12:{Delta Lake version} PATH/TO/EXAMPLE`
* To run the Scala examples, `cd examples/scala` and run `./build/sbt "runMain example.{Example class name}"` e.g. `./build/sbt "runMain example.Quickstart"`
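The two ways of running the examples listed above can be sketched as argument lists, for example to drive them from a script instead of typing them by hand. This is a minimal sketch; the helper names `python_example_cmd` and `scala_example_cmd` are hypothetical, and the Scala command has to run from `examples/scala`.

```python
from __future__ import annotations

import shlex


def python_example_cmd(delta_version: str, example_path: str, scala_version: str = "2.12") -> list[str]:
    """spark-submit invocation for a Python example (hypothetical helper)."""
    return ["spark-submit", "--packages", f"io.delta:delta-core_{scala_version}:{delta_version}", example_path]


def scala_example_cmd(example_class: str) -> list[str]:
    """sbt invocation for a Scala example (hypothetical helper)."""
    # sbt receives the whole "runMain ..." string as a single argument, as in the README.
    return ["./build/sbt", f"runMain example.{example_class}"]


# shlex.join shows the equivalent shell command line with correct quoting.
print(shlex.join(scala_example_cmd("Quickstart")))
```

For example, `subprocess.run(scala_example_cmd("Quickstart"), cwd="examples/scala", check=True)` would mirror the second bullet.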

A contributor commented:
Please update the instructions at the top to make it clear that there are two ways of running the examples.

A contributor commented:
Can you run the scala examples inside docker as well?


### Docker Instructions

#### Building the image
A contributor commented:

It does not make sense to create this many single-line sections. Please collapse them into "Running the docker image" and "Running examples".

@MrPowers (Contributor) commented:

@stikkireddy - are you going to take a stab at addressing the comments on this PR or should someone else pick up this work?

@baumanab commented Sep 21, 2022:

@MrPowers Have you looked at #1035?

@MrPowers requested a review from @dennyglee on September 27, 2022 at 02:39
@dennyglee (Contributor) commented:

In addition to @baumanab's callout of #1035, I'm wondering whether it would help to leverage the proposed Spark Docker image from the SPIP: Support Docker Official Image for Spark.

@baumanab commented Sep 28, 2022 via email

6 participants