Docker Image for playing with DeltaLake #922
RUN pip install jupytext
COPY --chown=$NB_UID:$NB_UID python python/
Are you sure that these Python examples work through Jupyter?
Can you update DELTA_LAKE_VERSION to 2.1.x? Thanks!
RUN mkdir -p notebooks
RUN find python/ -name "*.py" -exec basename {} .py ';' | \
Please add a comment here explaining what this command is trying to do.
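For illustration, here is one possible reading of the truncated `find` pipeline above, sketched in Python rather than shell. It pairs each example's base name under `python/` with a same-named notebook under `notebooks/`, which jupytext could then generate. The helper names `planned_conversions` and `jupytext_command`, the flat output layout, and the `jupytext --to notebook --output ...` invocation are assumptions; the rest of the actual Dockerfile command is cut off in this view.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def planned_conversions(src_dir: Path, out_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair every .py file under src_dir with a same-named .ipynb in out_dir."""
    pairs = []
    for script in sorted(src_dir.rglob("*.py")):
        # Path.stem drops the ".py" suffix, like `basename {} .py` in the shell.
        pairs.append((script, out_dir / f"{script.stem}.ipynb"))
    return pairs


def jupytext_command(script: Path, notebook: Path) -> List[str]:
    # Assumed invocation; not taken from the PR's Dockerfile.
    return ["jupytext", "--to", "notebook", "--output", str(notebook), str(script)]


if __name__ == "__main__":
    # Dry run against a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "python"
        src.mkdir()
        (src / "quickstart.py").write_text("print('hello')\n")
        for script, notebook in planned_conversions(src, Path(tmp) / "notebooks"):
            print(" ".join(jupytext_command(script, notebook)))
```

The dry run only prints the commands it would issue, so it can be checked without jupytext installed.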
* Open Jupyter in your browser `http://localhost:8888/lab?token=docker`

#### Sample Notebooks
* There are sample python notebooks which are ported in the notebooks folder.
Have you tested them?
Also, elaborate on how to run both the Scala and Python notebooks. Give precise instructions.
@@ -7,3 +7,23 @@ In this folder there are examples taken from the delta.io quickstart guide and d
### Instructions
* To run an example in Python run `spark-submit --packages io.delta:delta-core_2.12:{Delta Lake version} PATH/TO/EXAMPLE`
* To run the Scala examples, `cd examples/scala` and run `./build/sbt "runMain example.{Example class name}"` e.g. `./build/sbt "runMain example.Quickstart"`
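
As a small sketch of how the `{Delta Lake version}` placeholder in the Python instruction above gets filled in, the following builds the `spark-submit` argument list from a version string. The function name `spark_submit_command` and the default Scala suffix are assumptions for illustration; `1.1.0` is the version named in the PR description, and `PATH/TO/EXAMPLE` stays a placeholder.

```python
import shlex
from typing import List


def spark_submit_command(delta_version: str, example_path: str,
                         scala_version: str = "2.12") -> List[str]:
    """Build the argv for running one Python example with the Delta package."""
    # Maven coordinate in the same form as the README line: group:artifact_scala:version.
    package = f"io.delta:delta-core_{scala_version}:{delta_version}"
    return ["spark-submit", "--packages", package, example_path]


if __name__ == "__main__":
    # Print the command instead of running it, so Spark is not required here.
    print(shlex.join(spark_submit_command("1.1.0", "PATH/TO/EXAMPLE")))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.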
Please update the instruction at the top to make sure it is clear that there are two ways of running the examples.
Can you run the Scala examples inside Docker as well?
### Docker Instructions

#### Building the image
It does not make sense to create this many single-line sections. Please collapse them into "Running the docker image" and "Running examples".
@stikkireddy - are you going to take a stab at addressing the comments on this PR or should someone else pick up this work?
In addition to @baumanab call out of #1035, I'm wondering if it would be helpful if we leveraged the proposed Spark docker per SPIP: Support Docker Official Image for Spark.
Yes, if its properties align with delta requirements. Misalignment has been a challenge that resulted in the custom dockerfiles submitted in #1035. If we can get official images that meet requirements, this will simplify things considerably.
On Sep 27, 2022, at 9:28 AM, Denny Lee wrote:
In addition to @baumanab call out of #1035, I'm wondering if it would be helpful if we leveraged the proposed Spark docker per SPIP: Support Docker Official Image for Spark.
This pull request is an attempt to resolve issue #919.
This PR adds instructions in the example directory for a dockerfile.
Instructions on what the image is trying to do:
The base image is derived from jupyter/minimal-notebook, and it installs openjdk-8 and r-lang. It also installs deltalake 1.1.0 and uses jupytext to convert all the sample Python notebooks into example ipynb files.
spark-defaults.conf is also set up to load the Delta jars and apply the Delta configurations.
The intent is that, after running the container, you have a playground for getting started with Delta, with some starter notebooks.
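The PR's actual spark-defaults.conf is not shown in this conversation. As a hedged sketch, the following renders the settings that the Delta Lake quickstart documents for enabling Delta on a Spark session (the package coordinate, the SQL extension, and the catalog implementation) in spark-defaults.conf form. The helper names and the assumption that the image uses exactly these three keys are mine, not the PR's.

```python
from typing import Dict


def delta_spark_defaults(delta_version: str, scala_version: str = "2.12") -> Dict[str, str]:
    """Settings commonly used to enable Delta Lake; assumed, not taken from the PR."""
    return {
        "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def render_spark_defaults(settings: Dict[str, str]) -> str:
    # spark-defaults.conf takes one whitespace-separated "key value" pair per line.
    return "".join(f"{key} {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # 1.1.0 is the Delta version named in the PR description.
    print(render_spark_defaults(delta_spark_defaults("1.1.0")), end="")
```

Keeping the version as a parameter would also make the reviewer's requested bump to 2.1.x a one-argument change in a sketch like this.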
Go to examples:
To build:
docker build -t delta-lake-playground:latest .
To run:
Note: This image uses Python 3.9.x. Those are the only images provided by Jupyter that support both Intel- and ARM-based builds.
Access the jupyterlab instance via
http://localhost:8888/lab?token=docker
Signed-off-by: Sri Tikkireddy [email protected]