Emily's internship blog #192
EmilyXinyi wants to merge 6 commits into scikit-learn:main from EmilyXinyi:main
Commits:
- d632043 Emily's internship blog (EmilyXinyi)
- 20aec6b adding media (EmilyXinyi)
- e0a255b change video to be embedded tiktok link (EmilyXinyi)
- 4130c7d address comments (EmilyXinyi)
- 82079cc edited kubecon to past tense and added picture (EmilyXinyi)
- 494c2a0 Merge branch 'scikit-learn:main' into main (EmilyXinyi)
@@ -0,0 +1,66 @@
---
title: "Code, outreach, and beyond: my scikit-learn internship experience"
date: August 15, 2024

categories:
  - Team
tags:
  - Internship
  - Open Source

postauthors:
  - name: Emily Chen
    email: [email protected]
    website: https://github.com/EmilyXinyi
    image: "emily_chen.jpeg"

  - name: François Goupil
    email: [email protected]
    website: https://github.com/francoisgoupil
    image: "francois_goupil.jpeg"
---
<div>
<img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
{% include postauthor.html %}
</div>

## Who am I?

It’s Emily here! I am a Chinese-Canadian from Toronto, Canada, currently studying electrical and computer engineering at the University of Toronto. I recently completed a three-month internship in Paris, funded by [Probabl](https://probabl.ai/), focusing on the technical development of scikit-learn, as well as outreach to expand the scikit-learn community in China and beyond.

## What I worked on

My work had two distinct components: open source development and community outreach.

### Open Source Development

I started my contributions by adapting certain metrics (Tweedie deviance, mean absolute percentage error, etc.) to be Array API compatible under the guidance of my mentor, Olivier. The Array API standard is a cross-library API for array operations in Python, designed to improve interoperability and consistency across different array libraries. This also means that scikit-learn algorithms written with NumPy for CPU can run on other hardware, such as GPUs, through PyTorch or CuPy, which can greatly improve performance. As I gained more familiarity with the scikit-learn codebase and the Array API, I began adapting “larger” functions to be Array API compatible. These functions are a lot more fundamental, have a lot more dependencies, and are a lot more challenging, and a lot more fun.

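To make the idea concrete, here is a minimal sketch of what an Array API compatible metric can look like. It is not scikit-learn's actual implementation: `get_namespace` below is a simplified, hypothetical helper written for this example, and the example only runs with NumPy. Libraries such as PyTorch typically need a compatibility layer before they can be passed to code written this way.

```python
import numpy as np


def get_namespace(*arrays):
    """Hypothetical helper: return the array namespace shared by the inputs.

    Arrays from libraries that follow the Array API standard expose
    `__array_namespace__`. We fall back to NumPy when it is missing
    (for example with older NumPy versions).
    """
    for array in arrays:
        if hasattr(array, "__array_namespace__"):
            return array.__array_namespace__()
    return np


def mean_absolute_percentage_error(y_true, y_pred):
    """Illustrative MAPE that only calls functions through the namespace `xp`."""
    xp = get_namespace(y_true, y_pred)
    # Guard against division by zero with the dtype's machine epsilon.
    eps = xp.asarray(xp.finfo(y_true.dtype).eps, dtype=y_true.dtype)
    ratio = xp.abs(y_pred - y_true) / xp.maximum(xp.abs(y_true), eps)
    return float(xp.mean(ratio))


y_true = np.asarray([1.0, 2.0, 4.0])
y_pred = np.asarray([1.5, 2.0, 3.0])
print(mean_absolute_percentage_error(y_true, y_pred))  # 0.25
```

Because the function never calls `np.` directly after looking up `xp`, the same body could in principle run on any array type whose namespace provides `abs`, `maximum`, `mean`, `finfo`, and `asarray`.
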
I also happened to be the only one on the team using a Mac with an Intel chip, so I was tasked with reproducing issues unique to this setup. Under the guidance of Loïc, I learned a lot about scikit-learn’s lockfiles and CI pipeline, and about identifying and fixing issues when they occur.

### Chinese Community Outreach

China has the second largest user group of scikit-learn. As a community, we believe that we can be more inclusive to ease Chinese contribution and do what is necessary to recruit more Chinese contributors. Therefore, I needed to find out who is using scikit-learn and where, whether development is happening on platforms other than GitHub (which tends to be very slow in China), and how to establish scikit-learn’s official presence in the Chinese community.

It was my first time in this type of role, so it involved a lot of exploring, reaching out to representatives of the Chinese chapters of different companies and communities, translation work, and learning how the business side of Chinese social media works. After these three months, a network has been established between scikit-learn and various Chinese entities, including companies with open source projects, open source communities, and data science training and certification programs. I will be representing scikit-learn at [KubeCon + CloudNativeCon + AI_Dev](https://events.linuxfoundation.org/kubecon-cloudnativecon-open-source-summit-ai-dev-china/) in Hong Kong, where I will be meeting some members of this network in person, establishing new connections, and looking for more collaboration opportunities.

## Interacting with my mentors and co-workers

The support from my mentors was invaluable. My mentor, Olivier, explained ongoing scikit-learn projects to me in extreme detail on my very first day, and walked me through my first PR from beginning to end on my third. Throughout my internship, Olivier was always available online and provided thorough feedback on all of my PRs. Whenever I found a task that I wanted to try but that seemed somewhat challenging, Olivier was supportive and gave me advice whenever I needed it. Because of this, I gained a better understanding of scikit-learn and the Array API, and my technical skills improved too.

I also had weekly peer programming sessions with Loïc and Stefanie, where the questions outside of the Array API that had piled up during the week would be answered, and I would almost always learn something new about developer tools or programming fundamentals.

On the Chinese community outreach side, I worked with the scikit-learn communications team throughout. Here I must give a special shoutout to my manager, François, who is also part of the communications team, for always being supportive and believing in my outreach efforts, especially because I was nervous about doing this kind of task and using Chinese in a professional context for the first time. I also got to interact with [Charlie](https://charlie-xiao.github.io/) (yes, the core-dev Charlie), who is located in China and helped me tremendously with tasks that require physical presence.

## My vision for scikit-learn in China in the future

I am very optimistic about scikit-learn’s presence in China, and I am very excited to see where it leads. The scikit-learn communications team and I are in the process of creating official (and verified!) accounts on Chinese social media platforms, which will establish scikit-learn’s online presence. As our network with Chinese entities becomes more mature, scikit-learn and its partners will jointly host webinars online and, eventually, in-person events in China too.

## Special thanks

My internship would not have been possible without Probabl, the official operating brand of scikit-learn. Probabl funded my internship, and the Probabl team made me feel so welcome and comfortable in my position. Everyone is nice, open, and extremely supportive. I can honestly say that this is the best internship experience I have had, and I will miss this team so much next year as I complete my final year of university.