Welcome to this example repository to get started creating combined ML and data pipelines with Apache Airflow! 🚀
This repository contains a fully functional data and ML orchestration pipeline that can be run locally with the Astro CLI.
This Airflow pipeline will perform model fine-tuning and testing using S3, DuckDB and HuggingFace.
Use this repository to explore Airflow, experiment with your own DAGs, custom operators, and task groups, and use it as a template for your own projects!
This project was created with ❤️ by Astronomer.
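To give a rough idea of the shape of such a pipeline, here is a minimal, hypothetical sketch using Airflow's TaskFlow API. The task names, bucket path, and task logic below are illustrative assumptions, not the repository's actual DAG (see the `dags/` folder for that):

```python
# Hypothetical sketch of an S3 -> DuckDB -> HuggingFace pipeline.
# All task names, paths, and logic are illustrative only.
from pendulum import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
def example_ml_pipeline():
    @task
    def extract_from_s3():
        # Hypothetical: discover new source data in an S3 bucket.
        return ["s3://my-example-bucket/images/batch_1/"]

    @task
    def transform_with_duckdb(paths):
        # Hypothetical: stage file metadata in an in-memory DuckDB table.
        import duckdb

        con = duckdb.connect()
        con.sql("CREATE TABLE train_files (path TEXT)")
        con.executemany("INSERT INTO train_files VALUES (?)", [(p,) for p in paths])
        return con.sql("SELECT path FROM train_files").fetchall()

    @task
    def fine_tune(rows):
        # Hypothetical: fine-tune a HuggingFace model on the prepared data.
        print(f"Fine-tuning on {len(rows)} file group(s)...")

    fine_tune(transform_with_duckdb(extract_from_s3()))


example_ml_pipeline()
```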
If you are looking for an entry-level written tutorial where you build your first Airflow DAG from scratch, check out: Get started with Apache Airflow, Part 1: Write and run your first DAG.
Download the Astro CLI to run Airflow locally in Docker. `astro` is the only package you will need to install.

If you are on a Mac, installing the Astro CLI is as easy as the following steps:

- Check that you have Docker Desktop and Homebrew installed.
- Run `brew install astro`.
And that is it, you can now use the repository:

- Run `git clone https://github.com/TJaniF/airflow-ml-pipeline-image-classification.git` on your computer to create a local clone of this repository.
- Run `astro dev start` in your cloned repository.
- After your Astro project has started, view the Airflow UI at `localhost:8080`.
- Set your own variables pointing at your source data in `include/config_variables.py` (see the sketch after this list).
- You will likely have to make some changes to the `standard_transform_function` depending on your source data; you can find it and other utility functions in `include/utils/utils.py` (see the sketch after this list).
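The exact contents of `include/config_variables.py` are specific to this repo, but as a hedged sketch of the kind of values you would point at your own source data (all variable names and values below are illustrative assumptions, not the file's actual contents):

```python
# include/config_variables.py -- hypothetical sketch; check the actual
# file for the real variable names used by the DAGs.
S3_BUCKET = "my-example-bucket"          # bucket holding your source data
S3_KEY_PREFIX = "images/"                # prefix under which new files land
AWS_CONN_ID = "aws_default"              # Airflow connection ID for S3 access
DUCKDB_PATH = "include/duckdb_database"  # local DuckDB database file
HF_MODEL_NAME = "microsoft/resnet-50"    # HuggingFace model to fine-tune
```

And a similarly hypothetical sketch of what an adapted `standard_transform_function` could look like for image data; the signature and behavior here are assumptions, not the actual function in `include/utils/utils.py`:

```python
from PIL import Image


def standard_transform_function(file_path: str, image_size: int = 224) -> Image.Image:
    """Hypothetical transform: load one source image and resize it so the
    model sees uniformly sized inputs. Adapt this to your own source data."""
    image = Image.open(file_path).convert("RGB")
    return image.resize((image_size, image_size))
```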