
Imagine a scenario of daily (or weekly) sports betting where you're on a quest to outsmart the bookies. This project is a data warehouse simulation built on the European Soccer Database. Using team and player statistics, performance metrics, FIFA stats, and bookie odds, we hunt down opportunities and place value bets.
Within the pipeline, you can:
- Version Your Dataset: run preprocessing to (re)generate your ML dataset
- Experiment & Store: run and save ML experiments
- Model Management: save and compare models
- Reproducibility: ensure inference pipelines run without train/serving skew (run simulations)
- Feature Store: house all input features with only the KPIs available at that point in time (see the sketch after this list)
- Prediction Audit: maintain a log of all predictions
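The feature-store and reproducibility points boil down to point-in-time correctness: a match may only use KPIs that were already known on match day. As a rough illustration (not project code; frame and column names are made up), this is the kind of as-of join involved:

```python
import pandas as pd

# Hypothetical frames: upcoming fixtures and a running log of team KPIs.
fixtures = pd.DataFrame({
    "match_date": pd.to_datetime(["2015-08-01", "2015-08-08"]),
    "team_id": [10, 10],
})
team_kpis = pd.DataFrame({
    "kpi_date": pd.to_datetime(["2015-07-15", "2015-08-05"]),
    "team_id": [10, 10],
    "rolling_goal_diff": [0.4, 0.9],
})

# Point-in-time join: each fixture only sees KPIs published on or before its
# match_date, which is what keeps training and serving features consistent.
features = pd.merge_asof(
    fixtures.sort_values("match_date"),
    team_kpis.sort_values("kpi_date"),
    left_on="match_date",
    right_on="kpi_date",
    by="team_id",
    direction="backward",
)
print(features)
```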
Requirements:
- Python (>=3.11, tested up to 3.12)
- Access to a Databricks cluster (e.g., Azure free account)
Manual setup:
- Install a virtual environment and the requirements:
  ```sh
  virtualenv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  # optional, for contribution
  pip install -r requirements-dev.txt
  ```
- Download the data from here (you need a Kaggle account) and drop the resulting `database.sqlite` file in the data folder.
- Convert the data to parquet and csv files (a sketch of the conversion follows the setup list):
  ```sh
  python scripts/convert_data.py
  ```
- Databricks:
  - Create a SQL warehouse
  - Create a personal access token
  - Upload the data (parquet files) to your schema of choice
  - Create a compute cluster
- Set up environment/secrets: fill in the template in `env.templ` (rename it to `.env`) and set the env vars (e.g. `set -a && source .env`)
- dbt setup:
  - Initialise and install dependencies:
    ```sh
    dbt deps
    ```
  - Set up your dbt profile; normally the env vars have already configured it, so nothing is needed here (`dbt_your_best_bet/profiles/profiles.yml`)
- Install the `riskrover` Python package (managed with Poetry) on your compute:
  - Build the package:
    ```sh
    cd riskrover && poetry build
    ```
  - Install the resulting whl file (`riskrover/dist/riskrover-x.y.z-py3-none-any.whl`) on your Databricks compute cluster (Compute-scoped libraries)
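For orientation, the conversion step above amounts to dumping every table in `database.sqlite` to parquet and csv. A minimal sketch of that idea (the actual logic lives in `scripts/convert_data.py` and may differ):

```python
import sqlite3
from pathlib import Path

import pandas as pd  # to_parquet requires pyarrow (or fastparquet)

DATA_DIR = Path("data")

with sqlite3.connect(DATA_DIR / "database.sqlite") as conn:
    # List the user tables, then dump each one to parquet and csv.
    tables = pd.read_sql_query(
        "SELECT name FROM sqlite_master WHERE type = 'table'", conn
    )["name"]
    for table in tables:
        df = pd.read_sql_query(f"SELECT * FROM {table}", conn)
        df.to_parquet(DATA_DIR / f"{table}.parquet", index=False)
        df.to_csv(DATA_DIR / f"{table}.csv", index=False)
```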
You should now be able to run the pipeline without any trained models, e.g. just the preprocessing:
```sh
dbt build --selector gold
```
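If that build fails to connect, a quick sanity check of your warehouse credentials from Python can help. This is a sketch using the `databricks-sql-connector` package; the env var names below are placeholders, so substitute whatever you defined in `env.templ` / `.env`:

```python
import os

from databricks import sql  # pip install databricks-sql-connector

# Placeholder env var names -- align them with your .env file.
with sql.connect(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())  # any row back means the warehouse is reachable
```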
The setup described above is manual and intended for demonstration purposes. For production deployments, consider the following best practices:
- Infrastructure as Code: Use tools like Terraform to provision Databricks clusters, manage accounts, networking, and other resources.
- Containerization & Orchestration: Containerize your dbt environment (e.g., with Docker) and orchestrate workflows using tools like Apache Airflow.
- Packaging: Package and publish `riskrover` to a private package repository and install it on the cluster dynamically within your CI/CD pipeline.
- Secrets & Environment Management: Manage secrets and environment variables securely using services such as Databricks Secrets or Azure Key Vault.
- CI/CD: Implement continuous integration and deployment pipelines for automated testing and deployment, for example with GitHub Actions.
Explore and see how we could've made a profit back in 2016 if we'd had access to this data at the right time :D.
The default variables are stored in `dbt_project.yml`. We find ourselves on 2016-01-01 in our simulation, with the option to run until 2016-05-25.
```sh
cd dbt_your_best_bet

# Preprocessing
dbt build --selector gold

# Experimentation (by default -> training set up to 2015-07-31, trains a simple logistic regression with cross-validation)
dbt build --selector ml_experiment

# Inference on the test set (2015-08-01 -> 2015-12-31)
dbt build --selector ml_predict_run

# Moving forward in time, for example with a weekly run
dbt build --vars '{"run_date": "2016-01-08"}'
dbt build --vars '{"run_date": "2016-01-15"}'
dbt build --vars '{"run_date": "2016-01-22"}'
# ...

# Check if you made any money by compiling and running an analysis file
dbt compile -s analyses/compare_model_profit.sql
```
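The experimentation step above is described as a simple logistic regression with cross-validation; stripped of the dbt plumbing, the equivalent scikit-learn logic looks roughly like this (paths, feature names, and the target column are placeholders, not the real schema):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training extract up to the training cutoff (2015-07-31).
train = pd.read_parquet("data/ml_dataset.parquet")
X = train[["home_form", "away_form", "fifa_rating_diff"]]  # placeholder features
y = train["match_outcome"]                                 # e.g. home_win / draw / away_win

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validated score on the training window, then a final fit for persistence.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_log_loss")
print(f"CV log loss: {-scores.mean():.3f}")
model.fit(X, y)
```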
Analysis is available in `riskrover/notebooks/riskrover.ipynb`.
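For a feel of what that profit comparison involves, the value-betting logic amounts to staking only where the model's probability times the bookie odds exceeds 1. A toy illustration with made-up numbers (the real comparison is in `analyses/compare_model_profit.sql` and the notebook):

```python
import pandas as pd

# Toy predictions: model probability, bookie decimal odds, and the actual outcome.
preds = pd.DataFrame({
    "p_home_win": [0.55, 0.30, 0.62],
    "odds_home_win": [2.10, 3.80, 1.55],
    "home_win": [1, 0, 1],
})

# Value bet: expected value of a unit stake is p * odds - 1; bet only when it is positive.
preds["expected_value"] = preds["p_home_win"] * preds["odds_home_win"] - 1
bets = preds[preds["expected_value"] > 0]

# Unit-stake profit: a win pays (odds - 1), a loss costs the stake.
profit = (bets["home_win"] * (bets["odds_home_win"] - 1) - (1 - bets["home_win"])).sum()
print(f"Placed {len(bets)} bets for a profit of {profit:.2f} units")
```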
Generate and serve the dbt docs:
```sh
dbt docs generate
dbt docs serve
```
### Cleanup
```sql
drop table if exists snapshots.predict_input_history;
drop table if exists snapshots.experiment_history;
```
Then rebuild with `--full-refresh`.
Mostly maintenance; no plans for new features unless requested.
- Extra documentation
- Data tests and unit tests
- Extra SQL analysis
Distributed under the MIT License.