Project-XGBoost-MLOps

Capstone project for MLOps Zoomcamp course.

Problem Description

Losing customers is costly for any business. Therefore, identifying unhappy customers early on enables a company to offer them incentives to stay. Machine learning (ML) can be used for the automated identification of unhappy customers. This is also known as customer churn prediction.

In this case, the company is a mobile phone operator and the data to be used is a relatively small dataset (< 5000 rows) and has 21 attributes:

State: the US state in which the customer resides, indicated by a two-letter abbreviation; for example, OH or NJ
Account Length: the number of days that this account has been active
Area Code: the three-digit area code of the corresponding customer’s phone number
Phone: the remaining seven-digit phone number
Int’l Plan: whether the customer has an international calling plan: yes/no
VMail Plan: whether the customer has a voice mail feature: yes/no
VMail Message: the average number of voice mail messages per month
Day Mins: the total number of calling minutes used during the day
Day Calls: the total number of calls placed during the day
Day Charge: the billed cost of daytime calls
Eve Mins, Eve Calls, Eve Charge: the billed cost for calls placed during the evening
Night Mins, Night Calls, Night Charge: the billed cost for calls placed during nighttime
Intl Mins, Intl Calls, Intl Charge: the billed cost for international calls
CustServ Calls: the number of calls placed to Customer Service
Churn?: whether the customer left the service: true/false

The dataset we use is publicly available and was mentioned in the book Discovering Knowledge in Data by Daniel T. Larose. It is attributed by the author to the University of California Irvine Repository of Machine Learning Datasets. A copy of the dataset churn.txt can be found here.

I will re-use the data analysis performed by Denis V. Batalov, a Solutions Architect for AWS to determine which columns to drop, as the focus of this project is on the creation of the MLOps pipelineand not the data engineering required for a ML model. The analysis is decrived in this blog: AWS Machine Learning Blog Post.

I will use the binary classification XGBoost algorithm to determine whether a customer is likely to churn.

Architecture

Project Deliverables

Area	Criteria	Reference
Problem Description	The problem is well described and it's clear what the problem the project solves	This document
Cloud	The project is developed on the cloud and IaC tools are used for provisioning the infrastructure	Cloud readme
Experiment tracking and model registry	Both experiment tracking and model registry are used	MLFlow readme
Workflow orchestration	Fully deployed workflow	Prefect readme
Model deployment	The model deployment code is containerized and could be deployed to cloud or special tools for model deployment are used	Model Deployment readme
Model monitoring	Comprehensive model monitoring that sends alerts or runs a conditional workflow (e.g. retraining, generating debugging dashboard, switching to a different model) if the defined metrics threshold is violated	Model Monitoring Readme
Reproducibility	Instructions are clear, it's easy to run the code, and it works. The versions for all the dependencies are specified.	Reporoducibility readme
Best practices	There are unit tests (1 point) There is an integration test (1 point) Linter and/or code formatter are used (1 point) There's a Makefile (1 point) There are pre-commit hooks (1 point) There's a CI/CD pipeline (2 points)	Best Practices readme

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
data		data
deployment		deployment
images		images
mlflow		mlflow
models		models
monitoring		monitoring
prefect		prefect
setup		setup
tests		tests
.gitignore		.gitignore
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
best practices readme.md		best practices readme.md
cloud readme.md		cloud readme.md
mlflow readme.md		mlflow readme.md
model deployment.md		model deployment.md
monitoring readme.md		monitoring readme.md
prefect readme.md		prefect readme.md
reporoducibility readme.md		reporoducibility readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Project-XGBoost-MLOps

Problem Description

Architecture

Project Deliverables

About

Uh oh!

Releases

Packages

Uh oh!

Languages

BuzzKanga/Project-XGBoost-MLOps

Folders and files

Latest commit

History

Repository files navigation

Project-XGBoost-MLOps

Problem Description

Architecture

Project Deliverables

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages