A short video demonstrating the API in action:
https://www.youtube.com/watch?v=QG-pj9tV81M
- Set up logging (see the sketch after the pipeline list below)
- Fetch the dataset and model from Hugging Face (sketch below)
- Build modules and pipelines (listed in the pipelines section below; see the sketch right after that list), covering everything from data gathering to model training and inference
- Use FastAPI and Uvicorn to create training and inference endpoints (sketch below)
- Deploy the Dockerized app to Amazon ECR and ECS on AWS
- Build a CI/CD pipeline using GitHub Actions
- Use Postman (or any HTTP client) for inference requests (sketch below)
- Data Ingestion
- Data Transformation
- Data Validation
- Model Training
- Model Evaluation
- Inference
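
Each stage can be written as a small module and chained in order. A minimal sketch, where every class name is a hypothetical placeholder for the project's actual components:

```python
# Chaining the pipeline stages listed above (class names are placeholders).
class DataIngestion:
    def run(self):
        print("download and unpack the SAMSum dataset")

class DataTransformation:
    def run(self):
        print("tokenize dialogues and reference summaries")

class DataValidation:
    def run(self):
        print("check that all expected files and splits exist")

class ModelTrainer:
    def run(self):
        print("fine-tune PEGASUS on the transformed data")

class ModelEvaluation:
    def run(self):
        print("compute ROUGE scores on the test split")

def run_training_pipeline():
    # Each stage reads the artifacts written by the previous one; paths
    # would normally come from a shared config file.
    for stage in (DataIngestion(), DataTransformation(), DataValidation(),
                  ModelTrainer(), ModelEvaluation()):
        stage.run()

if __name__ == "__main__":
    run_training_pipeline()
```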
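
The logging setup might look like the following; a minimal sketch, where the log directory, format string, and logger name are assumptions rather than details from the repo:

```python
# logger.py -- minimal shared logging setup (directory, format string,
# and logger name are all assumptions for this sketch).
import logging
import os
import sys

LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s: %(levelname)s: %(module)s: %(message)s]",
    handlers=[
        logging.FileHandler(os.path.join(LOG_DIR, "running_logs.log")),
        logging.StreamHandler(sys.stdout),  # mirror logs to the console
    ],
)

logger = logging.getLogger("textSummarizerLogger")
logger.info("Logging initialized")
```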
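
Fetching the SAMSum dataset and a PEGASUS checkpoint from Hugging Face could look like this; the exact checkpoint is an assumption (`google/pegasus-cnn_dailymail` is a common starting point for fine-tuning on SAMSum):

```python
# Pull the dataset and a pretrained PEGASUS checkpoint from Hugging Face.
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# On newer `datasets` releases the script-based "samsum" loader may need
# trust_remote_code=True, or a mirrored copy such as "knkarthick/samsum".
dataset = load_dataset("samsum")

checkpoint = "google/pegasus-cnn_dailymail"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

print(dataset)  # DatasetDict with train/validation/test splits
```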
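
A sketch of the FastAPI app; the endpoint paths, port, and request schema are assumptions, and `/train` is stubbed out where the real app would invoke the training pipeline:

```python
# app.py -- FastAPI service exposing training and inference endpoints
# (paths, port, and schema are assumptions for this sketch).
from functools import lru_cache

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline as hf_pipeline

app = FastAPI(title="Text Summarizer API")

class DialogueIn(BaseModel):
    text: str

@lru_cache(maxsize=1)
def get_summarizer():
    # Loading the model is slow, so do it once and cache the pipeline.
    return hf_pipeline("summarization", model="google/pegasus-cnn_dailymail")

@app.post("/train")
def train():
    # Stub: the real endpoint would kick off the training pipeline
    # (ingestion -> transformation -> validation -> training -> evaluation).
    return {"status": "training started"}

@app.post("/predict")
def predict(payload: DialogueIn):
    summary = get_summarizer()(payload.text)[0]["summary_text"]
    return {"summary": summary}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)
```

Running `python app.py` serves the API locally; inside the Docker image the same Uvicorn invocation would be the container entrypoint.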
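
The Postman call is an ordinary HTTP POST; the same request from Python, assuming the endpoint and port sketched above:

```python
# Equivalent of the Postman request: POST a dialogue, get back a summary.
import requests

resp = requests.post(
    "http://localhost:8080/predict",  # assumed host/port from the sketch above
    json={"text": "Amanda: I baked cookies. Do you want some? Jerry: Sure!"},
    timeout=120,
)
print(resp.json())  # {"summary": "..."}
```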
Model: Google PEGASUS (via the Hugging Face Hub)
Citation: Zhang, J., Zhao, Y., Saleh, M., & Liu, P. J. (2019). PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. arXiv preprint arXiv:1912.08777.
Dataset: SAMSum (via the Hugging Face Hub)
License: CC BY-NC-ND 4.0
Citation: Gliwa, B., Mochol, I., Biesek, M., & Wawer, A. (2019). SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization (pp. 70–79). Association for Computational Linguistics.