Data Scientist passionate about uncovering hidden patterns in data. Adept at building and deploying machine learning models, deep learning models and data pipelines for real-world applications. Proficient in Python for data manipulation and analysis. Eager to apply data science to challenging problems.
Developed a multi-utility application powered by LLMs and the LangChain framework, with three main features:
- Question Answering: Built an interface where users ask general questions and receive answers. Users can choose among LLMs such as gpt-3.5-turbo, llama3-8b-instruct, gemma-7b-it and Mistral-7B-Instruct-v0.2, served through the OpenAI API, Ollama, the Groq API and HuggingFace respectively, with LangChain's tools building the answering pipeline.
- Website Search: Created an interface for searching sources such as Wikipedia, LangSmith and arXiv by posing questions. Specialized LangChain agents and tools handle information lookup and context generation for each source, and an LLM composes the response.
- RAG App: Built a RAG chat app that combines document parsers, a text splitter, a vector store and a prompt into a single chain, so users can upload documents and chat with them.
Head over to the repo to read about this project in detail.
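The retrieval half of a RAG chain like the one described above can be sketched in plain Python. This is a minimal illustration, not the project's code: `split_text`, `embed`, `ToyVectorStore` and `build_prompt` are hypothetical names, the word-count "embedding" stands in for a real embedding model, and the in-memory list stands in for a real vector store.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_words=50, overlap=10):
    """Split text into overlapping windows of words (a simple text splitter)."""
    words = text.split()
    step = chunk_words - overlap  # assumes chunk_words > overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_words]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory store that ranks chunks by similarity to the query."""

    def __init__(self, chunks):
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In use, an uploaded document would be passed through `split_text`, the chunks loaded into `ToyVectorStore`, and the string returned by `build_prompt` sent to whichever LLM the user selected.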
Built a text-summarization API using a HuggingFace transformer (Google Pegasus), trained it on the SAMSum dataset from HuggingFace, built training and inference pipelines with FastAPI, and deployed it to AWS with a CI/CD pipeline.
Watch a demonstration video: here
Visit the repo: here
Trained an image classification model (CNN) from scratch using TensorFlow, and fine-tuned pre-trained models for the required use case. Used Optuna to tune the models' hyperparameters and selected the best-performing model to run inference on the test dataset.
Visit the repo: here
Developed a machine learning model to predict customer churn. Utilized various classification algorithms including Logistic Regression, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, AdaBoost, CatBoost and Stacking Ensemble, achieving 91.6% accuracy and 0.90 precision in identifying at-risk customers.
Visit the repo: here
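For readers less familiar with the two metrics quoted for the churn model, here is a small sketch of how accuracy and precision are computed from binary predictions. The function names are hypothetical and any labels passed in are toy data, not the project's results. Label `1` is assumed to mean "customer churned".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(y_true, y_pred):
    """Share of predicted churners who actually churned."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Accuracy counts every correct prediction, while precision looks only at the customers flagged as at-risk, which is why the two numbers are reported separately.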
Built a delivery time prediction model for Porter using regression techniques. Data preprocessing included handling missing values and outliers, along with feature engineering and standardization. Experimented with models including Linear Regression, Decision Tree, XGBoost, AdaBoost, CatBoost, LightGBM, Random Forest and Neural Networks. The LightGBM Regressor performed best, with the lowest mean squared error (0.653).
Visit the repo: here
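Two of the steps named in the delivery time project, standardization and mean squared error, are simple enough to show directly. The sketch below uses only the standard library. The function names are hypothetical, and it assumes z-score standardization with the population standard deviation, which may differ from what the project used.

```python
import statistics


def standardize(values):
    """Z-score a feature column: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no signal; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

In a real pipeline the mean and standard deviation would be computed on the training split only and reused for the test split. Note also that MSE is expressed in squared units of the target, so its value depends on how the target is scaled.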
Languages: Python, SQL
Concepts: Data Analysis, Probability and Statistics, Machine Learning, Deep Learning, Unsupervised Learning, Feature Engineering, MLOps
Tools and software: Tableau, Postman, Docker, Git
Libraries, utilities and frameworks: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, TensorFlow, Keras, PySpark, Snowflake, MongoDB, ChromaDB
- All of my projects are available in the Repositories
- Reach me at [email protected]
- Visit my Medium blog here