-
Data collection: The dataset was collected from Kaggle. It contains three files: train.txt, test.txt, and val.txt.
-
Data handling: Pandas is used to read the text files, which are semicolon-separated, and the columns are named Text and Labels.
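
The loading step can be sketched as follows. This is a minimal illustration, not the project's exact code: the two sample rows are made up, and an in-memory buffer stands in for a file such as train.txt.

```python
import io

import pandas as pd

# Two made-up rows in the "text;label" layout described above.
# In the project, the path to train.txt would be passed instead of this buffer.
sample = io.StringIO("i feel great today;joy\ni am scared of the dark;fear\n")

# The files have no header row, so the column names are supplied explicitly.
df = pd.read_csv(sample, sep=";", names=["Text", "Labels"])
print(df)
```

The same call would be repeated for test.txt and val.txt.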
-
Data cleaning: Regex is used to match patterns. The text is converted to lowercase and non-alphabetic characters are removed. The NLTK toolkit is used to remove stopwords. The cleaned text is stored in a new column.
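
A rough sketch of the cleaning step is shown below. The function name clean_text and the small stopword set are assumptions for illustration; the project takes its stopwords from NLTK, as noted in the comment.

```python
import re


def clean_text(text, stop_words):
    """Lowercase, drop non-alphabetic characters, and remove stopwords."""
    text = text.lower()
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [word for word in text.split() if word not in stop_words]
    return " ".join(tokens)


# Small illustrative stopword set. With NLTK, after nltk.download("stopwords"),
# the list would come from:
#   from nltk.corpus import stopwords
#   stop_words = set(stopwords.words("english"))
stop_words = {"i", "am", "so", "the", "a", "to"}

cleaned = clean_text("I am SO happy today!!! :)", stop_words)
print(cleaned)
```

With a DataFrame, the function could be applied to the Text column and the result assigned to a new column.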
-
Encoding: Labels are encoded using a label encoder, which assigns
0-Anger, 1-Fear, 2-Joy, 3-Love, 4-Sadness, 5-Surprise
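
The mapping above matches how scikit-learn's LabelEncoder behaves, since it numbers the classes in sorted order. A minimal sketch, assuming the labels are stored as lowercase strings:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative label values; the real ones come from the Labels column.
labels = ["joy", "sadness", "anger", "love", "fear", "surprise", "joy"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the sorted class names; the index of each name is its code.
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)
```

encoder.inverse_transform can be used later to turn predicted codes back into emotion names.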
-
Training: Text features are extracted using TfidfVectorizer. Logistic regression and multinomial naive Bayes
algorithms are used to train the models. These two algorithms handle text data well.
-
Save: The trained model and tokens are stored in a pickle file.
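
The training and saving steps can be sketched roughly as below. The toy texts, the max_iter value, the file name model.pkl, and the dictionary layout of the pickle are all assumptions, and "tokens" is read here as the fitted vectorizer, which the source does not confirm.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy cleaned texts and encoded labels (2 = joy, 4 = sadness), for illustration only.
texts = ["happy joyful day", "feeling joyful happy", "sad lonely night", "feeling sad crying"]
y = [2, 2, 4, 4]

# Turn the texts into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Fit both classifiers on the same features.
log_reg = LogisticRegression(max_iter=1000).fit(X, y)
naive_bayes = MultinomialNB().fit(X, y)

# Save the model together with the vectorizer, then load them back.
# A temporary directory is used so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"model": log_reg, "vectorizer": vectorizer}, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# New text must go through the same vectorizer before prediction.
features = restored["vectorizer"].transform(["happy joyful"])
prediction = restored["model"].predict(features)
print(prediction)
```

Keeping the vectorizer next to the model matters because a model trained on one TF-IDF vocabulary cannot score text vectorized with a different one.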
-
Tools: Python, Jupyter Notebook, NLTK, Streamlit. Unicode characters are used for emojis.
-
Result: Logistic regression achieved:
validation accuracy: 92.0%, test accuracy: 91.55%.
Feedback is saved in a CSV file.