Projects

Text Mining and Sentiment Analysis of Amazon Kindle ebook review data (July 2018 - August 2018)

Analyzed about 3,000 reviews of 300 Kindle ebooks to pick a good book for my next summer read.
Mined the review text to generate term-document matrices and word clouds of the words used most frequently in the reviews.
Selected the 10 best-rated books and performed sentiment analysis to narrow the choice down to the one book with the most positive sentiment.
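
As an illustration of the term-document matrix step, here is a minimal sketch in plain Python. The sample reviews and the helper names `term_document_matrix` and `top_terms` are hypothetical and are not taken from the project, and the tokenization is deliberately naive (lowercasing only, no stop-word removal or stemming).

```python
import re
from collections import Counter


def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] is the count of
    vocabulary[i] in documents[j]."""
    # Naive tokenization: lowercase, keep runs of letters and apostrophes.
    counts = [Counter(re.findall(r"[a-z']+", doc.lower())) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def top_terms(vocabulary, matrix, n=3):
    """Return the n terms with the highest total count across all documents."""
    totals = {term: sum(row) for term, row in zip(vocabulary, matrix)}
    # Sort by descending count, breaking ties alphabetically.
    return sorted(totals, key=lambda t: (-totals[t], t))[:n]


# Made-up review snippets, for illustration only.
reviews = [
    "Great story, great characters",
    "Slow story but great ending",
]
vocab, tdm = term_document_matrix(reviews)
frequent = top_terms(vocab, tdm, n=2)
```

The terms returned by `top_terms` are the kind of input a word cloud would be drawn from, with font size proportional to the total count.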

Project link:

GitHub link:

Predicting demand for an online classified advertisement, Avito (April 2018 - June 2018)

Predicted demand for an advertisement based mainly on the ad image and description.
Processed one million images using Python and mined the ad descriptions for information using R.
Trained an XGBoost model on 1.3 million rows of data and predicted demand with an accuracy of 78%.
Ensembled the model with an artificial neural network and a support vector machine to improve accuracy.

Project link:

GitHub link:

Exploring the global burden of disease (January 2018 - March 2018)

Performed exploratory data analysis (EDA) on global health data sets to identify trends in the burden of disease across the global population.
Created visualizations and exploratory dashboards in Tableau to show the top diseases affecting the world and related statistics, such as the minimum age and minimum life expectancy of people affected by a disease.

Predicting credit card defaulters in Taiwan (October 2017 – December 2017)

Applied the Cross-Industry Standard Process for Data Mining (CRISP-DM) to build models that deliver business insights about credit card defaulters in Taiwan.
Predicted the likelihood that a client will default based on their demographics and credit card payment history.
Cleaned and prepared the data, then modeled it using logistic regression, support vector machine, random forest, decision tree, and artificial neural network classifiers.
Evaluated the models using receiver operating characteristic (ROC) curves and selected the best-performing model.
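
To illustrate ROC-based model selection, here is a minimal sketch in plain Python that compares models by area under the ROC curve (AUC). The labels, scores, and model names are made up for the example, and choosing the model with the highest AUC is an assumption about how "best" might be defined, not a record of the project's actual procedure.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels (1 = default) and predicted default probabilities.
labels = [1, 0, 1, 0, 0]
model_scores = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.1],
    "model_b": [0.6, 0.5, 0.3, 0.4, 0.2],
}

aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
best_model = max(aucs, key=aucs.get)
```

The pairwise formulation is quadratic in the number of samples, which is fine for a small illustration; a library routine would be the usual choice on a full data set.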