Projects

Text Mining and Sentiment Analysis on Amazon Kindle ebook review data (July 2018 - August 2018)

  • Analyzed about 3,000 reviews of 300 Kindle ebooks to pick a good book for my next summer read.

  • Mined the review text to generate term-document matrices and word clouds of the words used most frequently in the reviews.

  • Chose the 10 best-rated books and performed sentiment analysis to narrow the list down to the one book with the most positive sentiment.
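
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the project: the sample reviews, the tiny word lists, and the helper names `tokenize`, `term_document_matrix` and `sentiment_score` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample reviews; the real project used about 3,000 Kindle reviews.
reviews = {
    "Book A": ["Loved this story, great characters", "Great pacing and a wonderful ending"],
    "Book B": ["Boring plot and poor editing", "Great idea but terrible execution"],
}

# Tiny illustrative lexicons (assumptions, not a real sentiment dictionary).
POSITIVE = {"loved", "great", "wonderful"}
NEGATIVE = {"boring", "poor", "terrible"}
STOPWORDS = {"this", "and", "a", "but", "the"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def term_document_matrix(docs):
    """Return {term: [count in doc 0, count in doc 1, ...]}."""
    counts = [Counter(tokenize(d)) for d in docs]
    terms = sorted(set().union(*counts))
    return {t: [c[t] for c in counts] for t in terms}

def sentiment_score(docs):
    """Net sentiment per review: (positive hits - negative hits) / number of reviews."""
    tokens = [w for d in docs for w in tokenize(d)]
    pos = sum(w in POSITIVE for w in tokens)
    neg = sum(w in NEGATIVE for w in tokens)
    return (pos - neg) / len(docs)

tdm = term_document_matrix(reviews["Book A"])
scores = {book: sentiment_score(docs) for book, docs in reviews.items()}
best_book = max(scores, key=scores.get)
```

The term counts in `tdm` are the kind of data a word cloud is drawn from, and `best_book` is the title with the highest net sentiment among the candidates.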

Project link:

https://klk744.wixsite.com/portfolio/blog/text-mining-and-sentiment-analysis-on-amazon-kindle-reviews-data

Github link:

https://github.com/kaivalya24/text-mining-sentiment-analysis

 

Predicting demand for an online classified advertisement

Avito (April 2018 - June 2018)

  • Predicted demand for an advertisement based mainly on the ad image and description.

  • Processed one million images using Python and mined the ad description for information using R.

  • Trained an XGBoost model on 1.3 million rows of data and predicted the demand with an accuracy of 78%.

  • Ensembled the model with an Artificial Neural Network and a Support Vector Machine to increase accuracy.
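
One common way to ensemble several models is to take a weighted average of their predictions. The sketch below assumes that approach; the project description does not say which ensembling method was actually used, and the `blend` helper, the sample predictions and the weights are all hypothetical.

```python
def blend(predictions, weights):
    """Weighted average of per-model prediction lists (one list per model)."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]

# Hypothetical outputs of three models for the same four ads.
xgb_pred = [0.10, 0.80, 0.40, 0.00]
ann_pred = [0.20, 0.60, 0.50, 0.10]
svm_pred = [0.30, 0.70, 0.30, 0.20]

# Illustrative weights; in practice they would be tuned on a validation set.
blended = blend([xgb_pred, ann_pred, svm_pred], weights=[2, 1, 1])
```

Giving the strongest single model a larger weight lets the other models correct its errors without dominating the result.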

Project link:

https://klk744.wixsite.com/portfolio/blog/my-first-kaggle-competition-avito-demand-prediction

 

Github link:

https://github.com/kaivalya24/kaggle-avito-demand

Exploring the global burden of disease (January 2018 - March 2018)

  • Performed Exploratory Data Analysis (EDA) on global health data sets to identify trends in the burden of disease across the global population.

  • Created visualizations and exploratory dashboards in Tableau to help understand the top diseases plaguing the world, along with related statistics such as the minimum age and minimum life expectancy of people affected by a disease.

Project link: https://public.tableau.com/profile/kaivalya#!/vizhome/LakshmiKaivalya_globalburdenofdisease/Globalburdenofdisease  

 

Predicting credit card defaulters in Taiwan (October 2017 – December 2017)

  • Followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) to build models that deliver business insights about credit card defaulters in Taiwan.

  • Predicted the likelihood of a client defaulting based on their demographics and previous credit card payment history.

  • Cleaned and prepared the data, then modeled it using Logistic Regression, Support Vector Machine, Random Forest, Decision Tree and Artificial Neural Networks.

  • Evaluated the models based on their Receiver Operating Characteristic (ROC) curves and chose the best model.
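
A ROC-based comparison is often summarized by the area under the curve (AUC). The sketch below assumes AUC is the selection criterion, which the project description does not state explicitly; the labels, scores and model names are made-up illustrations, and `roc_auc` is a hypothetical helper rather than a library function.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a random positive is scored above a random
    negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = defaulted) and predicted default probabilities.
y_true = [1, 0, 1, 0, 0, 1]
model_scores = {
    "logistic_regression": [0.9, 0.2, 0.6, 0.4, 0.7, 0.8],
    "random_forest": [0.8, 0.1, 0.7, 0.3, 0.2, 0.9],
}

aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best_model = max(aucs, key=aucs.get)
```

The pairwise form is quadratic in the number of clients, so it suits small illustrations; for full data sets a library routine would be the practical choice.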
