Kaivalya Kandukuri
Aspiring Data Analyst/Scientist
Projects
Text Mining and Sentiment Analysis on Amazon Kindle ebook reviews data (July 2018 - August 2018)
​
- Analyzed about 3,000 reviews of 300 Kindle ebooks to choose a good book for my next summer read.
- Mined the review text to generate term-document matrices and word clouds of the words used most frequently in the reviews.
- Selected the 10 best-rated books and performed sentiment analysis to narrow them down to the one book with the most positive sentiment.
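As a rough illustration of the term-document matrix step, here is a minimal Python sketch using only the standard library. The sample reviews and the function names (`tokenize`, `term_document_matrix`, `top_terms`) are hypothetical and are not taken from the project repository; the project's actual tooling may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_document_matrix(reviews):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in reviews[j]."""
    counts = [Counter(tokenize(review)) for review in reviews]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


def top_terms(terms, matrix, n=3):
    """Return the n terms with the highest total count across all reviews."""
    totals = {term: sum(row) for term, row in zip(terms, matrix)}
    return sorted(totals, key=lambda t: (-totals[t], t))[:n]


# Hypothetical sample reviews, for illustration only.
sample_reviews = [
    "Great story, great characters",
    "The story was slow",
    "Great read",
]

terms, matrix = term_document_matrix(sample_reviews)
print(top_terms(terms, matrix, n=2))
```

A real pipeline would usually also drop stop words such as "the" before counting, so that the word cloud reflects meaningful terms.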
​
Project link:
GitHub link: https://github.com/kaivalya24/text-mining-sentiment-analysis
Predicting demand for an online classified advertisement: Avito (April 2018 - June 2018)
​
- Predicted demand for an advertisement based mainly on its image and description.
- Processed one million images using Python and mined the ad descriptions for information using R.
- Trained an XGBoost model on 1.3 million rows of data and predicted demand with an accuracy of 78%.
- Ensembled the model with an Artificial Neural Network and a Support Vector Machine to increase accuracy.
​
Project link: https://klk744.wixsite.com/portfolio/blog/my-first-kaggle-competition-avito-demand-prediction
GitHub link: https://github.com/kaivalya24/kaggle-avito-demand
​
Exploring the global burden of disease (January 2018 - March 2018)
​
- Performed Exploratory Data Analysis (EDA) on global health data sets to identify trends in the burden of disease across the global population.
- Created visualizations and exploratory dashboards in Tableau to show the top diseases affecting the world and related statistics, such as the minimum age and minimum life expectancy of people affected by a disease.
​
Project link: https://public.tableau.com/profile/kaivalya#!/vizhome/LakshmiKaivalya_globalburdenofdisease/Globalburdenofdisease
Predicting credit card defaulters in Taiwan (October 2017 – December 2017)
​
- Followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) to build models that deliver business insights about credit card defaulters in Taiwan.
- Predicted the likelihood that a client will default based on their demographics and credit card payment history.
- Cleaned and prepared the data, then modeled it using Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, and Artificial Neural Network classifiers.
- Evaluated the models using Receiver Operating Characteristic (ROC) curves and chose the best model.
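The ROC-based comparison can be sketched roughly as below, assuming the area under the ROC curve (AUC) is the selection criterion; the project description does not state the exact criterion. The labels, scores, and model names are made-up illustrative values, and `roc_auc` is a hypothetical helper written for this sketch, not a library function.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = client defaulted, 0 = client did not default.
labels = [1, 0, 1, 0, 1, 0]

# Hypothetical predicted default probabilities from two candidate models.
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}

aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

The pairwise comparison is quadratic in the number of clients, which is fine for a sketch; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice on a full data set.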
​
​