A New Semantic Approach on Yelp Review-star Rating Classification
Abstract
This paper introduces a new semantic approach to Yelp review star rating prediction. Our approach extracts feature vectors from user reviews and uses them to build star rating prediction models. Review text contains detailed information about a reviewer's experience and directly reflects the reviewer's level of satisfaction. Our approach extracts sentiment words from the review text and converts this information into feature vectors. Because reviewers' personal preferences can differ sharply from one another, we use belief propagation methods to calculate star rating probability distributions for different types of reviewers. Our machine learning algorithm then predicts the star rating based on the reviewer's preferences and voting habits. We extract several kinds of feature vectors and apply them to several machine learning algorithms. To evaluate all 2.2 million user reviews, we build a Spark system on three laptops. To improve prediction accuracy, we perform sentiment analysis of the reviews in terms of the number of positive, negative, and negation words, and apply belief propagation methods to remove the effects of personal preference. Our system can evaluate the 2.2 million data entries in less than two minutes and achieves an accuracy of 55%.
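As an illustration of the sentiment features mentioned above (counts of positive, negative, and negation words), the following is a minimal sketch and not the implementation used in this work. The word lists, the function name sentiment_counts, and the rule that treats tokens ending in "n't" as negations are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicons, chosen only for illustration;
# they are not the word lists used in the paper.
POSITIVE = {"good", "great", "delicious", "friendly", "amazing"}
NEGATIVE = {"bad", "slow", "rude", "bland", "terrible"}
NEGATION = {"not", "no", "never"}


def sentiment_counts(review):
    """Return [positive, negative, negation] word counts for one review."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    positive = sum(t in POSITIVE for t in tokens)
    negative = sum(t in NEGATIVE for t in tokens)
    # Assumption: contractions such as "wasn't" also count as negations.
    negation = sum(t in NEGATION or t.endswith("n't") for t in tokens)
    return [positive, negative, negation]


if __name__ == "__main__":
    print(sentiment_counts("Terrible, slow service; wasn't good."))
```

A vector of this form could serve as one of the inputs to a classifier; how such counts are combined with other features is left open here.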
Subject
semantic approach
review rating and classification
big data
machine learning