Need of Sentiments Analysis with CF for Quality Recommendations

Recommendation systems (RS) help users purchase the right product of their interest at an affordable price. Many current RS rely only on filtering methods to recommend products, which does not account for product quality. Product quality can be inferred from the textual reviews available on various e-commerce websites; hence this RS performs Sentiment Analysis (SA) of relevant extracted textual reviews along with Collaborative Filtering (CF) to give accurate, good-quality recommendations to the user. Reviews are analyzed using an optimized Artificial Neural Network (ANN), which shows a noticeable improvement over a traditional ANN on review data extracted in real time. CF performance is evaluated on the standard MovieLens dataset used in many research papers. Results show high recall and accuracy of CF for the recommendation of products to the target user.


I. INTRODUCTION
Internet use has grown enormously across e-commerce, social media, and online review sites, especially in online shopping [1], [2]. People often rely on websites to read reviews before finalizing purchase decisions, making market predictions, and so on [3]. People's reviews can be used to predict outcomes in various fields such as politics and books. Seeking others' opinions to judge a product's merits is very important, and Opinion Mining is used for this purpose: it extracts essential information from user comments or opinions in social networks regarding specific things or products. Also known as Sentiment Analysis (SA), it supports various practical applications such as election result prediction, weighing a product's benefits against its price, and share market profit prediction [1], [4], [5].
Combining data mining with opinion mining has become a serious research topic for analyzing user ratings and reviews before recommendation. In a Recommendation System (RS), the task of mining the products a user is interested in is done by filtering methods. Among the four filtering methods of RS, Collaborative Filtering (CF) is used by many authors due to its advantages over the other filtering methods [6]. CF recommends products to the target user by studying his purchase history and finding the most similar users, who purchased products similar to his. Existing similarity measures used by CF to find similar users have drawbacks, such as low or zero similarity between users even when they have the same purchase history, and no predictions for users who have purchased only a single product. Similarity is calculated from the ratings, on a scale of 1 to 5, that users give to their purchased products; these ratings do not reveal all of a user's sentiments about a product. Sentiments can capture the feature-wise qualities of a product, which the target user may find important before purchasing it.
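The drawbacks of existing measures can be illustrated with the Pearson correlation coefficient (PCC), a widely used CF similarity. The sketch below is a minimal pure-Python illustration and not the measure proposed in this paper; the function name `pearson_similarity` and the toy ratings are assumptions made for the example.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated products; None when it is undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if not common:
        return None
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    den_b = sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return None  # zero variance over the common products: correlation undefined
    return num / (den_a * den_b)

# Hypothetical toy ratings (product id -> rating on the 1 to 5 scale).
# Same purchase history with flat ratings: no usable similarity.
same_history = pearson_similarity({"p1": 4, "p2": 4, "p3": 4},
                                  {"p1": 4, "p2": 4, "p3": 4})
# A user who purchased a single product: no usable similarity either.
single_product = pearson_similarity({"p1": 5}, {"p1": 5, "p2": 3})
# Two users whose ratings rise and fall together: similarity close to 1.
aligned = pearson_similarity({"p1": 1, "p2": 5}, {"p1": 2, "p2": 4})
```

In both of the first two cases the function returns `None`, so a CF system relying on this measure alone would find no neighbours and could predict nothing for these users.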
This RS presents quality-aware prediction of products for the target user. CF uses a new similarity measure to predict an accurate list of products for all users, including new users. The quality of these products is confirmed before the final recommendation using the sentiments of reviews extracted through online search engines such as Google, Yahoo, and Bing. Sentiments are analyzed with an optimized Artificial Neural Network (ANN). The next section covers related work on RS; Sections III and IV explain the proposed methodology and algorithm, followed by the results.

II. RELATED WORK
Papers on collaborative filtering and review sentiment analysis were surveyed to understand the open issues of RS.

A. Collaborative Filtering
The work in [7] uses a traditional memory-based collaborative filtering algorithm to recommend books. It studies students' learning trajectories and combines them with a time-sequential CF algorithm. The algorithm takes two inputs, the time sequence information of a book and the circulation time of the book, and then calculates the distance.
The work in [8] proposes a ratio-based method for service recommendation that uses memory-based collaborative filtering. It computes similarity using the new RACF measure. The results compare RACF with other similarity measures such as PCC, NRCF, UPCC, and IPCC. Service recommendations are then predicted using this similarity.
The work in [9] proposed item-based CF, which calculates similarity using cosine similarity. Cosine similarity considers all users for every product pair.
Xiaokun Wu [10], Haifeng Liu [11], and J. Bobaldia [12] suggested new similarity measures to calculate the similarity between two users in their respective papers. All have shown improvements over existing similarity measures.

B. Review of Sentiments Analysis
Neural networks have spread across various application domains and have become a successful method for classification problems. G. Vinodhini and her team from Annamalai University used a new method based on a combination of backpropagation, homogeneous, and probabilistic neural networks for sentiment classification. The authors demonstrated good performance of the ensemble algorithm [13].
The work in [14] developed a method that uses noisy labels. The authors argue that their work, based on emotion classification, would be valuable for learning user opinion effectively. They planned to improve their model to deal with continuous noise, and they collected information about annotators to choose correct pairs. Identification of accurate opinions is thereby demonstrated. The work in [15] gives an analysis of Chinese reviews for recommendations.
The work in [16] proposed a process for analyzing negative sentiments, called OSAPS. It retrieves tweets expressing customer dissatisfaction with the services of the US and UK Post. It combines technologies for Twitter extraction, data cleaning, subjective analysis, ontology model building, and sentiment analysis. This analysis helped the Post to improve its performance.
The literature survey shows that the item-based method of CF has pitfalls in required memory and response time. In reality, very few users have product pairs in common and many users have made few purchases; hence user-based CF is more efficient than item-based CF. The CF similarity measure also needs to be improved to obtain accurate predictions. Some papers used only review analysis for recommendations, which yields quality products that are not necessarily to the user's liking. A combination of CF and review sentiment analysis is required to preserve and explore user interest along with product quality.

III. PROPOSED METHOD
The proposed system is shown in Fig. 1. RS with CF needs a utility matrix as input, which holds the ratings given by different users to their purchased products. This RS uses a new similarity measure to predict the product list for the target user. The quality of the products in the predicted list is confirmed by textual review analysis using an optimal ANN, as shown in section B of Fig. 1. This RS thus gives dual confirmation that the recommended products are both relevant and of good quality.

A. CF with a New Similarity Measure
This RS finds the similarity between two different users using the new similarity measure given in (1):

$$\mathrm{sim}(u_a, u_b) = \frac{\text{standard deviation of maximum ratings}}{\text{maximum rating among both the users}} \tag{1}$$

The numerator gives the deviation of the maximum rating among both users for the common products, within the span of the mean of the ratings of both users. The denominator considers only the maximum rating for the common product between the user pair. (1) can be further elaborated as (2), where $u_a$, $u_b$ are two different users, $X$ is the total number of common products between the two users, $r_{u_a,x}$ and $r_{u_b,x}$ are the ratings of users $u_a$ and $u_b$ for common product $x$, and $\bar{r}_{u_a}$ and $\bar{r}_{u_b}$ are the average ratings of users $u_a$ and $u_b$ over the products common to them. Using this similarity, a list of products is predicted for the target user. Here the average in the standard deviation is subtracted from the highest rating, 5, to avoid the drawback of existing similarity measures. This modification yields a proper similarity for users who find no similar users under existing similarity measures; hence no user is left without a predicted list of products.
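How a user-to-user similarity feeds the prediction of a product list can be sketched as follows. This is a minimal user-based CF sketch under stated assumptions, not the authors' implementation: `overlap_similarity` is a hypothetical stand-in and is not the measure in (1) and (2), and the neighbourhood size `k`, the similarity-weighted average, and the toy utility matrix are likewise illustrative choices.

```python
def overlap_similarity(a, b):
    """Hypothetical placeholder similarity (NOT the paper's measure):
    share of co-rated products whose ratings differ by at most 1."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    return sum(1 for p in common if abs(a[p] - b[p]) <= 1) / len(common)

def predict_products(target, ratings, similarity, k=2, top_n=3):
    """Rank products the target has not rated, using the k most similar users."""
    neighbours = []
    for user, user_ratings in ratings.items():
        if user == target:
            continue
        sim = similarity(ratings[target], user_ratings)
        if sim > 0:
            neighbours.append((sim, user))
    neighbours.sort(reverse=True)
    neighbours = neighbours[:k]

    weighted_sum, weight = {}, {}
    for sim, user in neighbours:
        for product, rating in ratings[user].items():
            if product in ratings[target]:
                continue  # already purchased by the target user
            weighted_sum[product] = weighted_sum.get(product, 0.0) + sim * rating
            weight[product] = weight.get(product, 0.0) + sim
    predicted = {p: weighted_sum[p] / weight[p] for p in weighted_sum}
    # Highest predicted rating first; ties broken by product id.
    return sorted(predicted, key=lambda p: (-predicted[p], p))[:top_n]

# Toy utility matrix: user -> {product: rating on the 1 to 5 scale}.
utility = {
    "alice": {"p1": 5, "p2": 4},
    "bob":   {"p1": 5, "p2": 4, "p3": 5, "p4": 2},
    "carol": {"p1": 1, "p2": 2, "p5": 5},
    "dave":  {"p1": 4, "p2": 5, "p3": 4},
}
predicted_list = predict_products("alice", utility, overlap_similarity)
```

Because the similarity function is passed in as an argument, any measure with the same signature could replace the placeholder without changing the prediction step.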

B. Quality Confirmation of Products Using Textual Reviews Sentiments Analysis
CF uses high similarity between users to predict products, but it does not account for product quality. Quality can be confirmed by analyzing the textual reviews given by different users about these products. Users mention good and bad experiences with the different features of their purchased products. This RS extracts reviews given by different users through the online search engines Google, Yahoo, and Bing for the predicted list of products. The reviews of every product in the predicted list are classified as positive or negative based on different features, and a predictive score is calculated for every product. The products in the predicted list of CF are ranked on this predictive score, and the top-ranked quality products are finally recommended. Review analysis is done in the following steps:
1. Take the input keyword from the predicted list of products.
2. Search and extract reviews. Find the title and contents by finding meanings and similar words.
3. Preprocess the reviews.
4. Combine the title, content, and trust matrix of all reviews for the different features of every product in the predicted list.
5. Give these combined values of every review as input to the optimized ANN, as shown in Fig. 2. The number of neurons in the input layer is directly proportional to the number of different features.
6. Create all neurons and their connections for the input, hidden, and output layers, as shown in Fig. 3.
7. Initialize random weights for the input layer neurons between -1 and 1.
8. Calculate the output of each neuron based on the input, as given in (3), where i, j = 1 to n index the different inputs of the neural network.
9. Propagate all outputs back to obtain the expected output. First, calculate the partial derivative of the error with respect to each of the weights leading to the output neurons. The bias is also updated here.
10. Train the neural network until the minimum error is reached or maxSteps is exceeded. This is done by updating the weights of the hidden and output layers to reach the expected output. To get the expected output, the sigmoid activation function given in (4) is used.
11. For test input, use the final weights obtained from the training phase and calculate the review score for every product in the predicted list.
Then, based on the positive and negative review scores, the products in the predicted list are recommended to the user according to quality.
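Steps 5 to 11 can be sketched as a small feed-forward network trained by backpropagation. The sketch is a plain pure-Python network under stated assumptions, not the optimized ANN of this paper: the neuron output is taken to be the sigmoid of the weighted input sum plus a bias, all layers (not only the input layer) are initialized in [-1, 1], and the hidden-layer size, learning rate, squared-error loss, and toy review vectors are illustrative choices.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    # One weight row per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_output(weights, inputs):
    # Sigmoid of (weighted sum of inputs + bias), for every neuron in the layer.
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in weights]

def train(samples, n_hidden=3, rate=0.5, min_error=0.01, max_steps=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    hidden = init_layer(n_in, n_hidden, rng)
    output = init_layer(n_hidden, 1, rng)
    for _ in range(max_steps):
        total_error = 0.0
        for inputs, target in samples:
            h = layer_output(hidden, inputs)
            o = layer_output(output, h)[0]
            total_error += 0.5 * (target - o) ** 2
            # Partial derivative of the error w.r.t. the output neuron's net input.
            delta_o = (o - target) * o * (1.0 - o)
            # Hidden deltas are computed before the output weights change.
            delta_h = [delta_o * output[0][j] * h[j] * (1.0 - h[j])
                       for j in range(n_hidden)]
            for j in range(n_hidden):
                output[0][j] -= rate * delta_o * h[j]
            output[0][-1] -= rate * delta_o  # bias update
            for j in range(n_hidden):
                for i in range(n_in):
                    hidden[j][i] -= rate * delta_h[j] * inputs[i]
                hidden[j][-1] -= rate * delta_h[j]  # bias update
        if total_error < min_error:
            break  # minimum error reached before max_steps
    return hidden, output

def review_score(model, inputs):
    hidden, output = model
    return layer_output(output, layer_output(hidden, inputs))[0]

# Hypothetical review vectors (title, content, trust), each scaled to [0, 1];
# target 1 marks a positive review and 0 a negative one.
samples = [([0.9, 0.8, 0.9], 1), ([0.8, 0.9, 0.7], 1),
           ([0.1, 0.2, 0.3], 0), ([0.2, 0.1, 0.1], 0)]
model = train(samples)
positive_like = review_score(model, [0.9, 0.9, 0.8])
negative_like = review_score(model, [0.1, 0.1, 0.2])
```

The final weights returned by `train` are reused unchanged in `review_score`, which mirrors the separation between the training phase and the test phase in step 11.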

IV. ALGORITHM
The proposed system works in three main steps. CF works on the belief that if two users liked the same products in the past, they have similar tastes. CF uses a new similarity measure to find the most similar users for the target user and predicts the product list for him. The next step extracts reviews through online search engines and calculates a predictive score for every product in the predicted list of CF. In the final step, the predicted list is ranked by predictive score and the top-N products are recommended.
The steps of the algorithm are shown in Fig. 4 and are elaborated below.
Step 1: Predicted list of products.
Here, CF finds the most similar users for the target user using (2) and predicts the list of products for him.
Step 2: Analysis of Reviews.
For every product in the predicted list, the review analysis step extracts reviews of that product through the online search engines Google, Yahoo, and Bing. The title, content, and trust values of every review are combined and given as input to the optimized ANN. The optimized ANN gives a predictive score for every product based on the positive and negative sentiments of its reviews. Negative sentiments help to identify irrelevant features of a product, and such products are not recommended to the target user. The predictive score of a product is proportional to the negative sentiments about it. Products are ranked by predictive score, which assures good product quality.
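The ranking of the predicted list can be sketched as below. This is only an illustration under assumptions: the predictive score is taken to be the share of a product's reviews classified as negative, so a lower score ranks higher, and products with no extracted reviews are skipped because their quality cannot be confirmed. The function name `rank_by_reviews` and the labels are hypothetical.

```python
def rank_by_reviews(review_labels, top_n=10):
    """review_labels maps product -> list of 'pos'/'neg' labels from the classifier.
    Returns up to top_n products, fewest negative sentiments first."""
    scored = []
    for product, labels in review_labels.items():
        if not labels:
            continue  # assumption: no reviews found, so quality is unconfirmed
        negative_share = labels.count("neg") / len(labels)
        scored.append((negative_share, product))
    scored.sort()  # ascending negative share; ties broken by product id
    return [product for _, product in scored[:top_n]]

# Hypothetical classifier output for a predicted list from CF.
labels = {
    "p3": ["pos", "pos", "neg"],
    "p4": ["neg", "neg"],
    "p7": ["pos", "pos"],
    "p9": [],
}
recommended = rank_by_reviews(labels, top_n=2)
```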

V. EXPERIMENTATION
The MovieLens dataset (https://grouplens.org/datasets/movielens/100k/) is used to evaluate the accuracy of the new similarity measure. The dataset is originally divided into five different train and test sets, so k-fold cross-validation was used to obtain reliable results.
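A k-fold split with k = 5 can be sketched as follows. The helper `k_fold_splits` and the toy rating tuples are assumptions for illustration; the split is a simple round-robin without shuffling, and the 100k archive also ships predefined splits (`u1.base`/`u1.test` through `u5.base`/`u5.test`) that could be used directly instead.

```python
def k_fold_splits(ratings, k=5):
    """Yield (train, test) pairs; every rating appears in exactly one test fold."""
    folds = [ratings[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# Hypothetical (user id, movie id, rating) tuples standing in for the dataset.
toy_ratings = [(u, m, (u + m) % 5 + 1) for u in range(1, 3) for m in range(1, 6)]
splits = list(k_fold_splits(toy_ratings, k=5))
```

Each metric is then computed once per fold and averaged over the five folds.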

A. Evaluation Metrics
The performance of CF is evaluated using the precision and recall metrics. To reduce the volume of the 100,000 ratings and to keep CF simple, clusters of similar users can be created using user profile information. Precision and recall should both attain high values for good performance. Recall indicates the correctness of recommendation. One more evaluation criterion, the F-measure, is used; it indicates the accuracy of recommendation.
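These metrics can be computed for a single target user as in the sketch below, which uses the standard definitions: precision is the share of recommended products that are relevant, recall is the share of relevant products that were recommended, and the F-measure is their harmonic mean. The function name and the toy product ids are assumptions for illustration.

```python
def precision_recall_f(recommended, relevant):
    """Standard top-N precision, recall, and F-measure for one user."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure

# Hypothetical top-4 recommendation against the products the user actually liked.
precision, recall, f_measure = precision_recall_f(
    ["p1", "p2", "p3", "p4"], ["p2", "p4", "p9"])
```

Here two of the four recommended products are relevant, so precision is 2/4 and recall is 2/3.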

B. Performance Analysis of Proposed Similarity Measure
As the number of recommended products and the number of nearest neighbors increase, the precision and recall metrics also change. All existing similarity measures are compared with the proposed measure on precision, recall, and accuracy. The proposed similarity measure outperforms traditional similarity measures, as shown in Fig. 5. For this RS, recall and F-measure are directly proportional to the number of products in the top-N. From Fig. 5, it can be seen that the precision of the proposed measure is better than that of the other measures. When the top-10 products are considered, the proposed measure has the best recall values compared to the others, as shown in Fig. 6. The new similarity measure gives good accuracy, as shown in Fig. 7.

C. Performance Analysis of Optimized ANN
Here the proposed optimized ANN is compared with a traditional ANN. From Fig. 8, it can be seen that the optimized ANN works efficiently when reviews are considered for the top-10 and more products.

VI. CONCLUSION
Collaborative filtering calculates the correct similarity between two users and hence can predict a list of products for the target user. CF works on the ratings given by users to their purchased products, so it cannot predict the quality of products. Hence this RS uses CF and review analysis in combination to produce relevant, good-quality recommendations. E-commerce websites do not expose the ratings and reviews of all users; therefore textual review analysis that extracts reviews through online search engines gives quality results and helps the user find relevant, high-quality products. Experiments showed that the proposed similarity measure is better than existing measures and that the optimized ANN for textual review analysis attained good performance. In the future, this RS can be made scalable by using optimized clustering to reduce the volume of users for CF.

Seema P. Nehete is a research scholar and has worked as a faculty member at a reputed engineering college in Mumbai, India, for 15 years. She completed her master's degree in computer engineering at Vidyalankar Institute of Technology, Mumbai, India.
Dr. Satish R. Devane is a Professor at a reputed engineering college in Mumbai, India, with 34 years of experience. He completed his Ph.D. at the Indian Institute of Technology Bombay (IITB). He serves on the advisory boards of many engineering colleges in India and is an active member of many technical forums.