Know Your Doctor: A Topic Modeling and Sentiment Analysis Based Approach to Reviewing Doctors
- MSC MI 2016-2018
Nowadays, people often search for doctors on business review websites, and they naturally opt for those with the best ratings and a large number of reviews supporting those ratings. The best-rated doctors may have hundreds or even thousands of reviews on their profiles, which makes comparing a highly rated option against every alternative a tedious task. Furthermore, even when there is only one highly rated doctor, a user may still want to read the reviews to see why people like this doctor and whether the reviewers addressed the user's own concerns. This, again, can be time-consuming. In both cases, a review summarizer would be helpful. Web services such as Zocdoc and Yelp offer their own versions of “doctor reviews” to help users quickly see what other reviewers have said about doctors. Zocdoc rates doctors in three categories: “overall rating,” “bedside manner,” and “wait time.” However, these categories do not cover the other useful points that users make in their individual reviews. Yelp automatically highlights representative review sentences that share common phrases with other sentences, but it gives no explicit rating for the topics mentioned in those sentences. This project aimed to build a tool that combines the strengths of both products. Know Your Doctor first detects the topics discussed in the reviews (e.g., bedside manner). It then analyzes whether people spoke positively or negatively about each topic, and finally assigns an appropriate rating to each topic. To do this, the summarizer analyzes public review data using two techniques: topic modeling with Latent Dirichlet Allocation (LDA), a standard Natural Language Processing (NLP) technique for discovering the topics in a corpus, and word2vec-based sentiment analysis, the computational study of people's opinions, attitudes, and emotions as expressed in a review.
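The abstract does not say how LDA was implemented or which hyperparameters were used. As a rough illustration only, the following minimal sketch fits LDA with collapsed Gibbs sampling in plain Python, assuming each review has already been tokenized into a list of words. The function names `lda_gibbs` and `top_words` and the default values of `alpha`, `beta`, and `iterations` are hypothetical choices, not the project's settings.

```python
import random


# Minimal collapsed Gibbs sampler for LDA (illustrative sketch).
# Assumption: symmetric priors alpha/beta and a fixed number of sweeps;
# the project's actual LDA implementation and settings are not specified.
def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """docs: list of token lists (one per review).

    Returns (doc_topic, topic_word): per-review topic proportions and
    per-topic word probabilities (dicts mapping word -> probability).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per review
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[t][word_id[w]] + beta) / (n_k[t] + V * beta) for w in vocab}
        for t in range(K)
    ]
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """The n most probable words of each topic, for manual topic labeling."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]
```

Calling `lda_gibbs(reviews, num_topics=5)` on tokenized reviews returns per-review topic proportions and per-topic word distributions; `top_words` lists the highest-probability words of each topic, which a person would then inspect to give topics human-readable labels such as bedside manner or wait time. A real pipeline would more likely remove stop words first and use a library implementation such as gensim's `LdaModel`.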
Word2vec is a two-layer neural network that maps the words of a text corpus to a set of feature vectors. The reviews were collected from Yelp, an online rating website, and cover doctors across San Francisco. As a result of this study, a snapshot is created for each doctor that contains the most dominant topics in their reviews and the overall sentiment toward each of those topics.
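The abstract does not describe how the word2vec features feed the sentiment step or how topic ratings are derived. One plausible scheme, sketched below purely as an assumption, averages the word vectors of a review, scores it by cosine similarity to the centroids of known positive and negative reviews, and then weights each review's score by its LDA topic proportions to produce a per-topic star rating for the snapshot. The embeddings are assumed to come from an already-trained word2vec model (any `dict` from word to vector works here); all function names and the 1 to 5 star mapping are hypothetical.

```python
import math


# Assumption: `embeddings` maps words to vectors from an already-trained
# word2vec model; the project's exact sentiment classifier is not specified.
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def review_vector(tokens, embeddings):
    """Average the embeddings of a review's in-vocabulary tokens (None if none)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return centroid(vecs) if vecs else None


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentiment_score(tokens, embeddings, pos_centroid, neg_centroid):
    """Nearest-centroid sentiment in [-1, 1]; positive means closer to pos_centroid."""
    vec = review_vector(tokens, embeddings)
    if vec is None:
        return 0.0  # no known words: treat the review as neutral
    return (cosine(vec, pos_centroid) - cosine(vec, neg_centroid)) / 2


def doctor_snapshot(doc_topic, sentiments, topic_labels, top_n=3):
    """Per-topic star ratings for one doctor.

    doc_topic[r][t] is the share of topic t in review r (e.g. from LDA) and
    sentiments[r] is that review's score in [-1, 1]. Returns the top_n topics
    by total weight as (label, stars) pairs. The linear mapping of [-1, 1]
    onto 1-5 stars is an illustrative assumption.
    """
    k = len(topic_labels)
    weight = [sum(row[t] for row in doc_topic) for t in range(k)]
    ranked = sorted(range(k), key=lambda t: weight[t], reverse=True)[:top_n]
    snapshot = []
    for t in ranked:
        if weight[t] == 0:
            continue  # topic never appears in this doctor's reviews
        # Topic-weighted average sentiment of the reviews.
        avg = sum(row[t] * s for row, s in zip(doc_topic, sentiments)) / weight[t]
        snapshot.append((topic_labels[t], round(3 + 2 * avg, 1)))
    return snapshot
```

For example, the positive and negative centroids could be built with `centroid` from the vectors of reviews labeled by their Yelp star ratings, and `doctor_snapshot` could then be applied to one doctor's topic-proportion rows and the matching sentiment scores. The nearest-centroid rule keeps the sketch dependency-free; a trained classifier, such as logistic regression on the averaged vectors, is a common alternative.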
LATENT DIRICHLET ALLOCATION, NATURAL LANGUAGE PROCESSING, WORD2VEC