Must have experience in PySpark programming.
Two questions need to be completed. Please include screenshots of the output, a description of the results, and the program design and code.
(1) Some comments in the Amazon review system are fake, generated by computers. Prof. Michael Luca of Harvard Business School argues that there is some evidence that fake reviews are sloppier in general: "Short, vague reviews are a pretty good marker, [along with] poor punctuation and grammar."
Here are some examples of probably fake comments (e.g., "GREAT") and their corresponding ratings (e.g., 5 stars) in our data set:
6^220^Five Stars^2016-01-09^false^ Quality product.^5.00
6^221^Five Stars^2016-01-09^false^ Great quality.^5.00
6^222^Five Stars^2015-11-25^false^ Excellent^5.00
6^223^Five Stars^2016-01-14^false^ GREAT^5.00
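The sample records appear to be caret-delimited with seven fields. As a minimal sketch, assuming the last field is ReviewRating and the second-to-last is ReviewContent (the sample suggests this layout, but the assignment does not state it), one record could be parsed as follows. The names `Review` and `parse_review` are hypothetical, and the sketch also assumes that comments never contain a `^` character.

```python
from typing import NamedTuple, Optional


class Review(NamedTuple):
    content: str
    rating: int


def parse_review(line: str) -> Optional[Review]:
    """Parse one caret-delimited record; return None if it is malformed."""
    fields = line.split("^")
    if len(fields) < 7:
        # Too few fields: not a complete review record.
        return None
    try:
        # Ratings look like "5.00" in the sample, so go through float first.
        rating = int(float(fields[-1]))
    except (ValueError, OverflowError):
        # Non-numeric rating, e.g. a header row.
        return None
    # Assumption: ReviewContent is the second-to-last field.
    return Review(content=fields[-2].strip(), rating=rating)


sample = "6^223^Five Stars^2016-01-14^false^ GREAT^5.00"
print(parse_review(sample))
```

Returning `None` for malformed lines lets a later Spark `filter` step drop them instead of failing the whole job.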
It looks like these fake reviews are more common among 5-star ratings than 1-star ratings. Let's examine the average length (number of words) of the comments for each rating and see whether this really holds.
Please design and implement a PySpark programme that computes the average length of the comments (column: ReviewContent) for each rating (column: ReviewRating). There are 5 rating levels, where a 1-star rating represents the worst experience and a 5-star rating represents the best. Hint: you can remove the punctuation in each comment with the following code:
import re
re.sub(r'\W+', ' ', mystring)
'\W+' is a regular expression that matches one or more consecutive non-word characters, that is, anything other than letters, digits, and the underscore.
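To make the hint concrete, here is a small sketch of how the substitution can be combined with `str.split` to count the words in a comment. The function name `count_words` is a hypothetical helper, not part of the assignment.

```python
import re


def count_words(text: str) -> int:
    """Count words after replacing every run of non-word characters with a space."""
    cleaned = re.sub(r"\W+", " ", text)
    # split() with no arguments ignores leading, trailing, and repeated spaces.
    return len(cleaned.split())


print(count_words(" Great quality."))  # 2
print(count_words("GREAT"))            # 1
```

Note that an apostrophe is also a non-word character, so a contraction such as "don't" is counted as two tokens under this approach.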
You should turn in a single Python file that prints out the average length of the comments for each star rating:
$ spark-submit 1-length.py
1 star rating: average length of comments __
2 star rating: average length of comments __
3 star rating: average length of comments __
4 star rating: average length of comments __
5 star rating: average length of comments __
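One possible shape for 1-length.py is sketched below using the RDD API. It is not the required solution. It assumes the input is a caret-delimited text file with ReviewContent as the second-to-last field and ReviewRating as the last field, and the helper names (`rating_and_length`, `add_pairs`, `main`) are hypothetical. PySpark is imported inside `main` so that the helpers can be tested without a Spark installation.

```python
import re


def rating_and_length(line):
    """Map one record to (rating, (word_count, 1)), or None if malformed."""
    fields = line.split("^")
    if len(fields) < 7:
        return None
    try:
        rating = int(float(fields[-1]))
    except (ValueError, OverflowError):
        return None
    # Assumption: ReviewContent is the second-to-last field.
    words = re.sub(r"\W+", " ", fields[-2]).split()
    return rating, (len(words), 1)


def add_pairs(a, b):
    """Add two (total_words, review_count) pairs element-wise."""
    return a[0] + b[0], a[1] + b[1]


def main(path):
    from pyspark import SparkContext  # local import: helpers above need no Spark

    sc = SparkContext(appName="AverageReviewLength")
    averages = (
        sc.textFile(path)
        .map(rating_and_length)
        .filter(lambda pair: pair is not None)       # drop malformed lines
        .reduceByKey(add_pairs)                      # sum words and reviews per rating
        .mapValues(lambda total: total[0] / total[1])
        .sortByKey()
        .collect()
    )
    for rating, average in averages:
        print(f"{rating} star rating: average length of comments {average:.2f}")
    sc.stop()
```

To run it with spark-submit, call `main` with the path to the data file at the bottom of the script. Carrying a (sum, count) pair through `reduceByKey` avoids collecting every comment length for a rating in one place.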
(2) Top words
Please design and implement a PySpark programme that finds the 10 most frequent words for each rating. Some words, such as "great" and "good", are common in 5-star comments, and others, such as "bad" and "worst", are common in 1-star comments.
Please remove stop words such as "the", "an", and "of" from each comment before computing the results.
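Stop-word removal can be sketched as a simple set lookup after tokenizing. The list below is a tiny illustrative sample, not a complete stop-word list, and `content_words` is a hypothetical helper name. Lowercasing before the lookup is an added assumption so that "The" and "the" are treated alike.

```python
import re

# Illustrative only: a real run would need a much fuller stop-word list.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "is", "it", "to", "in", "this", "that", "for"}
)


def content_words(text):
    """Lowercase, strip punctuation, split into words, and drop stop words."""
    tokens = re.sub(r"\W+", " ", text.lower()).split()
    return [token for token in tokens if token not in STOP_WORDS]


print(content_words("The quality of the product is great."))
# ['quality', 'product', 'great']
```

For a fuller list, PySpark ships a default English list through `pyspark.ml.feature.StopWordsRemover`, which could be substituted for the hand-written set here.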
Your Python code should print out the top 10 common words for each star rating:
$ spark-submit 2-wordranking.py
top 10 common words
1 star rating : __ __ __ ...
2 star rating : __ __ __ ...
3 star rating : __ __ __ ...
4 star rating : __ __ __ ...
5 star rating : __ __ __ ...
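One possible shape for 2-wordranking.py, again using the RDD API, is sketched below under the same assumptions about the record layout (ReviewContent second-to-last, ReviewRating last). The stop-word set is a small illustrative sample, the helper names are hypothetical, and ties between equally frequent words are broken alphabetically, which the assignment does not specify.

```python
import re
from operator import add

# Illustrative only: a real run would need a much fuller stop-word list.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "is", "it", "to", "in", "this", "that", "for"}
)


def rating_word_pairs(line):
    """Map one record to a list of ((rating, word), 1) pairs; [] if malformed."""
    fields = line.split("^")
    if len(fields) < 7:
        return []
    try:
        rating = int(float(fields[-1]))
    except (ValueError, OverflowError):
        return []
    tokens = re.sub(r"\W+", " ", fields[-2].lower()).split()
    return [((rating, token), 1) for token in tokens if token not in STOP_WORDS]


def top_words(word_counts, n=10):
    """Return the n most frequent words; ties are broken alphabetically."""
    ranked = sorted(word_counts, key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in ranked[:n]]


def main(path):
    from pyspark import SparkContext  # local import: helpers above need no Spark

    sc = SparkContext(appName="TopWordsPerRating")
    ranking = (
        sc.textFile(path)
        .flatMap(rating_word_pairs)
        .reduceByKey(add)                                  # count per (rating, word)
        .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))     # re-key by rating
        .groupByKey()
        .mapValues(top_words)
        .sortByKey()
        .collect()
    )
    print("top 10 common words")
    for rating, words in ranking:
        print(f"{rating} star rating : {' '.join(words)}")
    sc.stop()
```

Counting per (rating, word) before grouping keeps the shuffle small. The `groupByKey` step assumes the distinct words for a single rating fit in the memory of one executor, which seems reasonable for review text but is still an assumption.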