PySpark programming

Must have experience in PySpark programming.

Two (2) questions need to be completed. Please include screenshots of the output, a description of the results, and the program design and code.

(1) The Amazon review system contains fake, computer-generated comments. Prof. Michael Luca of Harvard Business School argues that there is some evidence that fake reviews are sloppier in general: “Short, vague reviews are a pretty good marker, [along with] poor punctuation and grammar.”

Here are some examples of probably fake comments (e.g., “GREAT”) and their corresponding ratings (e.g., 5 stars) in our data set:

   6^220^Five Stars^2016-01-09^false^ Quality product.^5.00
   6^221^Five Stars^2016-01-09^false^ Great quality.^5.00
   6^222^Five Stars^2015-11-25^false^ Excellent^5.00
   6^223^Five Stars^2016-01-14^false^ GREAT^5.00

It looks like these fake reviews are more common among 5 star ratings than among 1 star ratings. Let’s examine the average length (number of words) of the comments for each rating and see whether this really holds.

Please design and implement a PySpark programme to compute the average length of the comments (column: ReviewContent) for each rating (column: ReviewRating). There are 5 rating levels: a 1 star rating represents the worst experience and a 5 star rating represents the best experience. Hint: you can remove the punctuation in each comment with the following code:

  import re
  re.sub(r'\W+', ' ', mystring)

'\W+' is a regular expression that matches one or more consecutive non-word characters, that is, anything other than letters, digits, and the underscore.
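As a small illustration of the hint, the sketch below counts the words in a single comment after the punctuation has been replaced with spaces. The function name count_words is hypothetical and is not part of the assignment.

```python
import re


def count_words(comment):
    """Count the words in a comment after replacing punctuation with spaces."""
    # Every run of non-word characters becomes a single space.
    cleaned = re.sub(r"\W+", " ", comment)
    # split() with no argument ignores leading, trailing and repeated spaces.
    return len(cleaned.split())


print(count_words(" Quality product."))  # 2
print(count_words(" GREAT"))             # 1
```

Note that an apostrophe is also a non-word character, so a contraction such as "Don't" is counted as two words ("Don" and "t") under this rule.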

What is expected:

You should turn in one Python file which prints out the average length of the comments for each star rating:

 $ spark-submit 1-length.py
  1 star rating: average length of comments __
  2 star rating: average length of comments __
  3 star rating: average length of comments __
  4 star rating: average length of comments __
  5 star rating: average length of comments __
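One possible shape for such a programme is sketched below; it is not a reference solution. It assumes, based only on the sample lines above, that each record has 7 fields separated by '^', with the comment in field 5 and the rating in field 6 (counting from 0). The names parse_line and main are hypothetical, and because the assignment does not name the data file, the sketch takes its path as a command-line argument. Records that do not have exactly 7 fields (a header line, or a comment that itself contains '^') are skipped.

```python
import re
import sys

DELIMITER = "^"      # assumption: inferred from the sample records
FIELD_COUNT = 7      # assumption: the sample records have 7 fields
CONTENT_INDEX = 5    # assumption: position of ReviewContent
RATING_INDEX = 6     # assumption: position of ReviewRating


def parse_line(line):
    """Return (rating, word_count) for one record, or None if it is malformed."""
    fields = line.rstrip("\n").split(DELIMITER)
    if len(fields) != FIELD_COUNT:
        return None
    try:
        rating = int(float(fields[RATING_INDEX]))  # "5.00" -> 5
    except ValueError:
        return None  # e.g. a header line
    words = re.sub(r"\W+", " ", fields[CONTENT_INDEX]).split()
    return rating, len(words)


def main(path):
    # Imported here so parse_line can be tested without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext(appName="AverageCommentLength")
    pairs = sc.textFile(path).map(parse_line).filter(lambda p: p is not None)
    # Carry (sum of lengths, number of comments) per rating, then divide.
    totals = pairs.mapValues(lambda n: (n, 1)).reduceByKey(
        lambda a, b: (a[0] + b[0], a[1] + b[1])
    )
    averages = totals.mapValues(lambda t: t[0] / t[1]).collectAsMap()
    for rating in range(1, 6):
        if rating in averages:
            print("%d star rating: average length of comments %.2f"
                  % (rating, averages[rating]))
    sc.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit 1-length.py <reviews-file>", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Carrying a (sum, count) pair through reduceByKey computes each average in a single pass and avoids collecting all comment lengths for a rating in one place.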

(2) Top words

Please design and implement a PySpark programme to pick out the top 10 words for each rating. Some words such as “great” and “good” are common in the 5 star rating comments, and others such as “bad” and “worst” are common in the 1 star rating comments.

Please remove stop words such as “the”, “an”, “of”, etc. from each comment before obtaining the results.

Your Python code should print out the top 10 common words for each star rating:

  $ spark-submit 2-wordranking.py
  top 10 common words
  1 star rating : __ __ __ ...
  2 star rating : __ __ __ ...
  3 star rating : __ __ __ ...
  4 star rating : __ __ __ ...
  5 star rating : __ __ __ ...
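A minimal sketch of one way to approach this is given below; it is not a reference solution. It makes the same assumptions about the record layout as the sample lines suggest (7 fields separated by '^', comment in field 5, rating in field 6, counting from 0). The stop-word list is a short illustrative one chosen for this sketch, and the names STOP_WORDS, parse_line, top_words and main are hypothetical. A fuller list could be used instead, for example the default English list shipped with pyspark.ml.feature.StopWordsRemover. The data file path is taken from the command line because the assignment does not name it.

```python
import re
import sys

# Illustrative stop-word list only; a real run would likely need a longer one.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "and", "or", "to", "in", "on", "for",
    "is", "it", "this", "that", "i", "was", "with", "my",
})


def parse_line(line):
    """Return (rating, [words without stop words]) or None if malformed."""
    fields = line.rstrip("\n").split("^")
    if len(fields) != 7:  # assumption: 7 fields per record
        return None
    try:
        rating = int(float(fields[6]))  # assumption: rating is the last field
    except ValueError:
        return None
    tokens = re.sub(r"\W+", " ", fields[5].lower()).split()
    return rating, [w for w in tokens if w not in STOP_WORDS]


def top_words(word_counts, n=10):
    """Return the n most frequent words from (word, count) pairs."""
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(word_counts, key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in ranked[:n]]


def main(path):
    # Imported here so the helpers can be tested without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext(appName="TopWordsPerRating")
    parsed = sc.textFile(path).map(parse_line).filter(lambda p: p is not None)
    # Count each (rating, word) pair, then regroup the counts by rating.
    counts = parsed.flatMap(
        lambda p: [((p[0], w), 1) for w in p[1]]
    ).reduceByKey(lambda a, b: a + b)
    per_rating = (
        counts.map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))
        .groupByKey()
        .mapValues(top_words)
        .collectAsMap()
    )
    print("top 10 common words")
    for rating in range(1, 6):
        print("%d star rating : %s" % (rating, " ".join(per_rating.get(rating, []))))
    sc.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit 2-wordranking.py <reviews-file>", file=sys.stderr)
    else:
        main(sys.argv[1])
```

The word counts are reduced per (rating, word) pair before grouping, so groupByKey only has to handle one entry per distinct word in each rating rather than one entry per occurrence.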

Requirements: 2 Questions
