xavier-assignment

Natural Language Processing

Q1 Review the Python script in the Q1 folder – NLTK_Text_Analysis.py

Use the text below to apply the same process:

Text = """Backgammon is one of the oldest known board games. Its history can be traced back nearly 5,000 years to archeological discoveries in the Middle East. It is a two player game where each player has fifteen checkers which move between twenty-four points according to the roll of two dice."""

a. Text Analysis Operations using NLTK

b. Tokenization

c. Stopwords removal

d. Lexicon Normalization such as Stemming and Lemmatization

e. POS Tagging
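
As a rough illustration of steps b and c, here is a dependency-free sketch on the text above. It assumes a simple regex tokenizer and a tiny hand-picked stopword set; `split_sentences`, `tokenize` and `STOPWORDS` are hypothetical names, not part of NLTK. In NLTK itself, `nltk.tokenize.word_tokenize` and `nltk.corpus.stopwords.words('english')` play these roles. Stemming, lemmatization and POS tagging need NLTK and are not reproduced here.

```python
import re
from collections import Counter

text = ("Backgammon is one of the oldest known board games. "
        "Its history can be traced back nearly 5,000 years to archeological "
        "discoveries in the Middle East. It is a two player game where each "
        "player has fifteen checkers which move between twenty-four points "
        "according to the roll of two dice.")

# Illustrative stopword subset (assumption); NLTK ships a much longer list.
STOPWORDS = {"a", "is", "it", "its", "of", "the", "to", "in", "be", "can",
             "where", "each", "has", "which", "between"}


def split_sentences(s):
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    return [part for part in re.split(r"(?<=[.!?])\s+", s.strip()) if part]


def tokenize(s):
    # Lowercased word tokens; keeps "5,000" and "twenty-four" as single tokens.
    return re.findall(r"[a-z0-9]+(?:[-,][a-z0-9]+)*", s.lower())


sentences = split_sentences(text)
tokens = tokenize(text)
frequencies = Counter(tokens)                        # word frequency distribution
filtered = [t for t in tokens if t not in STOPWORDS]  # stopword removal

print(len(sentences), "sentences")
print(frequencies.most_common(3))
print(filtered)
```

Comparing `tokens` with `filtered` shows what stopword removal discards before normalization and tagging.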

Q2 Analyze the customer reviews in the file Restaurant_Reviews.tsv

a. Explain each step for the following text clean-up commands

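import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# dataset is assumed to be the DataFrame loaded from Restaurant_Reviews.tsv
review = dataset['Review'][0]
review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][0])
review = review.lower()
review = review.split()
ps = PorterStemmer()
review = [ps.stem(word) for word in review if word not in set(stopwords.words('english'))]
review = ' '.join(review)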
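
To see what each command does, it can help to print the intermediate values. The sketch below runs the same sequence on a hypothetical review string, using only the standard library. The `sample` text and the small `stop` set are assumptions standing in for `dataset['Review'][0]` and NLTK's English stopword list, and the stemming step is left out because it needs NLTK.

```python
import re

# Hypothetical review text, standing in for dataset['Review'][0].
sample = "Wow... Loved this place!"

# Tiny illustrative stopword set (assumption); the commands above use NLTK's list.
stop = {"this", "the", "a", "is"}

letters_only = re.sub('[^a-zA-Z]', ' ', sample)  # every non-letter becomes a space
lowered = letters_only.lower()                   # normalize case
words = lowered.split()                          # split on runs of whitespace
kept = [w for w in words if w not in stop]       # drop stopwords
cleaned = ' '.join(kept)                         # back to a single string

print(repr(letters_only))
print(words)
print(cleaned)
```

With the Porter stemmer applied as in the original commands, "loved" would additionally be reduced to "love".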

b. What is the classification question?

c. The example uses the Naïve Bayes classifier to classify the sentiments. Calculate the confusion matrix (TP = # True Positives, TN = # True Negatives, FP = # False Positives, FN = # False Negatives) and the accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

d. Apply the logistic regression classifier to the same problem and recalculate the quantities from part c, i.e. TP, TN, FP, FN, and Accuracy.
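
For parts c and d, the four counts and the accuracy can be computed the same way whichever classifier produced the predictions. Below is a minimal sketch with made-up labels; `y_true` and `y_pred` are illustrative values, not results from the restaurant data, and `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative labels only (1 = positive review, 0 = negative review).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
```

If scikit-learn is available, `sklearn.metrics.confusion_matrix` returns the same counts arranged as [[TN, FP], [FN, TP]] for labels 0 and 1.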

Q3 NLTK Corpus on Movie Reviews

Q3a Use the following reference to perform sentiment analysis on movie reviews (“Q3 Movie Reviews.py”):

https://www.nltk.org/book/ch06.html

Q3b – Explain how the Bag of Words model helps in sentiment analysis

http://blog.chapagain.com.np/python-nltk-sentiment…
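
As a small illustration of the Bag of Words idea, the sketch below turns a text into word-presence features over a fixed vocabulary, loosely in the style of the feature extractors in the NLTK book chapter referenced above. The two toy documents and the helper names `build_vocabulary` and `bag_of_words` are assumptions made for this example.

```python
from collections import Counter

# Toy labelled documents (illustrative only).
docs = [
    ("a great and moving film", "pos"),
    ("a dull and boring film", "neg"),
]


def build_vocabulary(documents, size):
    # Keep the `size` most frequent words across all documents.
    counts = Counter(word for text, _ in documents for word in text.split())
    return [word for word, _ in counts.most_common(size)]


def bag_of_words(text, vocabulary):
    # Only presence or absence of each vocabulary word is recorded;
    # word order and position are discarded.
    present = set(text.split())
    return {f"contains({word})": (word in present) for word in vocabulary}


vocabulary = build_vocabulary(docs, size=10)
features = bag_of_words("great film", vocabulary)
print(features)
```

Because order is discarded, "great film" and "film great" yield identical features; a classifier then learns which word features are associated with each sentiment label.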

Summarize the entire code in NLTKMovieReview.py file as a part of the solution

Q4 Twitter Analysis sentiment140

Perform a Twitter sentiment analysis:

  • Users on Twitter create short messages called tweets, which are shared with other Twitter users

– who interact by retweeting and responding.

– Twitter restricts messages to 280 characters or fewer,

– which forces users to stay focused on the message they wish to disseminate.

– Twitter data is well suited to the Machine Learning (ML) task of sentiment analysis.

– Sentiment analysis falls under Natural Language Processing (NLP).

  • The training data is obtained from Sentiment140

– made up of about 1.6 million random tweets

– with corresponding binary labels: 0 for negative sentiment and 1 for positive sentiment.

  • Use a Naive Bayes classifier to learn the correct labels from this training set.

https://towardsdatascience.com/the-real-world-as-s…
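
To make the Naive Bayes step concrete, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing in plain Python. The four toy tweets are invented for illustration and are not from Sentiment140; `train_naive_bayes` and `predict` are hypothetical helpers. A real solution would more likely use a library classifier trained on the full data set.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (tokens, label). Returns (log_priors, word_counts, vocab)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())
    log_priors = {label: math.log(n / total) for label, n in label_counts.items()}
    return log_priors, word_counts, vocab


def predict(model, tokens):
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        n_words = sum(word_counts[label].values())
        score = log_prior
        for word in tokens:
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy tweets: 1 = positive sentiment, 0 = negative sentiment.
train = [
    ("i love this sunny day".split(), 1),
    ("what a great happy morning".split(), 1),
    ("i hate this rainy day".split(), 0),
    ("what a sad awful morning".split(), 0),
]
model = train_naive_bayes(train)
print(predict(model, "love happy day".split()))
print(predict(model, "sad rainy".split()))
```

Scores are summed in log space so that multiplying many small probabilities does not underflow on longer texts.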

Q5 Analyze Clothing Reviews

https://www.kaggle.com/nicapotato/womens-ecommerce…

This dataset comes from a women’s clothing e-commerce site and revolves around the reviews written by customers. It includes 23486 rows and 10 feature variables. Each row corresponds to a customer review and includes the variables:

  • Clothing ID: Integer categorical variable that refers to the specific piece being reviewed.
  • Age: Positive integer variable of the reviewer's age.
  • Title: String variable for the title of the review.
  • Review Text: String variable for the review body.
  • Rating: Integer variable for the product score granted by the customer, from 1 (worst) to 5 (best).
  • Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
  • Positive Feedback Count: Positive integer documenting the number of other customers who found this review positive.
  • Division Name: Categorical name of the product's high-level division.
  • Department Name: Categorical name of the product department.
  • Class Name: Categorical name of the product class.

Perform the following:

a. Text extraction & creating a corpus

b. Text Pre-processing

c. Create the DTM & TDM from the corpus

d. Exploratory text analysis

e. Feature extraction by removing sparsity

f. Build the classification models and compare Logistic Regression to Random Forest

https://medium.com/analytics-vidhya/customer-revie…
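
Steps c and e can be sketched without any libraries. The example below builds a document-term matrix (DTM), transposes it into a term-document matrix (TDM), and drops terms that occur in too few documents. The three-review `corpus`, the `min_docs` threshold and the helper names are assumptions for illustration only; they are not taken from the Kaggle data.

```python
from collections import Counter

# Invented mini-corpus standing in for the Review Text column.
corpus = [
    "love this dress",
    "dress runs small",
    "love the fabric love the fit",
]


def document_term_matrix(documents):
    tokenized = [doc.lower().split() for doc in documents]
    terms = sorted({word for doc in tokenized for word in doc})
    counts = [Counter(doc) for doc in tokenized]
    # One row per document, one column per term.
    return terms, [[c[term] for term in terms] for c in counts]


def transpose(matrix):
    # The TDM is the DTM with rows and columns swapped.
    return [list(column) for column in zip(*matrix)]


def remove_sparse_terms(terms, dtm, min_docs):
    # Keep only terms that appear in at least `min_docs` documents.
    keep = [j for j in range(len(terms))
            if sum(1 for row in dtm if row[j] > 0) >= min_docs]
    return [terms[j] for j in keep], [[row[j] for j in keep] for row in dtm]


terms, dtm = document_term_matrix(corpus)
tdm = transpose(dtm)
kept_terms, reduced = remove_sparse_terms(terms, dtm, min_docs=2)

print(terms)
print(dtm)
print(kept_terms, reduced)
```

The reduced matrix is what would be passed, together with the Recommended IND labels, to the classifiers in step f.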

HW11.docx

Q2 Restaurant Reviews.zip

Q1 NLP Basics.zip