Sentiment Analysis for Movie Reviews

This program performs sentiment analysis on the movie review dataset provided by IMDB. It covers data analytics, feature selection and machine learning classifiers. The project is built with Java, OpenNLP and the Weka machine learning package.

Feature Selection

Feature selection is the process of selecting a subset of relevant terms to use when building ML models. The feature selection methods implemented in this project are:

Part of speech tagging
Part of speech tagging with stop word removal
Document Frequency Cutoff
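
As a rough illustration of the first two methods, the sketch below keeps only adjectives and adverbs and optionally drops stop words. It assumes the tokens have already been tagged with Penn Treebank tags (for example by a PoS tagger); the class name `PosFeatureFilter` and its methods are hypothetical and are not taken from the project sources.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the project: filters already-tagged tokens.
final class PosFeatureFilter {

    // Penn Treebank convention (assumed): JJ, JJR, JJS are adjectives; RB, RBR, RBS are adverbs.
    static boolean isAdjectiveOrAdverb(String tag) {
        return tag.startsWith("JJ") || tag.startsWith("RB");
    }

    // Keeps lower-cased adjectives and adverbs that are not in the stop word set.
    // Pass an empty set to get PoS tagging without stop word removal.
    static List<String> selectFeatures(String[] tokens, String[] tags, Set<String> stopWords) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String word = tokens[i].toLowerCase(Locale.ROOT);
            if (isAdjectiveOrAdverb(tags[i]) && !stopWords.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "plot", "is", "very", "dull"};
        String[] tags = {"DT", "NN", "VBZ", "RB", "JJ"};
        // PoS tagging only
        System.out.println(selectFeatures(tokens, tags, Collections.emptySet()));
        // PoS tagging with stop word removal ("very" treated as a stop word here)
        System.out.println(selectFeatures(tokens, tags, Collections.singleton("very")));
    }
}
```

The stop word set is passed in as a parameter so the same routine can serve both variants; the example stop word is made up for the demonstration.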

The OpenNLP library is used to implement PoS tagging (tagging adverbs and adjectives).
I implemented Document Frequency Cutoff myself using a HashMap; see FeatureSelection.java.
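
A document frequency cutoff built on a HashMap can be sketched roughly as follows. This is an illustration only and not the code in FeatureSelection.java: the class name, the method names and the `minDocs` threshold are assumptions, and the project may apply the cutoff differently (for example by keeping a fixed number of top terms).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a document frequency cutoff; names are illustrative.
final class DocumentFrequencySketch {

    // For each term, counts the number of documents that contain it at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> documents) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : documents) {
            // A HashSet removes repeats so a term counts once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Keeps terms that appear in at least minDocs documents (assumed threshold).
    static Set<String> applyCutoff(Map<String, Integer> df, int minDocs) {
        Set<String> kept = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> entry : df.entrySet()) {
            if (entry.getValue() >= minDocs) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("great", "acting", "great"),
                Arrays.asList("dull", "acting"),
                Arrays.asList("great", "plot"));
        Map<String, Integer> df = documentFrequencies(docs);
        System.out.println(applyCutoff(df, 2)); // prints [acting, great]
    }
}
```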

Creating ML Models

Once the features have been selected, two text-based machine learning models are built, mainly Naive Bayes Multinomial and Stochastic Gradient Descent.
For each feature selection method, an ML model is created and its F-Measure is output. The F-Measure is the harmonic mean of precision and recall and is used here as the measure of how well the model performs.
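
For reference, an F-measure can be computed from confusion-matrix counts as in the minimal sketch below. It assumes the balanced F1 variant and uses made-up counts; the class and method names are hypothetical, and the project itself obtains its scores through Weka rather than through code like this.

```java
import java.util.Locale;

// Hypothetical sketch: F1 and accuracy from confusion-matrix counts.
final class FMeasureSketch {

    // F1 = harmonic mean of precision and recall (balanced variant assumed).
    static double fMeasure(int truePositives, int falsePositives, int falseNegatives) {
        if (truePositives == 0) {
            return 0.0; // avoids division by zero when nothing was classified correctly
        }
        double precision = truePositives / (double) (truePositives + falsePositives);
        double recall = truePositives / (double) (truePositives + falseNegatives);
        return 2 * precision * recall / (precision + recall);
    }

    // Plain accuracy, shown only to contrast it with the F-measure.
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (tp + tn) / (double) (tp + tn + fp + fn);
    }

    public static void main(String[] args) {
        // Made-up counts: 90 true positives, 70 true negatives, 30 false positives, 10 false negatives.
        System.out.println(String.format(Locale.ROOT, "F1 = %.3f", fMeasure(90, 30, 10)));             // F1 = 0.818
        System.out.println(String.format(Locale.ROOT, "accuracy = %.3f", accuracy(90, 70, 30, 10)));  // accuracy = 0.800
    }
}
```

With these counts the two numbers differ, which shows that the F-measure and accuracy are related but not interchangeable.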

Results

The most accurate combination of feature selection and ML model is Naive Bayes Multinomial with PoS tagging feature selection, with an average F-Measure of 0.80.
The worst performing combination is the Simple Logistic Classifier with Document Frequency Cutoff, with an average F-Measure of 0.75.

NBM = Naive Bayes Multinomial
SGD = Stochastic Gradient Descent
Simple Log = Simple Logistic Classifier

For a more detailed report have a look at `Report.pdf`

To build the whole project run:

make

Required files. Please make sure these are in the root directory:

weka.jar
opennlp-tools-1.9.1.jar
filtered_sentoken/
OpenNLP_models
txt_sentoken/

To run the Bayes classifier on a feature size of 500 with the first feature selection method: make run-all-bayes-500
To run the Bayes classifier on a feature size of 500 with the second feature selection method: make run-all-bayes-stopword-500
To run the stochastic gradient descent classifier on a feature size of 500 with the first feature selection method: make run-all-sgd-500
To run the stochastic gradient descent classifier on a feature size of 500 with the second feature selection method: make run-all-sgd-stopword-500
To run the simple logistic classifier on a feature size of 500 with the first feature selection method: make run-all-simplelog-500
To run the simple logistic classifier on a feature size of 500 with the second feature selection method: make run-all-simplelog-stopword-500


kimurav/Sentiment-Analyzer
