Sentiment Analysis with a Recurrent Neural Network (RNN)

Recurrent Neural Networks (RNNs) are widely used for sentiment analysis because they can capture contextual and sequential information from text data.

Designed for sequence-based tasks
Learns patterns from sequential text data
Captures contextual information using hidden states
Commonly used in NLP and sentiment analysis tasks
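To make the role of the hidden state concrete, here is a minimal NumPy sketch of the recurrence that a tanh RNN layer applies at each timestep. The function name `simple_rnn_forward`, the weight names and the random toy inputs are assumptions made for illustration. This is not how Keras implements `SimpleRNN` internally.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a tanh RNN over one sequence and return every hidden state.

    inputs: array of shape (timesteps, input_dim)
    W_x:    input weights of shape (input_dim, units)
    W_h:    recurrent weights of shape (units, units)
    b:      bias of shape (units,)
    """
    units = W_h.shape[0]
    h = np.zeros(units)  # initial hidden state
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state,
        # which is how earlier words can influence later steps.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

# Toy data: 5 timesteps, 3 input features, 4 hidden units (illustrative values)
rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 3, 4
inputs = rng.normal(size=(timesteps, input_dim))
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

states = simple_rnn_forward(inputs, W_x, W_h, b)
print(states.shape)  # (5, 4)
```

The last row of `states` is the summary of the whole sequence, which is the kind of vector a classifier head would consume.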

Implementation

1. Importing Libraries

We start by importing NumPy, pandas, Python's regular expression module (re), scikit-learn and TensorFlow.

Python

import pandas as pd
import numpy as np
import re  
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding

2. Loading Dataset

We will use the Swiggy dataset of customer reviews.

You can download the dataset from here.

pd.read_csv() : Reads the CSV file into a Pandas DataFrame
data.columns : Accesses the column names of the DataFrame
tolist() : Converts the column names from an Index object to a regular Python list

Python

data = pd.read_csv('swiggy.csv')
print("Columns in the dataset:")
print(data.columns.tolist())

Output:

Columns in the dataset:
['ID', 'Area', 'City', 'Restaurant Price', 'Avg Rating', 'Total Rating', 'Food Item', 'Food Type', 'Delivery Time', 'Review']
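Before moving on, it can help to confirm that the columns used later ('Review' and 'Avg Rating') are actually present. The sketch below does this on a hypothetical two-row stand-in for swiggy.csv, built in memory so that it runs without the real file. The sample rows are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for swiggy.csv (invented rows, not real data)
csv_text = io.StringIO(
    "ID,Avg Rating,Review\n"
    "1,4.2,Great food!\n"
    "2,2.9,Cold and late.\n"
)
sample = pd.read_csv(csv_text)

# Fail early if a column needed by the later steps is absent
required = ["Review", "Avg Rating"]
missing = [col for col in required if col not in sample.columns]
if missing:
    raise KeyError(f"Missing expected columns: {missing}")

# Count missing values per required column
print(sample[required].isna().sum())
```

With the real file, replacing `csv_text` with `'swiggy.csv'` should apply the same check to the full dataset.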

3. Text Cleaning and Sentiment Labeling

The review text is cleaned and sentiment labels are generated from ratings before training the model.

Converts review text to lowercase
Removes special characters and punctuation
Creates sentiment labels from ratings: 1 (positive) if Avg Rating is above 3.5, otherwise 0 (negative)
Removes rows with missing values

Python

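# Lowercase the reviews, then keep only letters, digits and whitespace
data["Review"] = data["Review"].str.lower()
data["Review"] = data["Review"].str.replace(r'[^a-z0-9\s]', '', regex=True)

# Label a row positive (1) when its average rating is above 3.5, else negative (0)
data['sentiment'] = data['Avg Rating'].apply(lambda x: 1 if x > 3.5 else 0)

# Drop rows that have a missing value in any column
data = data.dropna()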
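The effect of these steps is easier to see on a tiny example. The sketch below applies the same cleaning and labeling rules to a hypothetical three-row DataFrame (`toy` and its contents are invented). It drops missing rows first, which is a small reordering assumed here so that the label is never computed for a row that is later discarded.

```python
import pandas as pd

# Invented rows for illustration; the third review is missing
toy = pd.DataFrame({
    "Review": ["Great FOOD!!!", "Cold & late :(", None],
    "Avg Rating": [4.4, 2.9, 3.8],
})

# Remove rows where the text or the rating is missing
toy = toy.dropna(subset=["Review", "Avg Rating"]).copy()

# Lowercase, then strip everything except letters, digits and whitespace
toy["Review"] = (
    toy["Review"]
    .str.lower()
    .str.replace(r"[^a-z0-9\s]", "", regex=True)
)

# 1 = positive (rating above 3.5), 0 = negative
toy["sentiment"] = (toy["Avg Rating"] > 3.5).astype(int)

print(toy)
```

Note that removing punctuation can leave extra spaces behind (for example where "&" used to be). Splitting on whitespace in the next step is not affected by them.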

4. Tokenization and Padding

The text data is converted into numerical sequences and padded to ensure all inputs have the same length for model training.

max_features = 5000 caps the vocabulary at the most frequent words
max_length = 200 defines the fixed sequence length, in tokens
Tokenizer() converts words into integer sequences
fit_on_texts() creates the word index
texts_to_sequences() converts reviews into sequences
pad_sequences() pads or truncates sequences to equal length
y = data['sentiment'].values extracts sentiment labels

Python

max_features = 5000
max_length = 200

tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(data["Review"])
X = pad_sequences(tokenizer.texts_to_sequences(
    data["Review"]), maxlen=max_length)
y = data['sentiment'].values
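The following pure-Python sketch approximates what the tokenizer and padding step do, assuming the Keras defaults (frequency-ranked indices starting at 1, and zeros added at the front). The helper names `build_word_index`, `to_sequence` and `pad_pre` are hypothetical and exist only for this illustration. They are not part of Keras.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = counts.most_common()  # ties keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def to_sequence(text, word_index, num_words):
    """Map words to indices, dropping unknown words and ranks >= num_words."""
    return [
        word_index[word]
        for word in text.split()
        if word in word_index and word_index[word] < num_words
    ]

def pad_pre(seq, maxlen):
    """Left-pad with zeros, or keep only the last maxlen items."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

texts = ["the food was great", "the delivery was late", "great food"]
word_index = build_word_index(texts)

# "pizza" was never seen during fitting, so it is silently dropped
seq = to_sequence("the pizza was great", word_index, num_words=5)
padded = pad_pre(seq, maxlen=6)
print(seq)     # [1, 3, 4]
print(padded)  # [0, 0, 0, 1, 3, 4]
```

Two consequences are worth keeping in mind: words missing from the fitted vocabulary contribute nothing to the input, and index 0 is reserved for padding.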

5. Splitting the Data

The dataset is divided into training, validation and test sets while preserving the sentiment class distribution.

train_test_split(..., test_size=0.2, stratify=y) splits the data into 80% training and 20% testing
train_test_split(..., test_size=0.1, stratify=y_train) holds out 10% of the training data for validation, leaving 72% of the full dataset for training
stratify keeps the class proportions of the labels the same in every split; it does not balance the classes

Python

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42, stratify=y_train
)
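As a quick check of the resulting proportions, the sketch below repeats the two-stage split on synthetic data. The arrays `X_demo` and `y_demo` (1000 samples, 70% positive) are invented for illustration and stand in for the real features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 samples, 700 positive and 300 negative
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.array([1] * 700 + [0] * 300)

# Stage 1: hold out 20% for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
# Stage 2: hold out 10% of the remaining 80% for validation
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tr, y_tr, test_size=0.1, random_state=42, stratify=y_tr
)

for name, labels in [("train", y_tr), ("val", y_va), ("test", y_te)]:
    print(f"{name}: {len(labels)} samples, positive share {labels.mean():.2f}")
```

The split sizes come out as 720, 80 and 200 samples, and each subset keeps roughly the original 70% positive share.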

6. Building RNN Model

A simple RNN model is built and compiled for binary sentiment classification.

Sequential([...]) creates the neural network model
Embedding(...) converts words into dense vector representations
SimpleRNN(64, activation='tanh') adds an RNN layer with 64 units
Dense(1, activation='sigmoid') creates the binary output layer
model.compile(...) configures the model with loss function, optimizer, and accuracy metric

Python

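model = Sequential([
    # Map each word index to a 16-dimensional vector.
    # input_length is omitted because recent Keras versions deprecate it.
    Embedding(input_dim=max_features, output_dim=16),
    # Return only the final hidden state of the sequence
    SimpleRNN(64, activation='tanh', return_sequences=False),
    # Single sigmoid unit: probability that the review is positive
    Dense(1, activation='sigmoid')
])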

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
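To get a feel for the size of this model, the trainable parameters of each layer can be counted by hand. The sketch below uses the standard formulas for embedding, simple recurrent and dense layers. The helper function names are hypothetical, and the numbers plugged in are the ones from the model above.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # Input weights + recurrent weights + bias
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # Weights + bias
    return input_dim * units + units

layers = {
    "embedding": embedding_params(5000, 16),
    "simple_rnn": simple_rnn_params(16, 64),
    "dense": dense_params(64, 1),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count}")
print(f"total: {total}")  # total: 85249
```

Most of the parameters sit in the embedding table, so the vocabulary size and embedding dimension dominate the model size here.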

7. Training and Evaluating Model

The model is trained on the training data, validated during training and finally evaluated on the test dataset.

model.fit(...) trains the model for 5 epochs with batch size 32
Uses validation data to monitor performance during training
model.evaluate(...) tests the model on unseen data
print(...) displays the final test accuracy

Python

history = model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=1
)

score = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {score[1]:.2f}")
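When reading the reported accuracy, it helps to compare it against what a trivial model would score. The sketch below shows the idea behind a thresholded accuracy and a majority-class baseline on invented labels and probabilities. The helper names and the demo arrays are assumptions for illustration, and the code is not Keras's own metric implementation.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return float((y_pred == np.asarray(y_true)).mean())

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common class."""
    positive_share = float(np.mean(y_true))
    return max(positive_share, 1.0 - positive_share)

# Invented labels and predicted probabilities
y_true_demo = np.array([1, 1, 1, 0, 1, 0, 1, 1])
y_prob_demo = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.95, 0.55])

print(f"accuracy: {binary_accuracy(y_true_demo, y_prob_demo):.2f}")
print(f"majority baseline: {majority_baseline(y_true_demo):.2f}")
```

In this toy case both values are 0.75, so the predictions are no better than always answering "positive". On a dataset where most ratings fall on one side of the 3.5 threshold, the same comparison is worth making with `y_test`.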

Output:

[Figure: Training and Evaluating Model]

8. Predicting Sentiment

A function is created to preprocess a review, predict its sentiment, and display the prediction result.

review_text.lower() converts text to lowercase
re.sub(...) removes special characters and punctuation
tokenizer.texts_to_sequences() converts text into word sequences
pad_sequences() pads the sequence to fixed length
model.predict() predicts sentiment probability
Returns Positive if probability ≥ 0.5, otherwise Negative

Python

def predict_sentiment(review_text):
    text = review_text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)

    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=max_length)

    prediction = model.predict(padded)[0][0]
    return f"{'Positive' if prediction >= 0.5 else 'Negative'} (Probability: {prediction:.2f})"


sample_review = "The food was great."
print(f"Review: {sample_review}")
print(f"Sentiment: {predict_sentiment(sample_review)}")

You can download the source code from here.
