Recurrent Neural Networks (RNNs) are widely used for sentiment analysis because they can capture contextual and sequential information from text data.
- Designed for sequence-based tasks
- Learns patterns from sequential text data
- Captures contextual information using hidden states
- Commonly used in NLP and sentiment analysis tasks
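To make the idea of a hidden state concrete, here is a minimal pure-Python sketch of the recurrence behind a single-unit RNN. The weights `w_x`, `w_h` and `b` are arbitrary illustrative values, not trained parameters, and the helper names are hypothetical.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrence step: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the step function, carrying the hidden state forward."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# The same values in a different order give a different final state,
# which is how the hidden state encodes sequence order.
forward = run_rnn([1.0, 0.0, -1.0])
backward = run_rnn([-1.0, 0.0, 1.0])
print(forward[-1], backward[-1])
```

Because each state depends on the previous one, the final hidden state summarizes the whole sequence, and that summary is what the classifier layer reads.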
Implementation
1. Importing Libraries and Dataset
We import NumPy, pandas, Python's re (regular expression) module, scikit-learn and TensorFlow.
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding
2. Loading Dataset
We will use the Swiggy dataset of customer reviews.
You can download the dataset from here.
- pd.read_csv() : Reads the CSV file into a Pandas DataFrame
- data.columns : Accesses the column names of the DataFrame
- tolist() : Converts the column names from an Index object to a regular Python list
data = pd.read_csv('swiggy.csv')
print("Columns in the dataset:")
print(data.columns.tolist())
Output:
Columns in the dataset:
['ID', 'Area', 'City', 'Restaurant Price', 'Avg Rating', 'Total Rating', 'Food Item', 'Food Type', 'Delivery Time', 'Review']
3. Text Cleaning and Sentiment Labeling
The review text is cleaned and sentiment labels are generated from ratings before training the model.
- Converts review text to lowercase
- Removes special characters and punctuation
- Creates sentiment labels from ratings: 1 (positive) if Avg Rating is above 3.5, otherwise 0 (negative)
- Removes rows with missing values
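# Normalize case so "Good" and "good" map to the same token
data["Review"] = data["Review"].str.lower()
# Keep only lowercase letters, digits and whitespace
data["Review"] = data["Review"].str.replace(r'[^a-z0-9\s]', '', regex=True)
# Label reviews: 1 = positive (rating above 3.5), 0 = negative
data['sentiment'] = data['Avg Rating'].apply(lambda x: 1 if x > 3.5 else 0)
# Drop rows that still contain missing values
data = data.dropna()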
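The cleaning and labeling rules above can be tried on single values without loading the dataset. This is a small sketch using the same regular expression and threshold; `clean_review` and `label_from_rating` are hypothetical helper names introduced only for illustration.

```python
import re

def clean_review(text):
    """Lowercase the text, then drop everything except a-z, 0-9 and whitespace."""
    text = text.lower()
    return re.sub(r'[^a-z0-9\s]', '', text)

def label_from_rating(rating, threshold=3.5):
    """Return 1 (positive) for ratings strictly above the threshold, else 0."""
    return 1 if rating > threshold else 0

print(clean_review("Great Food!!! 10/10"))            # great food 1010
print(label_from_rating(4.2), label_from_rating(3.5))  # 1 0
```

Note that a rating of exactly 3.5 is labeled negative, and that the pattern also removes accented letters, since only ASCII a-z is kept.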
4. Tokenization and Padding
The text data is converted into numerical sequences and padded to ensure all inputs have the same length for model training.
- max_features = 5000 caps the vocabulary at the most frequent words
- max_length = 200 fixes the length of every input sequence at 200 tokens
- Tokenizer() converts words into integer sequences
- fit_on_texts() creates the word index
- texts_to_sequences() converts reviews into sequences
- pad_sequences() pads or truncates sequences to equal length
- y = data['sentiment'].values extracts sentiment labels
max_features = 5000
max_length = 200
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(data["Review"])
X = pad_sequences(
    tokenizer.texts_to_sequences(data["Review"]),
    maxlen=max_length
)
y = data['sentiment'].values
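To show what the tokenizer and padding steps produce, here is a simplified pure-Python imitation. It is not the Keras implementation: it only splits on whitespace, and it assumes the Keras defaults of frequency-ranked indices starting at 1 (0 is reserved for padding) with padding and truncation applied at the start of the sequence. All function names are hypothetical.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency: index 1 is the most frequent word."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() sorts by count; ties keep first-seen order
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequence(text, word_index, num_words):
    """Map words to indices, skipping unknown words and ranks >= num_words."""
    return [word_index[w] for w in text.split()
            if w in word_index and word_index[w] < num_words]

def pad(seq, maxlen):
    """Keep the last maxlen tokens, then left-pad with zeros."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

reviews = ["good food", "good service", "bad food and slow delivery"]
word_index = build_word_index(reviews)
padded = [pad(to_sequence(r, word_index, num_words=5000), maxlen=4)
          for r in reviews]
print(padded)  # [[0, 0, 1, 2], [0, 0, 1, 3], [2, 5, 6, 7]]
```

The third review is longer than `maxlen`, so its first word is cut off. With `max_length = 200` this only affects unusually long reviews.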
5. Splitting the Data
The dataset is divided into training, validation and test sets while preserving the sentiment class distribution.
- train_test_split(..., test_size=0.2, stratify=y) splits the data into 80% training and 20% testing
- train_test_split(..., test_size=0.1, stratify=y_train) holds out 10% of the training data as a validation set, giving roughly a 72% / 8% / 20% split overall
- stratify keeps the proportion of positive and negative labels the same in every set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42, stratify=y_train
)
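The two-stage split can be checked with simple arithmetic. The sketch below assumes a hypothetical dataset of 1,000 rows and assumes the held-out share is rounded up, which is how scikit-learn is understood to treat a fractional test_size. `split_sizes` is an illustrative helper, not a scikit-learn function.

```python
def split_sizes(n_samples, test_pct=20, val_pct=10):
    """Return (train, validation, test) row counts for the two-stage split."""
    # Ceiling division using integers, to avoid float rounding surprises
    n_test = -(-n_samples * test_pct // 100)
    n_train_full = n_samples - n_test
    # The validation share is taken from the remaining training rows
    n_val = -(-n_train_full * val_pct // 100)
    n_train = n_train_full - n_val
    return n_train, n_val, n_test

print(split_sizes(1000))  # (720, 80, 200)
```

The validation set is 10% of the training portion, not of the full dataset, which is why it ends up at 8% overall.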
6. Building RNN Model
A simple RNN model is built and compiled for binary sentiment classification.
- Sequential([...]) creates the neural network model
- Embedding(...) converts words into dense vector representations
- SimpleRNN(64, activation='tanh') adds an RNN layer with 64 units
- Dense(1, activation='sigmoid') creates the binary output layer
- model.compile(...) configures the model with loss function, optimizer, and accuracy metric
model = Sequential([
    # input_length is omitted: recent Keras releases deprecate it
    # and infer the sequence length from the padded input
    Embedding(input_dim=max_features, output_dim=16),
    SimpleRNN(64, activation='tanh', return_sequences=False),
    Dense(1, activation='sigmoid')
])
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
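As a sanity check on the architecture, the number of trainable parameters can be worked out by hand from the layer sizes used above (vocabulary 5000, embedding size 16, 64 recurrent units, 1 output unit). The helper functions are hypothetical and only encode the standard weight-count formulas for each layer type.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary index."""
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + biases."""
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    """Weights + biases."""
    return input_dim * units + units

emb = embedding_params(5000, 16)   # 80000
rnn = simple_rnn_params(16, 64)    # 5184
out = dense_params(64, 1)          # 65
total = emb + rnn + out
print(total)                       # 85249
```

Most of the parameters sit in the embedding layer, so the vocabulary size and embedding dimension dominate the model size here.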
7. Training and Evaluating Model
The model is trained on the training data, validated during training and finally evaluated on the test dataset.
- model.fit(...) trains the model for 5 epochs with batch size 32
- Uses validation data to monitor performance during training
- model.evaluate(...) tests the model on unseen data
- print(...) displays the final test accuracy
history = model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=1
)
score = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {score[1]:.2f}")
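The two numbers returned by model.evaluate(...) are the loss and the accuracy. The following sketch shows how each could be computed from predicted probabilities in plain Python. The labels and probabilities are made-up illustrative values, not results from this model, and the 0.5 cutoff is an assumption about the default accuracy threshold.

```python
import math

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p > threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 1]            # hypothetical labels
y_prob = [0.9, 0.2, 0.4, 0.7]    # hypothetical model outputs
print(binary_accuracy(y_true, y_prob))  # 0.75
print(binary_crossentropy(y_true, y_prob))
```

Accuracy only counts whether each prediction lands on the correct side of the threshold, while the loss also penalizes how far the probability is from the true label.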

8. Predicting Sentiment
A function is created to preprocess a review, predict its sentiment, and display the prediction result.
- review_text.lower() converts text to lowercase
- re.sub(...) removes special characters and punctuation
- tokenizer.texts_to_sequences() converts text into word sequences
- pad_sequences() pads the sequence to fixed length
- model.predict() predicts sentiment probability
- Returns Positive if probability ≥ 0.5, otherwise Negative
def predict_sentiment(review_text):
    # Apply the same cleaning used on the training data
    text = review_text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)
    # Convert to a padded integer sequence
    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=max_length)
    # The sigmoid output is the probability of the positive class
    prediction = model.predict(padded)[0][0]
    return f"{'Positive' if prediction >= 0.5 else 'Negative'} (Probability: {prediction:.2f})"

sample_review = "The food was great."
print(f"Review: {sample_review}")
print(f"Sentiment: {predict_sentiment(sample_review)}")
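The decision rule at the end of predict_sentiment can be isolated and tested without a trained model. This sketch uses a hypothetical helper, `format_prediction`, and feeds it hand-picked probabilities rather than real model outputs.

```python
def format_prediction(probability, threshold=0.5):
    """Turn a positive-class probability into the label string used above."""
    label = 'Positive' if probability >= threshold else 'Negative'
    return f"{label} (Probability: {probability:.2f})"

print(format_prediction(0.5))    # Positive (Probability: 0.50)
print(format_prediction(0.123))  # Negative (Probability: 0.12)
```

A probability of exactly 0.5 counts as positive under this rule. Raising the threshold would make the function report positive sentiment only for more confident predictions.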
