Sentiment Analysis with a Recurrent Neural Network (RNN)

Last Updated : 15 May, 2026

Recurrent Neural Networks (RNNs) are widely used for sentiment analysis because they can capture contextual and sequential information from text data.

  • Designed for sequence-based tasks
  • Learns patterns from sequential text data
  • Captures contextual information using hidden states
  • Commonly used in NLP and sentiment analysis tasks
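
To make the role of the hidden state concrete, here is a minimal NumPy sketch of the recurrence a simple tanh RNN applies at each time step. The weight names (W_x, W_h, b), the toy sizes and the random inputs are assumptions chosen for illustration; this is not the Keras implementation.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a tanh RNN over one sequence and return the final hidden state.

    inputs: array of shape (timesteps, input_dim)
    W_x:    array of shape (input_dim, units)
    W_h:    array of shape (units, units)
    b:      array of shape (units,)
    """
    h = np.zeros(W_h.shape[0])              # initial hidden state is all zeros
    for x_t in inputs:                      # process the tokens in order
        # the new state mixes the current input with the previous state
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 3, 4       # toy sizes for illustration
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

sequence = rng.normal(size=(timesteps, input_dim))
h_forward = simple_rnn_forward(sequence, W_x, W_h, b)
h_reversed = simple_rnn_forward(sequence[::-1], W_x, W_h, b)

print(h_forward.shape)                      # (4,)
# Compare the final states for the original and the reversed word order
print(np.allclose(h_forward, h_reversed))
```

Because each state depends on the previous one, feeding the same vectors in a different order generally produces a different final state, which is what lets the network react to word order.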

Implementation

1. Importing Libraries and Dataset

We import NumPy, pandas, Python's re module for regular expressions (RegEx), scikit-learn and TensorFlow.

Python
import pandas as pd
import numpy as np
import re  
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding

2. Loading Dataset

We will use the Swiggy dataset of customer reviews.

You can download the dataset from here.

  • pd.read_csv() : Reads the CSV file into a Pandas DataFrame
  • data.columns : Accesses the column names of the DataFrame
  • tolist() : Converts the column names from an Index object to a regular Python list
Python
data = pd.read_csv('swiggy.csv')
print("Columns in the dataset:")
print(data.columns.tolist())

Output:

Columns in the dataset:
['ID', 'Area', 'City', 'Restaurant Price', 'Avg Rating', 'Total Rating', 'Food Item', 'Food Type', 'Delivery Time', 'Review']

3. Text Cleaning and Sentiment Labeling

The review text is cleaned and sentiment labels are generated from ratings before training the model.

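  • Converts review text to lowercase
  • Removes special characters and punctuation, keeping only letters, digits and whitespace
  • Creates sentiment labels from ratings: an Avg Rating above 3.5 becomes 1 (positive), anything else becomes 0 (negative)
  • Removes rows with missing values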
Python
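# Lowercase the reviews, then strip everything except letters, digits and whitespace
data["Review"] = data["Review"].str.lower()
data["Review"] = data["Review"].replace(r'[^a-z0-9\s]', '', regex=True)

# Label each row: 1 (positive) if the average rating is above 3.5, else 0 (negative)
data['sentiment'] = data['Avg Rating'].apply(lambda x: 1 if x > 3.5 else 0)
# Drop rows that have a missing value in any column
data = data.dropna()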
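
The same cleaning and labeling rules can be tried on single values without the dataset. The sketch below uses two hypothetical helpers, clean_review and label_from_rating, that mirror the regular expression and the 3.5 threshold above; the sample strings and ratings are made up for illustration.

```python
import re

def clean_review(text):
    """Lowercase the text and drop everything except a-z, 0-9 and whitespace."""
    return re.sub(r'[^a-z0-9\s]', '', text.lower())

def label_from_rating(rating, threshold=3.5):
    """Return 1 (positive) if the rating is above the threshold, else 0 (negative)."""
    return 1 if rating > threshold else 0

print(clean_review("Great food, FAST delivery!!!"))  # great food fast delivery
print(label_from_rating(4.2))                        # 1
print(label_from_rating(3.5))                        # 0, the comparison is strict
print(label_from_rating(float("nan")))               # 0, NaN fails every comparison
```

The last line shows why rows with missing values have to be dropped: a missing rating would otherwise be counted as a negative label.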

4. Tokenization and Padding

The text data is converted into numerical sequences and padded to ensure all inputs have the same length for model training.

  • max_features = 5000 limits the vocabulary to roughly the 5,000 most frequent words
  • max_length = 200 defines the fixed sequence length
  • Tokenizer() converts words into integer sequences
  • fit_on_texts() creates the word index, with more frequent words getting smaller indices
  • texts_to_sequences() converts reviews into sequences
  • pad_sequences() pads or truncates sequences to equal length (by default at the start of the sequence)
  • y = data['sentiment'].values extracts sentiment labels
Python
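max_features = 5000  # vocabulary size passed to the tokenizer
max_length = 200     # every review is padded or truncated to this many tokens

# Build the word index from the cleaned reviews
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(data["Review"])
# Convert reviews to integer sequences and pad them to a fixed length
X = pad_sequences(tokenizer.texts_to_sequences(
    data["Review"]), maxlen=max_length)
y = data['sentiment'].values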
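
To see what these steps do to the text, the following pure-Python sketch imitates them on three toy reviews. The helper names (build_word_index, to_sequence, pad_pre) are hypothetical, and the code is only an approximation of the Keras behaviour with default settings, not its implementation.

```python
from collections import Counter

def build_word_index(texts):
    """Assign integer ids starting at 1, most frequent word first."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() orders by descending frequency
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequence(text, word_index, num_words):
    """Map words to ids, skipping unknown words and ids >= num_words."""
    ids = (word_index.get(word) for word in text.split())
    return [i for i in ids if i is not None and i < num_words]

def pad_pre(seq, maxlen):
    """Left-pad with zeros, or keep only the last maxlen ids."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

reviews = ["food was good", "food was cold", "good food"]
word_index = build_word_index(reviews)
print(word_index)       # {'food': 1, 'was': 2, 'good': 3, 'cold': 4}

seq = to_sequence("good food was cold", word_index, num_words=4)
print(seq)              # [3, 1, 2] because 'cold' has id 4, which is not below 4
print(pad_pre(seq, 5))  # [0, 0, 3, 1, 2]
print(pad_pre(seq, 2))  # [1, 2]
```

Index 0 is reserved for padding, which is why word ids start at 1.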

5. Splitting the Data

The dataset is divided into training, validation and test sets while preserving the sentiment class distribution.

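  • train_test_split(..., test_size=0.2, stratify=y) holds out 20% of the data for testing and keeps 80% for training
  • train_test_split(..., test_size=0.1, stratify=y_train) takes 10% of that training portion as a validation set, which gives roughly 72% training, 8% validation and 20% test overall
  • stratify keeps the class proportions of the full dataset in every split; it does not balance the classes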
Python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42, stratify=y_train
)
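
Because the second split is applied to the training portion only, the effective shares of the full dataset are not 80/10/20. A small arithmetic sketch, using a hypothetical helper split_fractions, makes the resulting proportions explicit:

```python
def split_fractions(test_size, val_size):
    """Return (train, val, test) shares of the full dataset for two chained splits."""
    train_full = 1.0 - test_size      # share left after removing the test set
    val = train_full * val_size       # validation is carved out of that share
    train = train_full - val
    return train, val, test_size

train, val, test = split_fractions(test_size=0.2, val_size=0.1)
print(f"train={train:.2f}, val={val:.2f}, test={test:.2f}")
# train=0.72, val=0.08, test=0.20
```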

6. Building RNN Model

A simple RNN model is built and compiled for binary sentiment classification.

  • Sequential([...]) creates the neural network model
  • Embedding(...) converts words into dense vector representations
  • SimpleRNN(64, activation='tanh') adds an RNN layer with 64 units
  • Dense(1, activation='sigmoid') creates the binary output layer
  • model.compile(...) configures the model with loss function, optimizer, and accuracy metric
Python
model = Sequential([
    # input_length is left out: recent Keras releases deprecate it and infer the length from the input
    Embedding(input_dim=max_features, output_dim=16),
    SimpleRNN(64, activation='tanh', return_sequences=False),
    Dense(1, activation='sigmoid')
])

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
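
As a rough check on the size of this model, the number of trainable parameters per layer can be worked out by hand. The helper functions below are hypothetical and use the standard formulas for an embedding table, a simple RNN layer with bias and a dense layer with bias; the layer sizes are the ones used above.

```python
def embedding_params(vocab_size, embed_dim):
    # one embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # weights + bias
    return input_dim * units + units

emb = embedding_params(5000, 16)
rnn = simple_rnn_params(16, 64)
out = dense_params(64, 1)
print(emb, rnn, out, emb + rnn + out)  # 80000 5184 65 85249
```

Most of the parameters sit in the embedding table, so the vocabulary size and embedding dimension dominate the model size here.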

7. Training and Evaluating Model

The model is trained on the training data, validated during training and finally evaluated on the test dataset.

  • model.fit(...) trains the model for 5 epochs with batch size 32
  • Uses validation data to monitor performance during training
  • model.evaluate(...) tests the model on unseen data
  • print(...) displays the final test accuracy
Python
history = model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=1
)

score = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {score[1]:.2f}")
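
To interpret the numbers reported during training and evaluation, it can help to compute the two quantities by hand. The NumPy sketch below uses the textbook definitions of binary cross-entropy and accuracy with a 0.5 threshold; the labels and probabilities are made up, and details such as the clipping constant are assumptions that may differ from what Keras does internally.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    y_pred = (y_prob >= threshold).astype(int)
    return float(np.mean(y_pred == y_true))

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.6])      # hypothetical sigmoid outputs

print(f"loss: {binary_cross_entropy(y_true, y_prob):.4f}")  # loss: 0.5403
print(f"accuracy: {binary_accuracy(y_true, y_prob):.2f}")   # accuracy: 0.50
```

Accuracy only counts which side of the threshold a prediction falls on, while the loss also penalizes how confident a wrong prediction is.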

Output:

[Image: Training and Evaluating Model]

8. Predicting Sentiment

A function is created to preprocess a review, predict its sentiment, and display the prediction result.

  • review_text.lower() converts text to lowercase
  • re.sub(...) removes special characters and punctuation
  • tokenizer.texts_to_sequences() converts text into word sequences
  • pad_sequences() pads the sequence to fixed length
  • model.predict() predicts sentiment probability
  • Returns Positive if probability ≥ 0.5, otherwise Negative
Python
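def predict_sentiment(review_text):
    # Apply the same cleaning used on the training reviews
    text = review_text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)

    # Encode the review with the fitted tokenizer and pad it to the training length
    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=max_length)

    # The sigmoid output is the predicted probability of the positive class
    prediction = model.predict(padded)[0][0]
    return f"{'Positive' if prediction >= 0.5 else 'Negative'} (Probability: {prediction:.2f})"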


sample_review = "The food was great."
print(f"Review: {sample_review}")
print(f"Sentiment: {predict_sentiment(sample_review)}")
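
One thing to keep in mind when predicting on new reviews: the tokenizer above is created without an oov_token, so words that were not seen during fitting are dropped when the text is encoded. The pure-Python sketch below imitates that encoding with a small hypothetical word_index (the real one comes from tokenizer.word_index) and a hypothetical encode helper.

```python
import re

# Hypothetical vocabulary for illustration only
word_index = {"the": 1, "food": 2, "was": 3, "great": 4}

def encode(review_text, word_index, maxlen):
    """Clean the text, map known words to ids, drop unknown words, then left-pad."""
    text = re.sub(r'[^a-z0-9\s]', '', review_text.lower())
    ids = [word_index[w] for w in text.split() if w in word_index]
    ids = ids[-maxlen:]
    return [0] * (maxlen - len(ids)) + ids

print(encode("The food was great.", word_index, maxlen=6))      # [0, 0, 1, 2, 3, 4]
print(encode("Absolutely scrumptious!", word_index, maxlen=6))  # [0, 0, 0, 0, 0, 0]
```

A review made up entirely of unseen words becomes a sequence of padding zeros, so the model's prediction for it carries no information about the text.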

Output:

[Image: Predicting Sentiment]

You can download the source code from here.
