Probabilistic Matrix Factorization

Probabilistic Matrix Factorization (PMF) is a collaborative filtering technique used in recommendation systems. Unlike traditional matrix factorization, PMF uses probability theory to model uncertainty in user-item interactions. This makes it well suited to sparse datasets, where only a small fraction of the ratings are observed. PMF extends matrix factorization (MF) by introducing a probabilistic model:

Latent factors are assumed to follow Gaussian distributions.
Observed ratings are modeled as Gaussian-distributed around the predicted dot product.
Noise and uncertainty in the data are captured by the model's variance terms.

Matrix Factorization

Matrix Factorization (MF) is a collaborative filtering approach that predicts missing entries in a user-item interaction matrix (R) by decomposing it into two smaller matrices.

R \approx U V^T

U: user latent factor matrix.
V: item latent factor matrix.
Each user and item is represented as a vector in a latent space.
The dot product U_i^T V_j predicts the preference of user i for item j.
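As a minimal sketch of the decomposition above, the snippet below builds small random factor matrices and forms the approximation. The sizes (4 users, 3 items, k = 2) and the variable names are illustrative assumptions, not values from a real dataset.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration
num_users, num_items, k = 4, 3, 2

rng = np.random.default_rng(0)
U = rng.normal(size=(num_users, k))   # one row per user
V = rng.normal(size=(num_items, k))   # one row per item

# Full approximation R_hat = U V^T has shape (num_users, num_items)
R_hat = U @ V.T

# Predicted preference of user i for item j is the dot product U_i^T V_j
i, j = 1, 2
pred_ij = U[i] @ V[j]
print(R_hat.shape, np.isclose(pred_ij, R_hat[i, j]))
```

Each entry of R_hat is one such dot product, so a single prediction never requires building the full matrix.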

Key Concepts

1. Latent Factors

Latent factors represent hidden features of users and items.

Each user is represented by a vector U_i.
Each item is represented by a vector V_j.
Their dot product U_i^T V_j gives the predicted rating.

U_i \in \mathbb{R}^k, \quad V_j \in \mathbb{R}^k, \quad \hat{R}_{ij} = U_i^T V_j

Where,

k: number of latent dimensions (e.g., genre preferences for movies).
U_i: hidden representation of user i.
V_j: hidden representation of item j.
\hat{R}_{ij}: predicted rating.

2. Gaussian Priors

PMF assumes that each user and item latent vector is drawn independently from a zero-mean Gaussian (normal) distribution.

U_i \sim \mathcal{N}(0, \sigma_U^2 I), \quad V_j \sim \mathcal{N}(0, \sigma_V^2 I)

Where,

U_i: latent vector of user i (row i of the user factor matrix U).
V_j: latent vector of item j (row j of the item factor matrix V).
\sigma_U^2, \sigma_V^2: variance terms controlling spread.
This prior acts as a regularizer, keeping factor values from becoming too large.
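To see how the variance controls the size of the factors, the sketch below draws user vectors from a zero-mean spherical Gaussian for two values of \sigma_U. The sizes, the helper name sample_factors and the two \sigma_U values are assumptions chosen for illustration. For such a Gaussian the expected squared norm of each vector is k \sigma_U^2.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, k = 5000, 10  # illustrative sizes, not from the article

def sample_factors(sigma, n, k, rng):
    """Draw n latent vectors from N(0, sigma^2 I) in k dimensions."""
    return rng.normal(loc=0.0, scale=sigma, size=(n, k))

for sigma_U in (0.1, 1.0):
    U = sample_factors(sigma_U, num_users, k, rng)
    mean_sq_norm = np.mean(np.sum(U ** 2, axis=1))
    # Theory: E[||U_i||^2] = k * sigma_U^2
    print(f"sigma_U={sigma_U}: mean ||U_i||^2 = {mean_sq_norm:.3f}, "
          f"expected = {k * sigma_U ** 2:.3f}")
```

A smaller \sigma_U concentrates the vectors near the origin, which is the sense in which the prior discourages large factor values.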

3. Observed Ratings

Each rating is modeled as a Gaussian centered on the dot product of user and item vectors.

R_{ij} \sim \mathcal{N}(U_i^T V_j, \sigma_R^2)

Where,

R_{ij}: observed rating given by user i to item j.
U_i^T V_j: expected rating (mean of the distribution).
\sigma_R^2: variance (captures noise in user preferences).
This means ratings are not exact but have uncertainty.
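The likelihood of a single rating under this model can be written out directly. The function below is a minimal sketch; the name gaussian_log_likelihood and the example numbers are assumptions made for illustration.

```python
import numpy as np

def gaussian_log_likelihood(r_ij, u_i, v_j, sigma_R):
    """Log density of rating r_ij under N(u_i . v_j, sigma_R^2)."""
    mean = np.dot(u_i, v_j)
    return (-0.5 * np.log(2 * np.pi * sigma_R ** 2)
            - (r_ij - mean) ** 2 / (2 * sigma_R ** 2))

# Hypothetical user and item vectors with k = 2
u_i = np.array([1.0, 0.5])
v_j = np.array([2.0, 1.0])   # predicted mean = 1.0*2.0 + 0.5*1.0 = 2.5

# A rating near the predicted mean is more likely than one far from it
ll_close = gaussian_log_likelihood(2.6, u_i, v_j, sigma_R=0.5)
ll_far = gaussian_log_likelihood(4.0, u_i, v_j, sigma_R=0.5)
print(ll_close > ll_far)
```

Because the squared error sits in the exponent of the Gaussian, maximizing this log-likelihood over all observed ratings (for a fixed \sigma_R) is equivalent to minimizing their sum of squared errors.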

4. Objective Function

The goal is to find latent factors that fit the observed ratings well (high likelihood) while avoiding overfitting. This is done by minimizing a regularized squared-error loss:

L = \sum_{(i,j) \in R} (R_{ij} - U_i^T V_j)^2 + \lambda_U \sum_i \|U_i\|^2 + \lambda_V \sum_j \|V_j\|^2

Where,

The sum over (i,j) \in R runs over observed ratings only.
First term: squared error between actual and predicted ratings.
Second term: \lambda_U \sum_i \|U_i\|^2, regularization for users.
Third term: \lambda_V \sum_j \|V_j\|^2, regularization for items.
\lambda_U, \lambda_V: control the strength of regularization.
Minimizing L balances accuracy (fit to data) and simplicity (avoid large values).
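The link between the probabilistic model and this loss can be made explicit. The following is a sketch of the standard maximum a posteriori (MAP) argument, assuming the variances \sigma_R^2, \sigma_U^2 and \sigma_V^2 are fixed constants and the Gaussian priors apply independently to each user vector U_i and item vector V_j.

```latex
% Negative log-posterior over observed pairs (i,j); terms that do not
% depend on U or V are collected in "const"
-\ln p(U, V \mid R) =
    \frac{1}{2\sigma_R^2} \sum_{(i,j) \in R} (R_{ij} - U_i^T V_j)^2
  + \frac{1}{2\sigma_U^2} \sum_i \|U_i\|^2
  + \frac{1}{2\sigma_V^2} \sum_j \|V_j\|^2
  + \text{const}

% Multiplying by 2\sigma_R^2 leaves the minimizer unchanged and gives L with
\lambda_U = \frac{\sigma_R^2}{\sigma_U^2}, \qquad
\lambda_V = \frac{\sigma_R^2}{\sigma_V^2}
```

Under these assumptions, a tighter prior (smaller \sigma_U^2) corresponds to stronger regularization (larger \lambda_U).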

Implementation

We will implement PMF using Stochastic Gradient Descent (SGD) in NumPy.

Step 1: Import Library

We will import NumPy, the only library this implementation needs.

Python

import numpy as np

Step 2: Define PMF Class

We will define the PMF model with parameters for ratings, latent dimensions, learning rate, regularization and training epochs.

Python

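class PMF:
    def __init__(self, R, num_factors, learning_rate=0.01, reg_param=0.01, num_epochs=100):
        # R: user-item rating matrix; a value of 0 marks a missing rating
        self.R = R
        self.num_users, self.num_items = R.shape
        self.num_factors = num_factors      # k, number of latent dimensions
        self.learning_rate = learning_rate  # SGD step size
        self.reg_param = reg_param          # regularization strength for both U and V
        self.num_epochs = num_epochs        # number of passes over the ratings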

Initializes latent factors for users and items.
Uses Stochastic Gradient Descent to update latent factors based on observed ratings.
Computes Mean Squared Error (MSE) at the end of each epoch to track training progress.

Python

    def train(self):
        # Initialize latent factors with small Gaussian values
        self.U = np.random.normal(
            scale=1. / self.num_factors, size=(self.num_users, self.num_factors))
        self.V = np.random.normal(
            scale=1. / self.num_factors, size=(self.num_items, self.num_factors))

        for epoch in range(self.num_epochs):
            for i in range(self.num_users):
                for j in range(self.num_items):
                    # 0 marks a missing rating; observed ratings may be negative
                    if self.R[i, j] != 0:
                        prediction = self.predict(i, j)
                        error = self.R[i, j] - prediction

                        # Copy the user vector so the item update uses its pre-update value
                        U_i = self.U[i, :].copy()
                        self.U[i, :] += self.learning_rate * \
                            (error * self.V[j, :] - self.reg_param * U_i)
                        self.V[j, :] += self.learning_rate * \
                            (error * U_i - self.reg_param * self.V[j, :])

            mse = self.compute_mse()
            print(f'Epoch: {epoch+1}, MSE: {mse}')
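The update rule in the inner loop follows from the gradient of the objective for a single observed rating. A short derivation, writing e_{ij} for the prediction error, L_{ij} for the per-rating loss and \eta for the learning rate (symbols introduced here for convenience), with the constant factor 2 absorbed into \eta:

```latex
% Per-rating loss and prediction error
L_{ij} = (R_{ij} - U_i^T V_j)^2 + \lambda_U \|U_i\|^2 + \lambda_V \|V_j\|^2,
\qquad e_{ij} = R_{ij} - U_i^T V_j

% Gradients with respect to the two latent vectors
\frac{\partial L_{ij}}{\partial U_i} = -2 e_{ij} V_j + 2 \lambda_U U_i,
\qquad
\frac{\partial L_{ij}}{\partial V_j} = -2 e_{ij} U_i + 2 \lambda_V V_j

% Gradient step (factor 2 absorbed into \eta)
U_i \leftarrow U_i + \eta \, (e_{ij} V_j - \lambda_U U_i),
\qquad
V_j \leftarrow V_j + \eta \, (e_{ij} U_i - \lambda_V V_j)
```

In the code, learning_rate plays the role of \eta and the single reg_param is used for both \lambda_U and \lambda_V.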

predict: estimates a single rating using the dot product of user and item vectors.
compute_mse: calculates error only on observed entries.
full_matrix: reconstructs the entire rating matrix with predicted values.

Python

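    def predict(self, i, j):
        # Predicted rating: dot product of user i and item j latent vectors
        return np.dot(self.U[i, :], self.V[j, :])

    def compute_mse(self):
        # Mean squared error over observed (nonzero) entries only
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += (self.R[x, y] - predicted[x, y]) ** 2
        return error / len(xs)

    def full_matrix(self):
        # Reconstruct the full rating matrix from the latent factors
        return np.dot(self.U, self.V.T)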

Step 3: Generate Synthetic Data and Train Model

We will:

Create synthetic latent factors for users and items to simulate a rating dataset.
Add Gaussian noise to make the data more realistic.
Mask some entries to simulate missing values.
Train the PMF model with Stochastic Gradient Descent.
Reconstruct the full rating matrix, filling in missing values with predictions.

Python

np.random.seed(0)
num_users, num_items, num_factors = 100, 50, 10

# Ground-truth latent factors and noisy ratings
U_true = np.random.normal(0, 1, (num_users, num_factors))
V_true = np.random.normal(0, 1, (num_items, num_factors))
R = np.dot(U_true, V_true.T) + np.random.normal(0, 0.1, (num_users, num_items))

# Keep about 80% of entries as observed; set the rest to 0 (missing)
mask = np.random.rand(*R.shape) < 0.8
R[~mask] = 0

pmf = PMF(R, num_factors=num_factors, learning_rate=0.01,
          reg_param=0.01, num_epochs=50)
pmf.train()

# Predicted ratings for every user-item pair, including the masked ones
R_reconstructed = pmf.full_matrix()
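Since the true values of the masked entries are known in a synthetic setup, the reconstruction can be checked on exactly those entries. The helper below is a sketch under that assumption; the name masked_rmse and the toy arrays are hypothetical and are not part of the script.

```python
import numpy as np

def masked_rmse(R_true, R_pred, mask):
    """Root mean squared error over the entries where mask is True."""
    diff = (R_true - R_pred)[mask]
    return np.sqrt(np.mean(diff ** 2))

# Toy example with hypothetical values
R_true = np.array([[5.0, 3.0], [4.0, 1.0]])
R_pred = np.array([[4.0, 3.0], [4.0, 3.0]])
held_out = np.array([[True, False], [False, True]])  # entries hidden in training

# Errors on the held-out entries are 1.0 and -2.0
rmse = masked_rmse(R_true, R_pred, held_out)
print(rmse)
```

With the variables from the script, this could be called as masked_rmse(R_full, R_reconstructed, ~mask), where R_full is a hypothetical copy of R saved before the line R[~mask] = 0.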

Applications

Movie Recommendations: Predict user ratings for unseen movies (e.g., Netflix, MovieLens).
E-commerce: Suggest products based on previous purchase history.
Music Streaming: Recommend songs/playlists (Spotify, Last.fm).
Social Media: Suggest friends, groups or content.
Online Learning: Recommend courses/resources tailored to learner profiles.

Limitations

Assumes linear interactions between latent factors.
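Can be computationally expensive for large datasets, particularly with a naive loop over every user-item pair as in the implementation above.
Struggles with the cold-start problem (new users/items with no history).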
Hyperparameter tuning (learning rate, regularization, factors) can be tricky.
May converge to local minima if poorly initialized.
