Probabilistic Matrix Factorization (PMF) is a collaborative filtering technique used in recommendation systems. Unlike traditional matrix factorization (MF), PMF places probability distributions on the latent factors and on the ratings, which lets it model uncertainty in user-item interactions. This makes it well suited to sparse datasets, where only a small fraction of the ratings are observed. PMF extends MF by introducing a probabilistic model:
- Latent factors are assumed to follow Gaussian distributions.
- Observed ratings are modeled as Gaussian distributions around the predicted dot product.
- Noise & uncertainty in data are naturally captured.
Matrix Factorization
Matrix Factorization (MF) is a collaborative filtering approach that predicts missing entries in a user-item interaction matrix (R) by decomposing it into two smaller matrices.
R \approx U V^T
- U : user latent factor matrix.
- V : item latent factor matrix.
- Each user and item is represented as a vector in a latent space.
- The dot product U_i^T V_j predicts the preference of user i for item j.
Key Concepts
1. Latent Factors
Latent factors represent hidden features of users and items.
- Each user is represented by a vector U_i.
- Each item is represented by a vector V_j.
- Their dot product U_i^T V_j gives the predicted rating.
U_i \in \mathbb{R}^k, \quad V_j \in \mathbb{R}^k, \quad \hat{R}_{ij} = U_i^T V_j
Where,
- k : number of latent dimensions (e.g., genre preferences for movies).
- U_i : hidden representation of user i.
- V_j : hidden representation of item j.
- \hat{R}_{ij} : predicted rating.
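As a minimal sketch of the prediction rule above, the snippet below builds small hand-picked factor matrices and shows that a single predicted rating is one dot product, while the full prediction matrix is U V^T. The values and the names `U`, `V`, `r_01` and `R_hat` are illustrative assumptions, not learned factors.

```python
import numpy as np

# Toy latent factors with k = 2 (values chosen by hand for illustration).
U = np.array([[1.0, 0.5],    # user 0
              [0.2, 1.5]])   # user 1
V = np.array([[2.0, 0.0],    # item 0
              [1.0, 1.0],    # item 1
              [0.0, 3.0]])   # item 2

# Predicted rating of user 0 for item 1: U_0^T V_1.
r_01 = U[0] @ V[1]

# All predictions at once: R_hat = U V^T, shape (num_users, num_items).
R_hat = U @ V.T

print(r_01)         # 1.5
print(R_hat.shape)  # (2, 3)
```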
2. Gaussian Priors
PMF assumes that latent factors are drawn from Gaussian (normal) distributions.
U_i \sim \mathcal{N}(0, \sigma_U^2 I), \quad V_j \sim \mathcal{N}(0, \sigma_V^2 I)
Where,
- U_i, V_j : latent factor vectors of user i and item j (the rows of U and V).
- \sigma_U^2, \sigma_V^2 : variance terms controlling spread.
- This prior acts as regularization, keeping factor values from becoming too large.
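To see why a Gaussian prior behaves like regularization, it helps to write out its negative log-density. The short derivation below is a sketch that assumes the user vectors U_i are independent and k-dimensional, and writes N for the number of users (a symbol introduced here). The item prior works the same way with V_j and \sigma_V^2.

```latex
-\ln p(U \mid \sigma_U^2)
  = -\sum_{i} \ln \mathcal{N}(U_i \mid 0, \sigma_U^2 I)
  = \frac{1}{2\sigma_U^2} \sum_{i} \|U_i\|^2 + \frac{N k}{2} \ln(2\pi\sigma_U^2)
```

The last term does not depend on U, so favoring factors that are probable under the prior amounts to penalizing \sum_i \|U_i\|^2. A smaller \sigma_U^2 gives a stronger penalty.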
3. Observed Ratings
Each rating is modeled as a Gaussian centered on the dot product of user and item vectors.
R_{ij} \sim \mathcal{N}(U_i^T V_j, \sigma_R^2)
Where,
- R_{ij} : observed rating given by user i to item j.
- U_i^T V_j : expected rating (mean of the distribution).
- \sigma_R^2 : variance (captures noise in user preferences).
- This means ratings are not exact but have uncertainty.
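The generative story in sections 2 and 3 can be simulated directly. The sketch below samples factors from the priors and then samples ratings around their dot products. The sizes and the standard deviations `sigma_U`, `sigma_V` and `sigma_R` are arbitrary values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

num_users, num_items, k = 4, 5, 3          # illustrative sizes
sigma_U, sigma_V, sigma_R = 1.0, 1.0, 0.1  # assumed standard deviations

# Priors: each user/item vector is drawn from a zero-mean Gaussian.
U = rng.normal(0.0, sigma_U, size=(num_users, k))
V = rng.normal(0.0, sigma_V, size=(num_items, k))

# Likelihood: each rating is Gaussian around the dot product U_i^T V_j.
mean_ratings = U @ V.T
R = rng.normal(mean_ratings, sigma_R)

print(R.shape)                          # (4, 5)
print(np.abs(R - mean_ratings).max())   # small, on the order of sigma_R
```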
4. Objective Function
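The goal is to find the latent factors that maximize the posterior probability of U and V given the observed ratings. Taking the negative log of that posterior turns the Gaussian likelihood into a squared-error term (fit to the observed ratings) and the Gaussian priors into L2 penalties (protection against overfitting), so the problem becomes minimizing: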
L = \sum_{(i,j) \in R} (R_{ij} - U_i^T V_j)^2 + \lambda_U \sum_i \|U_i\|^2 + \lambda_V \sum_j \|V_j\|^2
Where,
- First term: squared error between actual and predicted ratings, summed over the observed entries (i, j).
- Second term: \lambda_U \sum_i \|U_i\|^2 : regularization for users.
- Third term: \lambda_V \sum_j \|V_j\|^2 : regularization for items.
- \lambda_U, \lambda_V : control the strength of regularization.
- Minimizing L balances accuracy (fit to data) and simplicity (avoid large values).
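A minimal sketch of evaluating this objective in NumPy follows. It assumes missing ratings are flagged by a boolean array `observed` (a hypothetical argument; the implementation later in this article marks missing entries with 0 instead) and that `lam_U` and `lam_V` stand for \lambda_U and \lambda_V. The function name `pmf_objective` is likewise illustrative.

```python
import numpy as np

def pmf_objective(R, observed, U, V, lam_U, lam_V):
    """Squared error on observed entries plus L2 penalties on U and V."""
    residual = (R - U @ V.T) * observed      # zero out unobserved entries
    fit = np.sum(residual ** 2)
    penalty = lam_U * np.sum(U ** 2) + lam_V * np.sum(V ** 2)
    return fit + penalty

# Tiny example: 2 users, 2 items, k = 1, one missing rating.
R = np.array([[4.0, 2.0],
              [2.0, 0.0]])
observed = np.array([[True, True],
                     [True, False]])
U = np.array([[2.0], [1.0]])
V = np.array([[2.0], [1.0]])

# The observed entries are fit exactly, so only the penalty remains.
print(pmf_objective(R, observed, U, V, lam_U=0.1, lam_V=0.1))  # 1.0
```

Under the Gaussian model above, choosing \lambda_U = \sigma_R^2 / \sigma_U^2 and \lambda_V = \sigma_R^2 / \sigma_V^2 makes this objective equal to the negative log-posterior up to a positive scale factor and an additive constant.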
Implementation
We will implement PMF using Stochastic Gradient Descent (SGD) in NumPy.
Step 1: Import Library
We import NumPy, the only library this implementation needs.
import numpy as np
Step 2: Define PMF Class
We will define the PMF model with parameters for ratings, latent dimensions, learning rate, regularization and training epochs.
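class PMF:
    def __init__(self, R, num_factors, learning_rate=0.01, reg_param=0.01, num_epochs=100):
        # Rating matrix; a 0 entry marks a missing rating.
        self.R = R
        self.num_users, self.num_items = R.shape
        self.num_factors = num_factors      # latent dimension k
        self.learning_rate = learning_rate  # SGD step size
        self.reg_param = reg_param          # regularization strength
        self.num_epochs = num_epochs        # passes over the observed ratings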
- Initializes latent factors for users and items.
- Uses Stochastic Gradient Descent to update latent factors based on observed ratings.
- Computes Mean Squared Error (MSE) at the end of each epoch to track training progress.
    def train(self):
        # Small random initialization for user and item factors.
        self.U = np.random.normal(
            scale=1. / self.num_factors, size=(self.num_users, self.num_factors))
        self.V = np.random.normal(
            scale=1. / self.num_factors, size=(self.num_items, self.num_factors))
        for epoch in range(self.num_epochs):
            for i in range(self.num_users):
                for j in range(self.num_items):
                    # 0 marks a missing rating; ratings may be negative,
                    # so test for non-zero rather than positive.
                    if self.R[i, j] != 0:
                        prediction = self.predict(i, j)
                        error = self.R[i, j] - prediction
                        # Copy U_i so the V_j update uses the pre-update value.
                        u_old = self.U[i, :].copy()
                        self.U[i, :] += self.learning_rate * \
                            (error * self.V[j, :] - self.reg_param * self.U[i, :])
                        self.V[j, :] += self.learning_rate * \
                            (error * u_old - self.reg_param * self.V[j, :])
            mse = self.compute_mse()
            print(f'Epoch: {epoch+1}, MSE: {mse}')
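The update inside the loop follows from differentiating the objective with respect to U_i and V_j for a single observed rating. The sketch below writes e_{ij} for the prediction error and \eta for the learning rate (both symbols introduced here), uses a single \lambda for both penalties as the code does with reg_param, and absorbs the constant factor 2 from the derivative into \eta.

```latex
e_{ij} = R_{ij} - U_i^T V_j

\frac{\partial}{\partial U_i}\left[ e_{ij}^2 + \lambda \|U_i\|^2 \right]
  = -2\left( e_{ij} V_j - \lambda U_i \right),
\qquad
\frac{\partial}{\partial V_j}\left[ e_{ij}^2 + \lambda \|V_j\|^2 \right]
  = -2\left( e_{ij} U_i - \lambda V_j \right)

U_i \leftarrow U_i + \eta \left( e_{ij} V_j - \lambda U_i \right),
\qquad
V_j \leftarrow V_j + \eta \left( e_{ij} U_i - \lambda V_j \right)
```

Because the penalty is applied once per visited rating, users and items with many ratings are shrunk more often over an epoch than those with few.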
- predict: estimates a single rating using the dot product of user and item vectors.
- compute_mse: calculates error only on observed entries.
- full_matrix: reconstructs the entire rating matrix with predicted values.
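    def predict(self, i, j):
        # Predicted rating: dot product of user i and item j factors.
        return np.dot(self.U[i, :], self.V[j, :])

    def compute_mse(self):
        # Mean squared error over the observed (non-zero) entries only.
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += (self.R[x, y] - predicted[x, y]) ** 2
        return error / len(xs)

    def full_matrix(self):
        # Reconstruct every rating, observed or not.
        return np.dot(self.U, self.V.T)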
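A mean squared error over the observed entries can also be computed without a Python-level loop by using a boolean mask. The sketch below shows one way to do that. `masked_mse` is a hypothetical helper, not part of the class, and it keeps the convention that a 0 in R marks a missing rating.

```python
import numpy as np

def masked_mse(R, U, V):
    """Mean squared error over the non-zero (observed) entries of R."""
    observed = R != 0                 # boolean mask of observed ratings
    diff = (R - U @ V.T)[observed]    # residuals on observed entries only
    return np.mean(diff ** 2)

# Tiny check: three observed ratings, one missing (0).
R = np.array([[3.0, 0.0],
              [2.0, 1.0]])
U = np.array([[1.0], [1.0]])
V = np.array([[2.0], [1.0]])

# Predictions are [[2, 1], [2, 1]]: one squared error of 1 over three entries.
print(masked_mse(R, U, V))  # 1/3
```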
Step 3: Generate Synthetic Data and Train Model
We will:
- Create synthetic latent factors for users and items to simulate a rating dataset.
- Add Gaussian noise to make the data realistic.
- Mask some entries to simulate missing values.
- Train the PMF model with Stochastic Gradient Descent.
- Reconstruct the full rating matrix, filling in missing values with predictions.
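np.random.seed(0)
num_users, num_items, num_factors = 100, 50, 10
# Ground-truth latent factors used to simulate ratings.
U_true = np.random.normal(0, 1, (num_users, num_factors))
V_true = np.random.normal(0, 1, (num_items, num_factors))
# Ratings = dot products plus Gaussian noise (standard deviation 0.1).
R = np.dot(U_true, V_true.T) + np.random.normal(0, 0.1, (num_users, num_items))
# Keep roughly 80% of the entries; the rest are set to 0 to mark them as missing.
mask = np.random.rand(*R.shape) < 0.8
R[~mask] = 0
pmf = PMF(R, num_factors=num_factors, learning_rate=0.01,
          reg_param=0.01, num_epochs=50)
pmf.train()
# Predicted ratings for every user-item pair, including the masked ones.
R_reconstructed = pmf.full_matrix()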
Output: the training loop prints the epoch number and the MSE on the observed entries once per epoch.
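Training error alone does not show how well the missing entries are filled in. The sketch below repeats the synthetic setup on a smaller scale and compares predictions on the masked-out entries with the noise-free ratings that generated them. The function `sgd_factorize` is a compact, hypothetical stand-in for the PMF class so that the block runs on its own. It tracks observed entries with a boolean mask and visits them in random order, and its sizes and hyperparameters are illustrative choices rather than tuned values.

```python
import numpy as np

def sgd_factorize(R, observed, k, lr=0.02, reg=0.01, epochs=50, seed=0):
    """SGD matrix factorization on the entries where `observed` is True."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(R.shape[0], k))
    V = rng.normal(scale=0.1, size=(R.shape[1], k))
    rows, cols = np.nonzero(observed)
    for _ in range(epochs):
        for idx in rng.permutation(len(rows)):   # visit ratings in random order
            i, j = rows[idx], cols[idx]
            err = R[i, j] - U[i] @ V[j]
            u_old = U[i].copy()                  # keep pre-update user vector
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
    return U, V

rng = np.random.default_rng(1)
num_users, num_items, k = 60, 40, 5
U_true = rng.normal(size=(num_users, k))
V_true = rng.normal(size=(num_items, k))
R_clean = U_true @ V_true.T                               # noise-free ratings
R_noisy = R_clean + rng.normal(scale=0.1, size=R_clean.shape)
observed = rng.random(R_clean.shape) < 0.8                # about 80% of entries kept

U, V = sgd_factorize(R_noisy, observed, k)
R_hat = U @ V.T

held_out = ~observed
rmse_held_out = np.sqrt(np.mean((R_hat[held_out] - R_clean[held_out]) ** 2))
rmse_baseline = np.sqrt(np.mean(R_clean[held_out] ** 2))  # always predicting 0

print(f"held-out RMSE: {rmse_held_out:.3f}, predict-zero baseline: {rmse_baseline:.3f}")
```

Comparing against the noise-free matrix is only possible because the data is synthetic. With real ratings, a held-out subset of the observed ratings would play that role.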
Applications
- Movie Recommendations: Predict user ratings for unseen movies (e.g., Netflix, MovieLens).
- E-commerce: Suggest products based on previous purchase history.
- Music Streaming: Recommend songs/playlists (Spotify, Last.fm).
- Social Media: Suggest friends, groups or content.
- Online Learning: Recommend courses/resources tailored to learner profiles.
Limitations
- Assumes linear interactions between latent factors.
- Computationally expensive for large datasets.
- Struggles with the cold-start problem (new users/items with no history).
- Hyperparameter tuning (learning rate, regularization, factors) can be tricky.
- The objective is non-convex in U and V jointly, so training may converge to a poor local minimum if the factors are poorly initialized.