Probabilistic Matrix Factorization

Last Updated : 30 Sep, 2025

Probabilistic Matrix Factorization (PMF) is a collaborative filtering technique used in recommendation systems. Unlike traditional matrix factorization, PMF incorporates probability theory to model uncertainty in user-item interactions. This makes it highly effective for sparse datasets, where only a fraction of the ratings are observed. PMF extends MF by introducing a probabilistic model:

  • Latent factors are assumed to follow Gaussian distributions.
  • Observed ratings are modeled as Gaussian distributions around the predicted dot product.
  • Noise & uncertainty in data are naturally captured.

Matrix Factorization

Matrix Factorization (MF) is a collaborative filtering approach that predicts missing entries in a user-item interaction matrix (R) by decomposing it into two smaller matrices.

R \approx U V^T

  • U: user latent factor matrix.
  • V: item latent factor matrix.
  • Each user and item is represented as a vector in a latent space.
  • The dot product U_i^T V_j predicts the preference of user i for item j.
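
The factorization can be illustrated with a small NumPy sketch. The sizes and random values below are hypothetical and chosen only to show the shapes involved; rows of U and V are the per-user and per-item vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, k = 4, 3, 2    # hypothetical sizes

U = rng.normal(size=(num_users, k))  # one row per user
V = rng.normal(size=(num_items, k))  # one row per item

R_hat = U @ V.T                      # shape (num_users, num_items)

# Entry (i, j) of the product equals the dot product of row i of U and row j of V.
i, j = 1, 2
single = U[i] @ V[j]
print(R_hat.shape, np.isclose(R_hat[i, j], single))
```

Every entry of the product is filled in, including entries that were never observed, which is how missing ratings get predicted.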

Key Concepts

1. Latent Factors

Latent factors represent hidden features of users and items.

  • Each user is represented by a vector U_i.
  • Each item is represented by a vector V_j.
  • Their dot product U_i^T V_j gives the predicted rating.

U_i \in \mathbb{R}^k, \quad V_j \in \mathbb{R}^k, \quad \hat{R}_{ij} = U_i^T V_j

Where,

  • k: number of latent dimensions; each may capture a hidden trait (e.g., genre preference for movies).
  • U_i: hidden representation of user i.
  • V_j: hidden representation of item j.
  • \hat R_{ij}: predicted rating.

2. Gaussian Priors

PMF assumes that each user and item latent vector is drawn from a zero-mean Gaussian (normal) distribution.

U_i \sim \mathcal{N}(0, \sigma_U^2 I), \quad V_j \sim \mathcal{N}(0, \sigma_V^2 I)

Where,

  • U_i: latent vector of user i (row i of the user matrix U).
  • V_j: latent vector of item j (row j of the item matrix V).
  • \sigma_U^2, \sigma_V^2: prior variances controlling the spread of the factors.
  • The zero-mean prior acts as a regularizer, keeping factor values from becoming too large.
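
As a minimal sketch of what the prior implies, the snippet below draws latent factors from zero-mean Gaussians and evaluates the negative log prior with constants dropped. The sizes, the standard deviations and the helper name neg_log_prior are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, k = 1000, 500, 5   # hypothetical sizes
sigma_u, sigma_v = 0.5, 2.0              # hypothetical prior standard deviations

# Each row is one latent vector drawn from N(0, sigma^2 I).
U = rng.normal(loc=0.0, scale=sigma_u, size=(num_users, k))
V = rng.normal(loc=0.0, scale=sigma_v, size=(num_items, k))

def neg_log_prior(X, sigma):
    """Negative log density of N(0, sigma^2 I) over all rows of X, constants dropped."""
    return np.sum(X ** 2) / (2 * sigma ** 2)

# Doubling the factors quadruples the penalty, as with an L2 penalty.
ratio = neg_log_prior(2 * U, sigma_u) / neg_log_prior(U, sigma_u)
print(U.std(), V.std(), ratio)
```

The penalty depends only on the squared norm of the factors, which is why a Gaussian prior shows up as an L2 term in the objective.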

3. Observed Ratings

Each rating is modeled as a Gaussian centered on the dot product of user and item vectors.

R_{ij} \sim \mathcal{N}(U_i^T V_j, \sigma_R^2)

Where,

  • R_{ij}: observed rating given by user i to item j.
  • U_i^T V_j: expected rating (mean of the distribution).
  • \sigma_R^2: variance (captures noise in user preferences).
  • Each observed rating is therefore treated as a noisy measurement of U_i^T V_j rather than an exact value.
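
The rating model can be sketched as follows. The vectors, the noise level and the helper name rating_log_likelihood are hypothetical and used only for illustration.

```python
import numpy as np

def rating_log_likelihood(r, u_i, v_j, sigma_r):
    """Log density of rating r under N(u_i . v_j, sigma_r^2)."""
    mean = u_i @ v_j
    return -0.5 * np.log(2 * np.pi * sigma_r ** 2) - (r - mean) ** 2 / (2 * sigma_r ** 2)

u_i = np.array([1.0, 2.0])   # hypothetical user vector
v_j = np.array([3.0, 0.5])   # hypothetical item vector
sigma_r = 0.5                # hypothetical noise standard deviation

mean = u_i @ v_j             # 1*3 + 2*0.5 = 4.0
rng = np.random.default_rng(0)
sampled = rng.normal(loc=mean, scale=sigma_r, size=5)  # noisy ratings around the mean

# A rating at the mean is more likely than one further away.
at_mean = rating_log_likelihood(4.0, u_i, v_j, sigma_r)
off_mean = rating_log_likelihood(5.0, u_i, v_j, sigma_r)
print(mean, at_mean > off_mean)
```

Maximizing this log-likelihood over all observed ratings is what produces the squared-error term in the objective.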

4. Objective Function

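The goal is to find the latent factors that are most probable given the observed ratings (maximum a posteriori estimation). With the Gaussian likelihood and Gaussian priors above, this is equivalent to minimizing a regularized sum of squared errors, which fits the observed ratings while avoiding overfitting.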

L = \sum_{(i,j) \in R} (R_{ij} - U_i^T V_j)^2 + \lambda_U \sum_i \|U_i\|^2 + \lambda_V \sum_j \|V_j\|^2

Where,

  • First term: squared error between actual and predicted ratings, summed over the observed entries only.
  • Second term \lambda_U \sum_i \|U_i\|^2: regularization for users.
  • Third term \lambda_V \sum_j \|V_j\|^2: regularization for items.
  • \lambda_U, \lambda_V: control the strength of regularization; in the probabilistic view, \lambda_U = \sigma_R^2 / \sigma_U^2 and \lambda_V = \sigma_R^2 / \sigma_V^2.
  • Minimizing L balances accuracy (fit to data) and simplicity (avoid large values).
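
The link between the probabilistic model and this objective can be checked numerically. The sketch below assumes a boolean mask marks the observed entries and that the regularization weights are the variance ratios \sigma_R^2 / \sigma_U^2 and \sigma_R^2 / \sigma_V^2. The function names and test values are hypothetical.

```python
import numpy as np

def pmf_objective(R, mask, U, V, lam_u, lam_v):
    """Regularized squared error over observed entries (mask == True)."""
    residual = (R - U @ V.T)[mask]
    return np.sum(residual ** 2) + lam_u * np.sum(U ** 2) + lam_v * np.sum(V ** 2)

def neg_log_posterior(R, mask, U, V, sigma_r, sigma_u, sigma_v):
    """Negative log posterior of the Gaussian model, constant terms dropped."""
    residual = (R - U @ V.T)[mask]
    return (np.sum(residual ** 2) / (2 * sigma_r ** 2)
            + np.sum(U ** 2) / (2 * sigma_u ** 2)
            + np.sum(V ** 2) / (2 * sigma_v ** 2))

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(3, 2))
R = rng.normal(size=(4, 3))
mask = rng.random((4, 3)) < 0.7            # hypothetical observation pattern
sigma_r, sigma_u, sigma_v = 0.5, 1.0, 2.0  # hypothetical standard deviations

# Scaling the negative log posterior by 2 * sigma_r^2 gives the objective L.
lhs = 2 * sigma_r ** 2 * neg_log_posterior(R, mask, U, V, sigma_r, sigma_u, sigma_v)
rhs = pmf_objective(R, mask, U, V, sigma_r ** 2 / sigma_u ** 2, sigma_r ** 2 / sigma_v ** 2)
print(np.isclose(lhs, rhs))
```

Because the two quantities differ only by a positive scale factor, they share the same minimizer.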

Implementation

We will implement PMF using Stochastic Gradient Descent (SGD) in NumPy.

Step 1: Import Library

We start by importing NumPy, the only library this implementation needs.

Python
import numpy as np

Step 2: Define PMF Class

We will define the PMF model with parameters for ratings, latent dimensions, learning rate, regularization and training epochs.

Python
class PMF:
    def __init__(self, R, num_factors, learning_rate=0.01, reg_param=0.01, num_epochs=100):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.num_factors = num_factors
        self.learning_rate = learning_rate
        self.reg_param = reg_param
        self.num_epochs = num_epochs

The train method:

  • Initializes latent factors for users and items.
  • Uses Stochastic Gradient Descent to update latent factors based on observed ratings.
  • Computes Mean Squared Error (MSE) at the end of each epoch to track training progress.

Python
    def train(self):
        # Initialize latent factors with small zero-mean Gaussian values.
        self.U = np.random.normal(
            scale=1. / self.num_factors, size=(self.num_users, self.num_factors))
        self.V = np.random.normal(
            scale=1. / self.num_factors, size=(self.num_items, self.num_factors))

        for epoch in range(self.num_epochs):
            for i in range(self.num_users):
                for j in range(self.num_items):
                    # Zero marks a missing rating; observed ratings may be negative.
                    if self.R[i, j] != 0:
                        prediction = self.predict(i, j)
                        error = self.R[i, j] - prediction

                        # Copy the user vector so both updates use pre-update values.
                        u_old = self.U[i, :].copy()
                        self.U[i, :] += self.learning_rate * \
                            (error * self.V[j, :] - self.reg_param * u_old)
                        self.V[j, :] += self.learning_rate * \
                            (error * u_old - self.reg_param * self.V[j, :])

            mse = self.compute_mse()
            print(f'Epoch: {epoch+1}, MSE: {mse}')
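
As an alternative sketch of the same update rule, the function below visits only the observed ratings and shuffles their order each epoch, which is closer to what "stochastic" usually implies. It is a standalone illustration, not part of the PMF class: the name sgd_epoch, the boolean mask argument and the synthetic data are all assumptions.

```python
import numpy as np

def sgd_epoch(R, mask, U, V, lr=0.01, reg=0.01, rng=None):
    """One SGD pass over observed ratings in random order; updates U and V in place."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.nonzero(mask)
    for idx in rng.permutation(len(rows)):
        i, j = rows[idx], cols[idx]
        u_i = U[i].copy()                 # keep the old user vector for the item update
        error = R[i, j] - u_i @ V[j]
        U[i] += lr * (error * V[j] - reg * u_i)
        V[j] += lr * (error * u_i - reg * V[j])

def observed_mse(R, mask, U, V):
    """Mean squared error over observed entries."""
    return np.mean((R - U @ V.T)[mask] ** 2)

rng = np.random.default_rng(0)
U_true = rng.normal(size=(20, 3))
V_true = rng.normal(size=(15, 3))
R = U_true @ V_true.T + rng.normal(scale=0.1, size=(20, 15))
mask = rng.random(R.shape) < 0.8          # True where a rating is observed

U = rng.normal(scale=0.1, size=(20, 3))
V = rng.normal(scale=0.1, size=(15, 3))

before = observed_mse(R, mask, U, V)
for _ in range(50):
    sgd_epoch(R, mask, U, V, lr=0.02, rng=rng)
after = observed_mse(R, mask, U, V)
print(before, after)
```

Iterating over the observed index pairs makes the cost of an epoch proportional to the number of ratings rather than to the number of user-item pairs.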

The helper methods:

  • predict: estimates a single rating using the dot product of user and item vectors.
  • compute_mse: calculates error only on observed entries.
  • full_matrix: reconstructs the entire rating matrix with predicted values.

Python
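    def predict(self, i, j):
        # Predicted rating: dot product of user i's and item j's latent vectors.
        return np.dot(self.U[i, :], self.V[j, :])

    def compute_mse(self):
        # Average the squared error over observed (non-zero) entries only.
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += (self.R[x, y] - predicted[x, y]) ** 2
        return error / len(xs)

    def full_matrix(self):
        # Reconstruct the full rating matrix, including unobserved entries.
        return np.dot(self.U, self.V.T)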
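
The error on observed entries can also be computed without a Python loop. The sketch below assumes a boolean mask of observed entries; masked_mse and masked_rmse are hypothetical helper names, and the small matrices are picked so the result can be checked by hand.

```python
import numpy as np

def masked_mse(R, mask, U, V):
    """Mean squared error over entries where mask is True."""
    residual = (R - U @ V.T)[mask]
    return np.mean(residual ** 2)

def masked_rmse(R, mask, U, V):
    """Root mean squared error over entries where mask is True."""
    return np.sqrt(masked_mse(R, mask, U, V))

# Tiny check against a hand-computed value.
U = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[2.0, 0.0], [0.0, 3.0]])
R = np.array([[1.0, 5.0], [9.0, 1.0]])
mask = np.array([[True, False], [False, True]])

# Predictions on the observed entries are 2 and 3, so the errors are -1 and -2.
print(masked_mse(R, mask, U, V))   # (1 + 4) / 2 = 2.5
```

MSE divides the summed squared error by the number of observed entries; RMSE takes the square root of that mean, so the two should be labelled accordingly when reporting training progress.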

Step 3: Generate Synthetic Data and Train Model

We will:

  • Create synthetic latent factors for users and items to simulate a rating dataset.
  • Add Gaussian noise to make the data realistic.
  • Mask about 20% of the entries to simulate missing values.
  • Train the PMF model with Stochastic Gradient Descent.
  • Reconstruct the full rating matrix, filling in missing values with predictions.
Python
np.random.seed(0)
num_users, num_items, num_factors = 100, 50, 10

U_true = np.random.normal(0, 1, (num_users, num_factors))
V_true = np.random.normal(0, 1, (num_items, num_factors))
R = np.dot(U_true, V_true.T) + np.random.normal(0, 0.1, (num_users, num_items))
mask = np.random.rand(*R.shape) < 0.8
R[~mask] = 0

pmf = PMF(R, num_factors=num_factors, learning_rate=0.01,
          reg_param=0.01, num_epochs=50)
pmf.train()

R_reconstructed = pmf.full_matrix()
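
The script above reports error only on the ratings it trained on. One way to estimate how well missing ratings are predicted is to hold out part of the observed entries. The sketch below is one possible approach under stated assumptions: observed entries are tracked with a boolean mask rather than with zeros, and split_observed is a hypothetical helper, not part of the PMF class.

```python
import numpy as np

def split_observed(mask, test_fraction=0.2, rng=None):
    """Split the True entries of mask into disjoint train and test masks."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.nonzero(mask)
    order = rng.permutation(len(rows))
    n_test = int(round(test_fraction * len(rows)))
    test_idx = order[:n_test]

    test_mask = np.zeros_like(mask, dtype=bool)
    test_mask[rows[test_idx], cols[test_idx]] = True
    train_mask = mask & ~test_mask
    return train_mask, test_mask

rng = np.random.default_rng(0)
observed = rng.random((100, 50)) < 0.8     # same shape and density as the example data
train_mask, test_mask = split_observed(observed, test_fraction=0.2, rng=rng)
print(train_mask.sum(), test_mask.sum(), observed.sum())
```

Training would then use only the entries in train_mask, and the error on test_mask would serve as the held-out estimate. A boolean mask also avoids confusing a genuine rating of 0 with a missing value.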

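Output:

Training prints one line per epoch, reporting the epoch number and the error on the observed ratings.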

Applications

  • Movie Recommendations: Predict user ratings for unseen movies (e.g., Netflix, MovieLens).
  • E-commerce: Suggest products based on previous purchase history.
  • Music Streaming: Recommend songs/playlists (Spotify, Last.fm).
  • Social Media: Suggest friends, groups or content.
  • Online Learning: Recommend courses/resources tailored to learner profiles.

Limitations

  • Assumes linear interactions between latent factors.
  • Computationally expensive for large datasets, especially with the nested-loop SGD used here, which scans every user-item pair in each epoch.
  • Struggles with cold-start problem (new users/items with no history).
  • Hyperparameter tuning (learning rate, regularization, factors) can be tricky.
  • May converge to local minima if poorly initialized.