Poisson Regression

Last Updated : 23 Jul, 2025

Poisson regression is a statistical technique used to model and analyze count data, where the outcome variable represents the number of times an event occurs in a fixed interval of time, space, or any other dimension. It is most appropriate when the response takes non-negative whole-number values (0, 1, 2, ...) and, given the predictors, events occur independently at a constant average rate.

Use Poisson regression when the quantity you want to predict is a count that answers a "how many" question and therefore can never be negative. Examples include:

  • Number of customers arriving at a store
  • Number of clicks on a website
  • Number of defects in a batch

Key Assumptions of Poisson Regression

  • The response variable is a count such as number of visits, accidents, or purchases.
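  • Given the predictors, each count follows a Poisson distribution.
  • The conditional mean and variance of the counts are equal (equidispersion).
  • Observations are independent of each other.
  • Within each observation's interval, events occur at a constant average rate; the rate can differ between observations only through the predictors.
  • The log of the expected count is a linear function of the predictors.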

Mathematical Formulation of Poisson Regression

In Poisson regression, the output Y is assumed to follow a Poisson distribution:

P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}

Where:

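  • Y is the count (response) variable
  • y is a particular count value (0, 1, 2, ...)
  • \lambda is the expected count, i.e. the mean number of events per interval (for a Poisson distribution it is also the variance)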
  • e is Euler’s number (approximately 2.718)
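To make the formula concrete, here is a minimal sketch that evaluates P(Y = y) directly with Python's standard library. The helper name poisson_pmf and the value \lambda = 3 are illustrative assumptions, not part of the article. The sketch also checks numerically that the probabilities sum to 1 and that the mean and variance both equal \lambda.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) = e^(-lam) * lam^y / y! for a non-negative integer count y."""
    if y < 0:
        return 0.0  # negative counts are impossible
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = 3.0  # illustrative expected count (assumption)
ys = range(60)  # truncate the infinite support; the tail beyond 59 is negligible for lam = 3
probs = [poisson_pmf(y, lam) for y in ys]

# Moments computed from the (truncated) distribution
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

print(round(poisson_pmf(0, lam), 4))  # 0.0498, i.e. e^(-3)
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 3.0 3.0
```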

Instead of modeling Y directly, we model the log of the expected value:

\log(\lambda) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k

Or, in exponential form:

\lambda = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}

Where:

  • \lambda: The expected count
  • x_i: Independent variables (predictors), i = 1, ..., k
  • \beta_i: Coefficients estimated from the data (\beta_0 is the intercept)
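The log link has two practical consequences: the predicted \lambda is always positive, and a one-unit increase in x_1 multiplies \lambda by e^{\beta_1} rather than adding a fixed amount. A minimal NumPy sketch, using hypothetical coefficients \beta_0 = 0.5 and \beta_1 = 0.3 and a hypothetical helper expected_count:

```python
import numpy as np

# Hypothetical coefficients chosen only for illustration
beta_0, beta_1 = 0.5, 0.3

def expected_count(x1):
    """Inverse of the log link: lambda = exp(beta_0 + beta_1 * x1)."""
    eta = beta_0 + beta_1 * np.asarray(x1, dtype=float)  # linear predictor (log scale)
    return np.exp(eta)  # positive whatever the sign of eta

lam_2 = expected_count(2.0)
lam_3 = expected_count(3.0)

# Moving x1 from 2 to 3 multiplies the expected count by exp(beta_1)
print(lam_3 / lam_2, np.exp(beta_1))  # both approximately 1.3499
print(expected_count([-20.0, 0.0, 10.0]))  # all values are positive
```

This multiplicative reading is why Poisson coefficients are often reported after exponentiation, as rate ratios.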

When to Use Poisson Regression

Poisson Regression is appropriate when:

  • The dependent variable is a count (e.g., 0, 1, 2, …)
  • Counts are not negative.
  • The counts are plausibly Poisson given the predictors; in particular, their variance is roughly equal to their mean.
  • The observations are independent.
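The "variance roughly equal to the mean" condition can be illustrated with simulated data. The sketch below uses arbitrary illustrative parameters to contrast Poisson counts with negative binomial counts that have the same mean (3) but a larger variance (7.5), a common example of overdispersion. It compares marginal moments with no predictors; once predictors are involved, the check should be made on the fitted model instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Poisson counts with mean 3: the variance should also be close to 3
pois = rng.poisson(lam=3.0, size=n)

# Negative binomial counts with the same mean (n=2, p=0.4 gives mean 3, variance 7.5): overdispersed
negbin = rng.negative_binomial(n=2, p=0.4, size=n)

for name, counts in [("Poisson", pois), ("Negative binomial", negbin)]:
    ratio = counts.var() / counts.mean()  # close to 1 under the Poisson assumption
    print(f"{name}: mean={counts.mean():.2f}, variance={counts.var():.2f}, variance/mean={ratio:.2f}")
```

A variance/mean ratio well above 1 signals that plain Poisson regression will understate the uncertainty of its coefficient estimates.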

Implementation of Poisson Regression in Python

Step 1: Import Required Libraries

We start by importing NumPy to generate the data, statsmodels to build the Poisson regression model, and Matplotlib for plotting.

Python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

Step 2: Create Sample Data

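We generate 100 evenly spaced input values (x) and simulate count data (y) from a Poisson distribution whose log-mean is 0.5 + 0.3x. The counts therefore increase with x, and we know the true coefficients (0.5 and 0.3) that the model should recover.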

Python
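np.random.seed(42)               # make the simulation reproducible
x = np.linspace(0, 10, 100)      # 100 evenly spaced predictor values
X = sm.add_constant(x)           # prepend an intercept column -> shape (100, 2)
lambda_ = np.exp(0.5 + 0.3 * x)  # true expected counts: log(lambda) = 0.5 + 0.3x
y = np.random.poisson(lambda_)   # draw one Poisson count per x value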

Step 3: Fit the Poisson Regression Model

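We use statsmodels' GLM class with the Poisson family, whose default link function is the log, to define and fit the model.

Python
model = sm.GLM(y, X, family=sm.families.Poisson())  # Poisson family, log link by default
results = model.fit()                               # maximum likelihood estimation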
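By default, statsmodels' GLM.fit() estimates the coefficients with iteratively reweighted least squares (IRLS). To show what that means for the Poisson/log-link case, the sketch below re-implements the idea in plain NumPy. It is an illustration, not statsmodels' own code; the function name poisson_irls is hypothetical, and the data are drawn with NumPy's Generator API, so the values differ from Step 2.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit log(mu) = X @ beta for a Poisson GLM by iteratively reweighted least squares.

    Assumes the first column of X is the intercept (all ones).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta           # linear predictor
        mu = np.exp(eta)         # inverse log link
        z = eta + (y - mu) / mu  # working response
        w = mu                   # working weights for Poisson with log link
        XtW = X.T * w            # equals X.T @ diag(w) without building the matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least-squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data in the same spirit as Step 2
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))

beta_hat = poisson_irls(X, y)
print("IRLS estimates:", beta_hat)  # should be close to the true values 0.5 and 0.3
```

Calling poisson_irls on the X and y from Step 2 should give essentially the same numbers as results.params, since both maximize the same Poisson log-likelihood.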

Step 4: View Model Summary

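The summary reports the estimated coefficients (on the log scale) with their standard errors, z-statistics, p-values, and 95% confidence intervals, together with goodness-of-fit measures such as the log-likelihood, deviance, and Pearson chi-square.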

Python
print(results.summary())

Output:

[Output image: Generalized Linear Model Regression Results]

Step 5: Predict and Plot the Results

We use the fitted model to predict the expected count at each x, then plot the observed data against the fitted curve.

Python
y_pred = results.predict(X)  # expected counts, exp(X @ params), one per x value
plt.scatter(x, y, color='orange', label='Observed')
plt.plot(x, y_pred, color='red', label='Poisson Fit')
plt.xlabel('x')
plt.ylabel('Count (y)')
plt.title('Poisson Regression')
plt.legend()
plt.show()

Output:

[Output image: Poisson Regression, observed counts (scatter) with the fitted Poisson curve]

Poisson Regression vs Linear Regression

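| Feature | Linear Regression | Poisson Regression |
|---|---|---|
| Output | Continuous values | Non-negative counts |
| Distributional assumption | Normally distributed errors with constant variance | Poisson-distributed counts (variance equals mean) |
| Link function | Identity | Log |
| Use cases | Sales, prices, temperature | Event counts, incident counts |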

Real-World Applications of Poisson Regression

Poisson regression is widely used in domains where the outcome is a count of events over time, space, or groups. Below are some practical use cases:

  • Healthcare: Estimating the number of patient admissions to a hospital per day or the number of new disease cases reported in a region each month.
  • Transportation: Predicting the number of traffic accidents at a particular intersection per week.
  • Customer Support: Analyzing the number of customer service calls received by a company daily.
  • E-commerce and Marketing: Modeling the number of clicks on a digital advertisement or the number of purchases made by a user in a given time.
  • Sports Analytics: Forecasting the number of goals scored by a team in a match or a player’s number of successful passes.
  • Website Analytics: Measuring the number of page visits or downloads occurring on a website per hour.
  • Insurance: Estimating the number of claims filed in a given policy period based on customer characteristics.
