Orthogonal distance regression using SciPy

Orthogonal Distance Regression (ODR) fits a model to data when both the independent (X) and dependent (Y) variables are subject to measurement error. Ordinary Least Squares (OLS) assumes that only the dependent variable is noisy; ODR accounts for errors in both directions, which suits scientific and engineering data where every measured quantity carries some uncertainty.

Why Use ODR Instead of OLS?

OLS treats X as exact and minimizes only the vertical residuals. When X is also measured with error, that assumption biases the fit, and ODR is usually the better choice because it:

Accounts for errors in both X and Y
Minimizes the orthogonal (perpendicular) distance from each point to the fitted curve rather than the vertical distance alone
Handles non-linear models as well as linear ones
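
A small synthetic comparison can make the difference concrete. The sketch below is illustrative only: the true line (slope 2, intercept 1), the noise level, and the random seed are assumptions chosen for the demonstration, not values from this article. It fits the same noisy data once with OLS (np.polyfit) and once with scipy.odr.

```python
import numpy as np
from scipy import odr

# Assumed ground truth for the demonstration: y = 2x + 1.
rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 200)
y_true = 2.0 * x_true + 1.0

# Add noise of the same size to BOTH variables.
x_obs = x_true + rng.normal(scale=2.0, size=x_true.size)
y_obs = y_true + rng.normal(scale=2.0, size=x_true.size)

# OLS: minimizes vertical residuals only.
ols_slope, ols_intercept = np.polyfit(x_obs, y_obs, 1)

# ODR: minimizes orthogonal distances (unit weights on X and Y).
def linear(p, x):
    m, c = p
    return m * x + c

fit = odr.ODR(odr.Data(x_obs, y_obs), odr.Model(linear), beta0=[1.0, 0.0]).run()
odr_slope, odr_intercept = fit.beta

print("OLS slope, intercept:", ols_slope, ols_intercept)
print("ODR slope, intercept:", odr_slope, odr_intercept)
```

With noise in X, OLS tends to underestimate the slope, while ODR with equal error variances in both directions should land closer to the assumed true value of 2.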

Mathematical Formulation

For a straight-line model, ODR minimizes the following objective over \alpha, \beta and the unknown true values x_i:

\sum_{i=1}^{n} \left[ \frac{(y_i - \alpha - \beta x_i)^2}{\eta} + (x_i - X_i)^2 \right]

Where:

y_i: observed dependent variable
x_i: true (unknown) value of the independent variable
X_i: observed value of the independent variable
\alpha,\beta: regression coefficients (intercept and slope)
\eta : weighting factor between Y and X errors

The weighting factor \eta is the ratio of the two error variances:

\eta = \frac{\sigma_\xi^2}{\sigma_\mu^2}

Where:

\sigma_\xi^2: variance of error in the dependent variable (Y-axis)
\sigma_\mu^2: variance of error in the independent variable (X-axis)
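
For the straight-line case this objective has a known closed-form minimizer, usually called Deming regression; with \eta = 1 it reduces to a purely orthogonal fit. As a rough check, the sketch below computes that closed form and compares it with scipy.odr, passing standard deviations through odr.RealData so that sy**2 / sx**2 equals \eta. The data values and \eta = 4 are made up for illustration.

```python
import numpy as np
from scipy import odr

# Made-up illustrative data (not from the article).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.9, 5.3, 6.8, 9.1, 10.9])

eta = 4.0  # assumed ratio: var(Y error) / var(X error)

# Closed-form Deming estimates from the sample (co)variances.
cov = np.cov(x, y)
s_xx, s_xy, s_yy = cov[0, 0], cov[0, 1], cov[1, 1]
d = s_yy - eta * s_xx
deming_slope = (d + np.sqrt(d**2 + 4.0 * eta * s_xy**2)) / (2.0 * s_xy)
deming_intercept = y.mean() - deming_slope * x.mean()

# Same problem through scipy.odr: parameters are [alpha, beta].
def linear(p, x):
    alpha, beta = p
    return alpha + beta * x

sx = np.ones_like(x)                 # std. dev. of X errors
sy = np.sqrt(eta) * np.ones_like(x)  # std. dev. of Y errors, so sy**2 / sx**2 == eta
data = odr.RealData(x, y, sx=sx, sy=sy)
fit = odr.ODR(data, odr.Model(linear), beta0=[0.0, 1.0]).run()
odr_intercept, odr_slope = fit.beta

print("closed form:", deming_intercept, deming_slope)
print("scipy.odr:  ", odr_intercept, odr_slope)
```

The two pairs of estimates should agree to within the solver's convergence tolerance.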

Implementation in SciPy

SciPy provides the scipy.odr module, which wraps ODRPACK, a well-established FORTRAN-77 library, in an object-oriented interface.
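
Besides user-defined model functions, scipy.odr ships a few ready-made Model instances, among them odr.unilinear for a straight line of the form beta[0]*x + beta[1]. The sketch below, on made-up data, fits the same points with the built-in model and with a hand-written equivalent; the data values and starting guess are assumptions for illustration.

```python
import numpy as np
from scipy import odr

# Made-up illustrative data (not from the article).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 3.2, 4.8, 7.1, 9.0, 10.8])
data = odr.Data(x, y)

# Built-in straight-line model: beta[0] is the slope, beta[1] the intercept.
fit_builtin = odr.ODR(data, odr.unilinear, beta0=[1.0, 0.0]).run()

# Hand-written equivalent with the same parameter order.
def line(p, x):
    m, c = p
    return m * x + c

fit_custom = odr.ODR(data, odr.Model(line), beta0=[1.0, 0.0]).run()

print("built-in:", fit_builtin.beta)
print("custom:  ", fit_custom.beta)
```

Both fits should converge to essentially the same slope and intercept.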

Step-by-Step Approach

Import required libraries
Create input data arrays (feature, target)
Define a model function (e.g., linear)
Use odr.Model() to wrap the model function
Wrap data using odr.Data()
Create an odr.ODR() instance from the data, the model, and an initial parameter guess (beta0)
Run the regression using .run()
Display results with .pprint()

Python

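import numpy as np
from scipy import odr

# Independent variable: the integers 1..10 in random order.
# No seed is set, so the fitted values change from run to run.
x = np.arange(1, 11)
np.random.shuffle(x)

# Dependent variable: ten fixed observations.
y = np.array([0.65, -0.75, 0.90, -0.5, 0.14,
              0.84, 0.99, -0.95, 0.41, -0.28])

# Model function: scipy.odr calls it as f(p, x), parameters first.
def model_fn(p, x):
    m, c = p  # slope, intercept
    return m * x + c

model = odr.Model(model_fn)  # wrap the model function
data = odr.Data(x, y)        # wrap the observations
# beta0 is the initial guess for [m, c]
odr_run = odr.ODR(data, model, beta0=[0.2, 1.0])
res = odr_run.run()          # perform the fit
res.pprint()                 # print a summary of the results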

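Output

Because x is shuffled without a fixed seed, the exact numbers differ from run to run. One run produced the following, where Beta lists the slope m first and the intercept c second:

Beta: [ 0.11545417 -0.48999795]
Beta Std Error: [0.07475684 0.46382517]
Beta Covariance: [[ 0.01228991 -0.06759452]
 [-0.06759452  0.4731028 ]]
Residual Variance: 0.45472947791705537
Inverse Condition #: 0.06923218954368635
Reason(s) for Halting:
  Sum of squares convergence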
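
The same quantities are available as attributes on the result object, which is handier than parsing printed text. In the output shown above, Beta Std Error equals the square root of the diagonal of Beta Covariance multiplied by Residual Variance (for example, sqrt(0.01228991 * 0.45472948) is about 0.0748). The sketch below checks that relation; as an assumption made for repeatability, it keeps x in order instead of shuffling, so its numbers will not match the run above.

```python
import numpy as np
from scipy import odr

# Assumption: x is left unshuffled so the run is repeatable.
x = np.arange(1, 11, dtype=float)
y = np.array([0.65, -0.75, 0.90, -0.5, 0.14,
              0.84, 0.99, -0.95, 0.41, -0.28])

def model_fn(p, x):
    m, c = p
    return m * x + c

res = odr.ODR(odr.Data(x, y), odr.Model(model_fn), beta0=[0.2, 1.0]).run()

slope, intercept = res.beta             # fitted parameters
slope_err, intercept_err = res.sd_beta  # their standard errors

# Standard errors recomputed from the covariance matrix,
# scaled by the residual variance.
scaled_sd = np.sqrt(np.diag(res.cov_beta) * res.res_var)

print("slope:", slope, "+/-", slope_err)
print("intercept:", intercept, "+/-", intercept_err)
print("recomputed std errors:", scaled_sd)
print("stop reason:", res.stopreason)
```

Note that cov_beta on its own is not scaled by the residual variance, so it needs that factor before being read as a parameter covariance.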
