Pearson’s Correlation Coefficient is one of the most widely used statistical measures of the strength and direction of the linear relationship between two variables. Also known as the Product Moment Correlation, it measures how closely two variables move together along a straight line. It is denoted by r, a dimensionless value ranging from −1 to +1: +1 indicates a perfect positive correlation, −1 indicates a perfect negative correlation, and 0 indicates no linear relationship between the variables.
Formula
\text{Karl Pearson's Coefficient of Correlation}=\frac{\text{Sum of products of deviations from their respective means}}{\text{Number of pairs}\times\text{Product of the standard deviations of both series}}
Or
r=\frac{\sum{xy}}{N\sigma_x\sigma_y}
Where:
- N = Number of pairs of observations
- x = Deviation of the X series from its mean, (X-\bar{X})
- y = Deviation of the Y series from its mean, (Y-\bar{Y})
- \sigma_x = Standard deviation of the X series, \sqrt{\frac{\sum{x^2}}{N}}
- \sigma_y = Standard deviation of the Y series, \sqrt{\frac{\sum{y^2}}{N}}
- r = Coefficient of correlation
Example of Using Pearson’s Correlation
| X | 12 | 16 | 20 | 24 | 28 | 32 | 36 |
|---|---|---|---|---|---|---|---|
| Y | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
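Where:
- N = 7
- \bar{X} = 24 and \bar{Y} = 15
- ∑xy = 336
- σx = 8
- σy = 6
Substituting these values into the formula:
r=\frac{\sum{xy}}{N\sigma_x\sigma_y}=\frac{336}{7\times8\times6}=\frac{336}{336}=1
The value r = 1 indicates a perfect positive correlation: every point lies exactly on an upward-sloping straight line (here Y = 0.75X − 3), so Y rises by the same amount (3) for every 4-unit increase in X.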
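The arithmetic in this example can be checked with a short plain-Python sketch that recomputes each intermediate quantity from the table. It uses the population standard deviation (division by N), matching the definitions of σx and σy above; the variable names are illustrative choices, not part of the original example.

```python
import math

# Data from the example table
X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
N = len(X)

# Deviations from the actual means: x = X - mean(X), y = Y - mean(Y)
mean_x = sum(X) / N
mean_y = sum(Y) / N
x = [xi - mean_x for xi in X]
y = [yi - mean_y for yi in Y]

# Intermediate quantities used in r = sum(xy) / (N * sigma_x * sigma_y)
sum_xy = sum(a * b for a, b in zip(x, y))
sigma_x = math.sqrt(sum(a * a for a in x) / N)  # population standard deviation
sigma_y = math.sqrt(sum(b * b for b in y) / N)

r = sum_xy / (N * sigma_x * sigma_y)
print(N, sum_xy, sigma_x, sigma_y, r)  # 7 336.0 8.0 6.0 1.0
```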
Methods of Calculating Karl Pearson's Coefficient of Correlation
- Actual Mean Method
- Direct Method
- Short-Cut Method/Assumed Mean Method/Indirect Method
- Step-Deviation Method
1. Actual Mean Method
This method calculates correlation using deviations from the actual means of both series.
Formula:
r=\frac{\sum{xy}}{\sqrt{\sum{x^2}\times{\sum{y^2}}}}
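This form is equivalent to the first formula because Nσxσy = N × √(∑x²/N) × √(∑y²/N) = √(∑x² × ∑y²), so N cancels out. A minimal sketch, assuming the data arrive as two equal-length Python lists; the function name pearson_actual_mean is hypothetical. For the example data, ∑x² = 448 and ∑y² = 252, so r = 336 / √(448 × 252) = 336 / 336 = 1.

```python
import math

def pearson_actual_mean(X, Y):
    """Pearson's r via deviations from the actual means: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    # Deviations from the actual means
    x = [xi - mean_x for xi in X]
    y = [yi - mean_y for yi in Y]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
print(pearson_actual_mean(X, Y))        # 1.0
print(pearson_actual_mean(X, Y[::-1]))  # -1.0 (Y reversed: perfect negative correlation)
```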
2. Direct Method
The Direct Method calculates correlation using the original values of the series without finding deviations separately.
Formula:
r=\frac{N\sum{XY}-\sum{X}\sum{Y}}{\sqrt{N\sum{X^2}-(\sum{X})^2}\times\sqrt{N\sum{Y^2}-(\sum{Y})^2}}
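A sketch of the Direct Method working only from raw sums of the original values; the function name pearson_direct is hypothetical. For the example table, ∑X = 168, ∑Y = 105, ∑XY = 2856, ∑X² = 4480 and ∑Y² = 1827, so r = (7×2856 − 168×105) / (√(7×4480 − 168²) × √(7×1827 − 105²)) = 2352 / (56 × 42) = 1.

```python
import math

def pearson_direct(X, Y):
    """Pearson's r from raw sums (Direct Method); no deviations are computed."""
    n = len(X)
    sum_x = sum(X)
    sum_y = sum(Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_x2 = sum(a * a for a in X)
    sum_y2 = sum(b * b for b in Y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
print(pearson_direct(X, Y))  # 1.0
```

With integer data the sums are exact in Python. With large floating-point values, however, the subtraction N∑X² − (∑X)² can lose precision through cancellation, which is one reason deviation-based forms are often preferred in numerical code.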
3. Short-Cut Method/Assumed Mean Method
This method simplifies calculations by taking deviations from assumed means instead of actual means.
Formula:
r=\frac{N\sum{dx\,dy}-\sum{dx}\sum{dy}}{\sqrt{N\sum{dx^2}-(\sum{dx})^2}\times\sqrt{N\sum{dy^2}-(\sum{dy})^2}}
- ∑dx = Sum of the deviations of X values from their assumed mean
- ∑dy = Sum of the deviations of Y values from their assumed mean
- ∑dxdy, ∑dx², ∑dy² = Sums of the products and squares of these deviations
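A minimal sketch of the Short-Cut Method. The parameters A and B (the assumed means of X and Y) are names introduced here for illustration. Because shifting every value of a series by a constant does not change r, any choice of assumed means gives the same result as the Actual Mean Method.

```python
import math

def pearson_assumed_mean(X, Y, A, B):
    """Pearson's r from deviations dx = X - A and dy = Y - B (A, B: hypothetical assumed means)."""
    n = len(X)
    dx = [xi - A for xi in X]
    dy = [yi - B for yi in Y]
    sum_dx = sum(dx)
    sum_dy = sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    # The correction terms (sum_dx)**2 etc. compensate for A, B not being the actual means
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt(n * sum_dx2 - sum_dx ** 2) * math.sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
# 20 and 12 are arbitrary assumed means
print(pearson_assumed_mean(X, Y, A=20, B=12))  # 1.0
```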
4. Step Deviation Method
The Step Deviation Method further simplifies calculations by taking deviations from an assumed mean and dividing them by a common factor C.
Formula:
r=\frac{N\sum{dx'dy'}-\sum{dx'}\sum{dy'}}{\sqrt{N\sum{dx'^2}-(\sum{dx'})^2}\times\sqrt{N\sum{dy'^2}-(\sum{dy'})^2}}
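- ∑dx′ = Sum of the step deviations of X values, i.e. deviations from the assumed mean divided by the common factor C
- ∑dy′ = Sum of the step deviations of Y values, i.e. deviations from the assumed mean divided by the common factor C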
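A sketch of the Step Deviation Method, assuming a separate positive common factor for each series; A, B, c_x and c_y are hypothetical parameter names. Dividing the deviations by positive constants scales the numerator and denominator by the same amount, so r is unchanged; dividing only one series by a negative factor would flip the sign of r.

```python
import math

def pearson_step_deviation(X, Y, A, B, c_x, c_y):
    """Pearson's r from step deviations dx' = (X - A) / c_x and dy' = (Y - B) / c_y.

    A, B are assumed means; c_x, c_y are common factors (hypothetical names).
    """
    n = len(X)
    dx = [(xi - A) / c_x for xi in X]
    dy = [(yi - B) / c_y for yi in Y]
    sum_dx = sum(dx)
    sum_dy = sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt(n * sum_dx2 - sum_dx ** 2) * math.sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
# X values are spaced 4 apart and Y values 3 apart, so these factors give small integers
print(pearson_step_deviation(X, Y, A=20, B=12, c_x=4, c_y=3))  # 1.0
```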
Python Implementation
- NumPy is used for the numerical computations, and np.array() stores the values of X and Y as arrays.
- The code computes the means, the deviations from the means and the standard deviations of both variables, then applies the formula r = ∑xy / (Nσxσy).
- NumPy also provides np.corrcoef(), which returns the correlation matrix of its inputs; its off-diagonal entry [0, 1] is Pearson's r, a value between −1 and +1.
```python
import numpy as np
# Sample data
X = np.array([12, 16, 20, 24, 28, 32, 36])
Y = np.array([6, 9, 12, 15, 18, 21, 24])
# Mean of X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)
# Deviations from mean
x = X - mean_x
y = Y - mean_y
# Standard deviations
sigma_x = np.sqrt(np.sum(x**2) / len(X))
sigma_y = np.sqrt(np.sum(y**2) / len(Y))
# Pearson correlation coefficient
r = np.sum(x * y) / (len(X) * sigma_x * sigma_y)
print("Pearson Correlation Coefficient:", r)
```
Output:
```
Pearson Correlation Coefficient: 1.0
```
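Because the example data are perfectly linear, they only exercise the r = 1 case. The sketch below cross-checks the manual formula against np.corrcoef() on a small, not perfectly linear dataset; the values are made up for illustration and are not from the original example.

```python
import numpy as np

# Hypothetical data (not from the article), deliberately not perfectly linear
X = np.array([1, 2, 3, 4, 5, 6])
Y = np.array([2, 1, 4, 3, 7, 5])

# Manual Pearson's r using deviations from the actual means
x = X - X.mean()
y = Y - Y.mean()
r_manual = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(X, Y)[0, 1]

print(r_manual, r_numpy)
print(np.isclose(r_manual, r_numpy))  # True
```

The two values agree up to floating-point rounding, so np.isclose() is used for the comparison rather than exact equality.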