A distance matrix holds the pairwise distances between the row vectors of one matrix, or between the rows of two matrices. The scipy.spatial package provides the distance_matrix() function to compute it. Each input is a 2-D array, and each row (a 1-D array) is treated as one vector, that is, one point.
Example:
from scipy.spatial import distance_matrix
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
res = distance_matrix(A, B)
print(res)
Output
[[5.65685425 8.48528137]
[2.82842712 5.65685425]]
Explanation: This computes the Euclidean distance (the default, p = 2) between each pair of points in A and B. For instance, the first element, 5.65685425, is the distance between [1, 2] and [5, 6]: sqrt(4² + 4²) = sqrt(32).
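To make the pairwise computation concrete, the same matrix can be rebuilt with plain NumPy broadcasting. This is only an illustrative sketch of the idea, not a statement of how SciPy implements it internally; the names diff and manual are introduced here for illustration.

```python
import numpy as np
from scipy.spatial import distance_matrix

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# diff[i, j, :] holds A[i] - B[j]; its shape is (2, 2, 2)
diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]

# Euclidean norm over the last axis gives the (2, 2) distance matrix
manual = np.sqrt((diff ** 2).sum(axis=-1))

# Compare the hand-built matrix with the library result
print(np.allclose(manual, distance_matrix(A, B)))
```

Row i of the result corresponds to A[i] and column j to B[j], so the matrix has one row per point in A and one column per point in B.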
Syntax of scipy.spatial.distance_matrix()
scipy.spatial.distance_matrix(x, y, p=2, threshold=1000000)
Parameters:
- x (array_like): An m × n array of m points in n-dimensional space.
- y (array_like): A k × n array of k points in the same n-dimensional space.
- p (float, optional): The order of the Minkowski norm to use, with 1 ≤ p ≤ ∞. Default is 2 (Euclidean distance).
- threshold (int, optional): If m * k * n exceeds this value, the function uses a Python loop instead of building large temporary arrays. Default is 1000000.
Returns: result (ndarray) – An m × k matrix where element [i, j] is the distance between x[i] and y[j].
Raises:
- ValueError: If the inputs are not 2-D or their column counts differ.
- TypeError: May be raised if an input cannot be treated as a numeric array or if p is not a number.
Note: Both inputs must have the same number of columns (dimensions). Different values of p give different distance metrics:
- p = 1 → Manhattan Distance
- p = 2 → Euclidean Distance (default)
- p = ∞ → Chebyshev Distance
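The note above can be checked with a short sketch. It evaluates one pair of points under several values of p and then passes inputs with different column counts to show the error. The names results and mismatch_error are made up for this illustration.

```python
import numpy as np
from scipy.spatial import distance_matrix

A = np.array([[1, 2]])
B = np.array([[4, 6]])

# Distance between the same pair of points under several values of p
results = {p: distance_matrix(A, B, p=p)[0, 0] for p in (1, 2, 3, np.inf)}
for p, d in results.items():
    print(f"p={p}: {d:.4f}")

# Inputs with different numbers of columns are rejected
try:
    distance_matrix(np.array([[1, 2]]), np.array([[1, 2, 3]]))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```

For [1, 2] and [4, 6] the values are 7, 5, about 4.498, and 4, so the distance never increases as p grows.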
Examples
Example 1: Manhattan Distance (p = 1)
from scipy.spatial import distance_matrix
import numpy as np
A = np.array([[1, 2]])
B = np.array([[3, 4], [5, 6]])
res = distance_matrix(A, B, p=1)
print(res)
Output
[[4. 8.]]
Explanation: Here, the Manhattan distance is used. The first value, 4, is computed as |1-3| + |2-4| = 2 + 2. The second, 8, is |1-5| + |2-6| = 4 + 4.
Example 2: Chebyshev Distance (p = ∞)
from scipy.spatial import distance_matrix
import numpy as np
A = np.array([[1, 2]])
B = np.array([[4, 6], [7, 3]])
res = distance_matrix(A, B, p=np.inf)
print(res)
Output
[[4. 6.]]
Explanation: The Chebyshev distance takes the maximum absolute difference in any dimension. For [1, 2] and [4, 6], it's max(|1-4|, |2-6|) = max(3, 4) = 4. For [1, 2] and [7, 3], it's max(|1-7|, |2-3|) = max(6, 1) = 6.
Example 3: Minkowski Distance (p = 3)
from scipy.spatial import distance_matrix
import numpy as np
A = np.array([[1, 2]])
B = np.array([[4, 6]])
res = distance_matrix(A, B, p=3)
print(res)
Output
[[4.49794145]]
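Explanation: This uses the Minkowski distance with p = 3: (|1-4|³ + |2-6|³)^(1/3) = (27 + 64)^(1/3) = 91^(1/3) ≈ 4.498. Because the Minkowski distance does not increase as p grows, the value lies between the Euclidean distance (5) and the Chebyshev distance (4), and below the Manhattan distance (7).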
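As a cross-check, the p = 3 result can be reproduced by hand from the Minkowski formula, (Σ |a_i − b_i|^p)^(1/p). The sketch below assumes the same two points as the example; the names by_hand and library are illustrative.

```python
import numpy as np
from scipy.spatial import distance_matrix

a = np.array([1, 2])
b = np.array([4, 6])
p = 3

# Minkowski distance: p-th root of the summed p-th powers of |a_i - b_i|
by_hand = np.sum(np.abs(a - b) ** p) ** (1 / p)  # (27 + 64) ** (1/3)

# distance_matrix expects 2-D inputs, so each point becomes a 1 x 2 array
library = distance_matrix(a[np.newaxis, :], b[np.newaxis, :], p=p)[0, 0]

print(np.isclose(by_hand, library))
```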
Using scipy.spatial.distance.cdist()
distance_matrix() only computes Minkowski distances. cdist() from scipy.spatial.distance is more flexible: it supports a wide range of distance metrics (e.g., cosine, correlation, cityblock) and also accepts a user-defined function.
Syntax
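scipy.spatial.distance.cdist(XA, XB, metric='euclidean', *, out=None, **kwargs)
Parameters:
- XA (array_like): An m × n array of m points.
- XB (array_like): A k × n array of k points.
- metric (str or callable): Distance metric to use (e.g., 'euclidean', 'cosine', 'cityblock'), or a function that takes two 1-D arrays and returns a number.
- **kwargs (optional): Extra arguments passed to the metric, such as p for 'minkowski'.
Returns: distances (ndarray): An m × k matrix of distances.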
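To see how the two functions relate, the sketch below compares cdist() with distance_matrix() on the same inputs. It relies on the 'minkowski' and 'chebyshev' metric names and on the p keyword that cdist() accepts for 'minkowski'. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.spatial.distance import cdist

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# The default metric, 'euclidean', matches distance_matrix's default p=2
same_default = np.allclose(cdist(A, B), distance_matrix(A, B))

# 'minkowski' with an explicit p corresponds to distance_matrix(..., p=p)
same_p3 = np.allclose(cdist(A, B, metric='minkowski', p=3),
                      distance_matrix(A, B, p=3))

# 'chebyshev' corresponds to p = infinity
same_inf = np.allclose(cdist(A, B, metric='chebyshev'),
                       distance_matrix(A, B, p=np.inf))

print(same_default, same_p3, same_inf)
```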
Examples
Example 1: Cosine distance
from scipy.spatial.distance import cdist
import numpy as np
A = np.array([[1, 0], [0, 1]])
B = np.array([[1, 1]])
res = cdist(A, B, metric='cosine')
print(res)
Output
[[0.29289322]
[0.29289322]]
Explanation: Cosine distance is 1 − cos θ, where θ is the angle between the two vectors, so it depends on direction and not on length. Both rows of A form a 45° angle with [1, 1], so both distances equal 1 − cos 45° ≈ 0.2929.
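The cosine result can also be worked out directly from the dot product and the vector norms. This is a minimal sketch for the first row of A only; the names cos_sim, cos_dist, and angle_deg are introduced for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# Cosine similarity: dot product divided by the product of the norms
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Cosine distance is one minus the similarity
cos_dist = 1.0 - cos_sim

# Angle between the vectors, in degrees
angle_deg = np.degrees(np.arccos(cos_sim))

# cdist expects 2-D inputs, so each vector becomes a 1 x 2 array
library = cdist(a[np.newaxis, :], b[np.newaxis, :], metric='cosine')[0, 0]
print(np.isclose(cos_dist, library))
```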
Example 2: Cityblock (Manhattan) Distance
from scipy.spatial.distance import cdist
import numpy as np
A = np.array([[1, 2]])
B = np.array([[4, 6], [5, 1]])
res = cdist(A, B, metric='cityblock')
print(res)
Output
[[7. 5.]]
Explanation: This uses the cityblock (Manhattan) metric. For [1, 2] and [4, 6], distance = |1−4| + |2−6| = 3 + 4 = 7. For [1, 2] and [5, 1], distance = |1−5| + |2−1| = 4 + 1 = 5.
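Since metric can also be a function, the same result can be obtained with a user-defined callable. The function manhattan below is a hypothetical helper written for this sketch, not part of SciPy; cdist() calls it once for every pair of rows.

```python
import numpy as np
from scipy.spatial.distance import cdist

def manhattan(u, v):
    """Hypothetical custom metric: sum of absolute coordinate differences."""
    return np.abs(u - v).sum()

A = np.array([[1, 2]])
B = np.array([[4, 6], [5, 1]])

# A callable metric receives two 1-D arrays and returns a single number
res = cdist(A, B, metric=manhattan)
print(res)
```

A callable is evaluated in Python for each pair, so the built-in string metrics are generally the faster choice when one fits.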