
feat: Implement Principal Component Analysis (PCA) #12596


Merged
merged 1 commit into from Mar 2, 2025
feat: Implement Principal Component Analysis (PCA)
- Added PCA implementation with dataset standardization.
- Used Singular Value Decomposition (SVD) to compute the principal components (see the sketch below).
- Fixed import sorting to comply with PEP 8 (Ruff I001).
- Ensured type hints and docstrings for better readability.
- Added doctests to validate correctness.
- Passed all Ruff checks and automated tests.
parikshit2111 committed Mar 2, 2025
commit fa9cd031c4b614ec1ab66c7970f592659b9952bf
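
The commit message notes that the principal components are obtained via Singular Value Decomposition; sklearn's PCA performs this internally. Below is a minimal, standalone NumPy sketch of that computation — it is not part of the committed file, and the random matrix merely stands in for the standardized Iris features:

    import numpy as np

    rng = np.random.default_rng(seed=0)
    x = rng.normal(size=(150, 4))  # stand-in for the Iris feature matrix

    # Standardize: zero mean, unit variance per feature
    x_std = (x - x.mean(axis=0)) / x.std(axis=0)

    # SVD of the standardized data: x_std = u @ diag(s) @ vt
    _, s, vt = np.linalg.svd(x_std, full_matrices=False)

    # The principal axes are the rows of vt; project onto the first two
    n_components = 2
    transformed = x_std @ vt[:n_components].T  # shape (150, 2)

    # Explained variance ratio follows from the singular values
    explained_variance = s**2 / (x_std.shape[0] - 1)
    explained_variance_ratio = explained_variance[:n_components] / explained_variance.sum()
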
2 changes: 2 additions & 0 deletions DIRECTORY.md
@@ -395,6 +395,7 @@
* [Minimum Tickets Cost](dynamic_programming/minimum_tickets_cost.py)
* [Optimal Binary Search Tree](dynamic_programming/optimal_binary_search_tree.py)
* [Palindrome Partitioning](dynamic_programming/palindrome_partitioning.py)
* [Range Sum Query](dynamic_programming/range_sum_query.py)
* [Regex Match](dynamic_programming/regex_match.py)
* [Rod Cutting](dynamic_programming/rod_cutting.py)
* [Smith Waterman](dynamic_programming/smith_waterman.py)
@@ -608,6 +609,7 @@
* [Mfcc](machine_learning/mfcc.py)
* [Multilayer Perceptron Classifier](machine_learning/multilayer_perceptron_classifier.py)
* [Polynomial Regression](machine_learning/polynomial_regression.py)
* [Principle Component Analysis](machine_learning/principle_component_analysis.py)
* [Scoring Functions](machine_learning/scoring_functions.py)
* [Self Organizing Map](machine_learning/self_organizing_map.py)
* [Sequential Minimum Optimization](machine_learning/sequential_minimum_optimization.py)
85 changes: 85 additions & 0 deletions machine_learning/principle_component_analysis.py
@@ -0,0 +1,85 @@
"""
Principal Component Analysis (PCA) is a dimensionality reduction technique
used in machine learning. It transforms high-dimensional data into a lower-dimensional
representation while retaining as much variance as possible.

This implementation follows best practices, including:
- Standardizing the dataset.
- Computing principal components using Singular Value Decomposition (SVD).
- Returning transformed data and explained variance ratio.
"""

import doctest

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def collect_dataset() -> tuple[np.ndarray, np.ndarray]:
"""
    Collects the Iris dataset and returns the feature matrix and target labels.

:return: Tuple containing feature matrix (X) and target labels (y)

Example:
>>> X, y = collect_dataset()
>>> X.shape
(150, 4)
>>> y.shape
(150,)
"""
data = load_iris()
return np.array(data.data), np.array(data.target)


def apply_pca(data_x: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
"""
Applies Principal Component Analysis (PCA) to reduce dimensionality.

:param data_x: Original dataset (features)
:param n_components: Number of principal components to retain
:return: Tuple containing transformed dataset and explained variance ratio

Example:
>>> X, _ = collect_dataset()
>>> transformed_X, variance = apply_pca(X, 2)
>>> transformed_X.shape
(150, 2)
>>> len(variance) == 2
True
"""
# Standardizing the dataset
scaler = StandardScaler()
data_x_scaled = scaler.fit_transform(data_x)

# Applying PCA
pca = PCA(n_components=n_components)
principal_components = pca.fit_transform(data_x_scaled)

return principal_components, pca.explained_variance_ratio_


def main() -> None:
"""
Driver function to execute PCA and display results.
"""
data_x, data_y = collect_dataset()

# Number of principal components to retain
n_components = 2

# Apply PCA
transformed_data, variance_ratio = apply_pca(data_x, n_components)

print("Transformed Dataset (First 5 rows):")
print(transformed_data[:5])

print("\nExplained Variance Ratio:")
print(variance_ratio)


if __name__ == "__main__":
doctest.testmod()
main()
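
For reviewers, a short hypothetical usage example of the new apply_pca helper on a tiny made-up feature matrix (it assumes the snippet is run from the repository root so the machine_learning package is importable):

    import numpy as np

    from machine_learning.principle_component_analysis import apply_pca

    features = np.array(
        [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]]
    )
    reduced, ratio = apply_pca(features, n_components=1)
    print(reduced.shape)  # (5, 1)
    print(ratio)  # explained variance ratio of the single retained component
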