Feature Selection | Embedded methods

Last Updated : 23 Jul, 2025

In machine learning, having too many features (also called variables or columns) can lead to complex models that are hard to understand and may not perform well. Feature selection helps us choose only the most important features, making models faster, simpler, and often more accurate.

There are three main types of feature selection methods:

  1. Filter methods
  2. Wrapper methods
  3. Embedded methods

What Are Embedded Methods?

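Embedded methods combine the strengths of filter and wrapper methods: they select features as part of training the model itself. Because selection happens within a single training run rather than across many repeated fits, they are usually faster than wrapper methods. Because features are judged by how much they help the actual model, they are often more accurate than filter methods.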

These methods are built into the learning algorithm itself. Examples include decision trees and other tree-based models, and linear models with L1 regularization such as Lasso, L1-regularized logistic regression, and L1-penalized SVMs.

Why Use Embedded Methods?

  • Built-in feature selection: No separate selection step is needed
  • Efficient: Selection happens during a single training run
  • Better accuracy: Features are judged by how useful they are to the model being trained
  • Fewer irrelevant features: Dropping them helps reduce overfitting

Common Embedded Methods

Let’s look at the most popular embedded methods used in machine learning.

1. Lasso Regression (L1 Regularization)

Lasso stands for Least Absolute Shrinkage and Selection Operator. It is a type of linear regression that uses L1 regularization, which can shrink some feature weights to zero. When a feature’s weight becomes zero, the model ignores it.

Formula:

\text{Loss} = \text{MSE} + \lambda \sum_{j=1}^{n} |w_j|

Where:

  • MSE is the Mean Squared Error
  • w_j is the weight of feature j, and n is the number of features
  • \lambda is a tuning parameter (regularization strength)

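The larger \lambda is, the more weights are pushed to exactly zero. With \lambda = 0 there is no penalty and the model reduces to ordinary linear regression.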
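To see why the L1 penalty produces exact zeros, it helps to look at a simplified case. This sketch assumes orthonormal feature columns (X^T X = I) and writes the loss as (1/2)||y - Xw||^2 + \lambda \sum |w_j|, which differs from the MSE form above only by constant factors that can be absorbed into \lambda. Under these assumptions each Lasso weight is the least-squares weight z_j = x_j^T y passed through a "soft-thresholding" function, while a Ridge (L2) penalty \lambda \sum w_j^2 only rescales it. The function names and the example weights are hypothetical.

```python
import numpy as np

def lasso_orthonormal(z, lam):
    """Lasso solution when X^T X = I: soft-threshold the least-squares weights z."""
    # Weights with |z_j| <= lam become exactly 0; the others move toward 0 by lam.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_orthonormal(z, lam):
    """Ridge solution when X^T X = I, for the penalty lam * sum(w_j**2)."""
    # Every weight is divided by the same factor, so none becomes exactly 0.
    return z / (1.0 + 2.0 * lam)

# Hypothetical least-squares weights for four features
z = np.array([3.0, -0.5, 1.2, -2.0])

for lam in (0.5, 1.0, 2.5):
    lasso_w = lasso_orthonormal(z, lam)
    ridge_w = ridge_orthonormal(z, lam)
    print(f"lambda={lam}: lasso={lasso_w}, zeros={np.sum(lasso_w == 0)}; "
          f"ridge={ridge_w.round(3)}, zeros={np.sum(ridge_w == 0)}")
```

With \lambda = 2.5, three of the four Lasso weights are exactly zero, while the Ridge weights are smaller but all still nonzero.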

Python Code Example:

  • Import libraries: Load Lasso, the California housing dataset, and utilities for splitting data and handling DataFrames.
  • Load dataset: X contains features such as median income and house age; y is the median house value (in units of $100,000).
  • Split data: 70% training, 30% testing using train_test_split.
  • Train Lasso model: Lasso regression with alpha=0.1 is trained on the training data. In scikit-learn, alpha plays the role of \lambda. Lasso drives the coefficients of less important features to zero.
  • Select important features: Features with non-zero coefficients are considered selected. The code prints them.
Python
from sklearn.linear_model import Lasso
from sklearn.datasets import fetch_california_housing # Import California housing dataset
from sklearn.model_selection import train_test_split
import pandas as pd

california_housing = fetch_california_housing() # Load California housing dataset
X = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)
y = california_housing.target # Target: median house value

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit Lasso model
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Check selected features
selected_features = X.columns[model.coef_ != 0]
print("Selected Features:", selected_features.tolist())

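scikit-learn also provides SelectFromModel, which wraps an embedded-method estimator and keeps the features whose coefficients (or importances) exceed a threshold. The sketch below is a variant of the example above under a few assumptions: it standardizes the features first, because Lasso penalizes all coefficients equally while their size depends on each feature's units; it uses synthetic data from make_regression instead of the California housing data so it runs without a download; and alpha=1.0 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which actually influence y
X, y, true_coef = make_regression(
    n_samples=200, n_features=10, n_informative=3,
    noise=1.0, coef=True, random_state=0,
)

# Standardize, then let Lasso decide which features to keep
selector = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=1.0)),
)
X_selected = selector.fit_transform(X, y)

mask = selector[-1].get_support()  # boolean mask over the 10 input columns
print("Truly informative:", np.flatnonzero(true_coef).tolist())
print("Selected by Lasso:", np.flatnonzero(mask).tolist())
print("Shape before/after:", X.shape, X_selected.shape)
```

For Lasso, SelectFromModel's default threshold is a tiny value (1e-5), so in practice it keeps exactly the features with nonzero coefficients.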

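2. Ridge vs Lasso vs ElasticNet

Ridge regression adds an L2 penalty (\lambda \sum w_j^2) instead of an L1 penalty. It shrinks weights toward zero but almost never makes them exactly zero, so on its own it does not remove features. Lasso (L1) can set weights exactly to zero, as described in the previous section. ElasticNet combines both penalties.

ElasticNet Formula: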

\text{Loss} = \text{MSE} + \lambda_1 \sum_{j=1}^{n} |w_j| + \lambda_2 \sum_{j=1}^{n} w_j^2

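Here \lambda_1 controls the L1 (Lasso) part and \lambda_2 the L2 (Ridge) part. ElasticNet is useful when many features are correlated: Lasso tends to keep just one feature from a correlated group and drop the others somewhat arbitrarily, while the added L2 term encourages correlated features to share the weight.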
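A quick way to compare the three penalties is to fit each one on the same data and count how many coefficients end up exactly zero. This sketch assumes synthetic data from make_regression and arbitrary penalty strengths. Note that scikit-learn's ElasticNet is parameterized by alpha and l1_ratio rather than \lambda_1 and \lambda_2: its L1 weight is alpha * l1_ratio and its L2 weight is 0.5 * alpha * (1 - l1_ratio), applied to a squared-error term scaled by 1 / (2 * n_samples).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# 20 features, 5 informative; make_regression features are roughly unit-variance
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=1)

models = {
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=1.0),
    "ElasticNet (L1 + L2)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(f"{name}: {zero_counts[name]} of {X.shape[1]} coefficients are exactly zero")
```

Ridge leaves every coefficient nonzero, while Lasso zeroes out features that add little. How many coefficients ElasticNet zeroes depends on how much weight l1_ratio gives the L1 term.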

3. Decision Trees and Tree-Based Models

Tree-based models like Decision Trees, Random Forests, and Gradient Boosting compute a feature importance score as a by-product of training, which can be used to rank and select features.

How It Works:

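  • At each split, the model chooses the feature (and threshold) that most reduces impurity, for example Gini impurity or entropy for classification, or squared error for regression.
  • A feature's importance is the total impurity reduction from all the splits that use it, so features that are chosen often and produce large improvements rank highest.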
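In scikit-learn, fitted tree models expose these scores as feature_importances_ (by default impurity-based, weighted by the number of samples reaching each split, and normalized to sum to 1). The sketch below assumes a synthetic classification problem where, because shuffle=False, the first three columns are the informative ones and the remaining seven are noise; the forest size is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# With shuffle=False, columns 0-2 are informative and columns 3-9 are pure noise
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_  # impurity-based, sums to 1
for idx in np.argsort(importances)[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

The informative columns should appear at the top of the ranking. Impurity-based importances can overstate features with many distinct values, so they are best treated as a ranking hint rather than an exact measure.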

4. Regularized Logistic Regression

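The same L1 penalty that Lasso adds to linear regression can also be added to logistic regression for classification.

Formula:

\text{Loss} = -\text{log-likelihood} + \lambda \sum_{j=1}^{n} |w_j|

It is most often used for binary classification (and extends to multiclass problems), where the penalty drives the weights of uninformative features to zero, giving automatic feature selection.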
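In scikit-learn, an L1-penalized logistic regression can be fitted with LogisticRegression(penalty="l1") together with a solver that supports it, such as "liblinear" or "saga". scikit-learn uses C, the inverse of the regularization strength, so a smaller C corresponds to a larger \lambda and more zero weights. The synthetic data and the C values below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# 20 features; with shuffle=False only the first 3 carry class information
X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, n_repeated=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)
X = StandardScaler().fit_transform(X)  # put all weights on a comparable scale

nonzero = {}
for C in (1.0, 0.1, 0.05):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X, y)
    nonzero[C] = np.flatnonzero(clf.coef_[0]).tolist()
    print(f"C={C}: {len(nonzero[C])} nonzero weights -> features {nonzero[C]}")
```

As C decreases, fewer weights stay nonzero, and the survivors should come mostly from the first three columns.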

5. Support Vector Machine (SVM) with L1 Penalty

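A linear SVM can use an L1 penalty on its weights instead of the usual L2 penalty. As with Lasso, this drives the weights of irrelevant features to exactly zero; the result is an L1-regularized SVM, sometimes shortened to L1-SVM. It is most useful when there are many features and many of them are expected to be irrelevant.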
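scikit-learn's LinearSVC accepts penalty="l1"; in that configuration it must be combined with the squared hinge loss and the primal formulation (dual=False). As with logistic regression, C is an inverse regularization strength. The synthetic data and the C value are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# 30 features; with shuffle=False only the first 4 are informative
X, y = make_classification(n_samples=400, n_features=30, n_informative=4,
                           n_redundant=0, n_repeated=0, n_clusters_per_class=1,
                           shuffle=False, random_state=42)
X = StandardScaler().fit_transform(X)

# L1 penalty on the weights: requires loss="squared_hinge" and dual=False
svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.01)
svm.fit(X, y)

kept = np.flatnonzero(svm.coef_[0])
print(f"{kept.size} of {X.shape[1]} features kept:", kept.tolist())
```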

Advantages of Embedded Methods

  • Combine training and selection in one step
  • Faster than wrapper methods
  • Often more accurate than filter methods
  • Help avoid overfitting by removing useless features

Limitations of Embedded Methods

  • Model-dependent: Only work with certain algorithms
  • Harder to interpret for black-box models such as gradient boosting and other tree ensembles
  • Regularization strength (\lambda) needs tuning, typically with cross-validation
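One common way to tune the regularization strength is cross-validation. As a sketch, with synthetic data standing in for a real dataset, scikit-learn's LassoCV fits Lasso over its default grid of alpha values, picks the value with the lowest average validation error, and refits on all the data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=15, n_informative=4,
                       noise=10.0, random_state=0)

# 5-fold cross-validation over an automatically generated grid of alphas
model = LassoCV(cv=5, random_state=0).fit(X, y)

print("Chosen alpha:", round(model.alpha_, 4))
print("Features kept:", np.flatnonzero(model.coef_).tolist())
```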