Yellowbrick is a Python visualization library built on top of Scikit-learn that helps make machine learning models easier to understand. It provides visual tools for analyzing data, evaluating model performance and comparing different machine learning models.
- Helps detect issues such as overfitting and underfitting.
- Supports feature analysis and model comparison.
- Simplifies the interpretation of machine learning results.
Visualizations
Yellowbrick provides a variety of visualization tools that help users better understand and evaluate machine learning models. These visualizations make it easier to analyze model performance, identify issues and gain insights from data.
1. Classification
- Visualizes model performance using confusion matrices and ROC curves.
- Helps evaluate metrics such as precision, recall and F1-score.
- Useful for comparing classification models.
2. Regression
- Uses residual and prediction error plots to assess model performance.
- Helps identify bias, variance and outliers in the data.
- Useful for evaluating regression models.
3. Clustering
- Provides tools such as silhouette plots and the elbow method.
- Helps determine the optimal number of clusters.
- Assists in evaluating cluster quality and separation.
4. Feature Analysis
- Visualizes feature importance and feature relationships.
- Helps identify the most influential features in a dataset.
- Useful for feature selection and dimensionality analysis.
Classification Visualizations
1. Installing Yellowbrick
Yellowbrick can be installed with pip. Its visualizers wrap Scikit-learn estimators and follow the same fit/score pattern, so they drop into existing Scikit-learn code.
pip install yellowbrick
2. Import required libraries
Import the necessary libraries for data loading, model training and visualization.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import (
    ConfusionMatrix,
    ROCAUC,
    ClassificationReport
)
3. Load and Split the Dataset
Load the dataset and divide it into training and testing sets.
- Loads the Breast Cancer dataset.
- Separates features and target values.
- Splits the data into training and testing sets.
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
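Before building any visualizers, it can help to confirm what the split produced. The sketch below repeats the split and prints the shape and class counts of each part. The `class_counts` helper is a hypothetical addition for illustration, not part of Yellowbrick or the original code.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def class_counts(labels, names):
    """Hypothetical helper: map each class name to its number of samples."""
    counts = np.bincount(labels, minlength=len(names))
    return {str(name): int(c) for name, c in zip(names, counts)}

# Shapes are (samples, features); counts use the dataset's class names.
print("Train:", X_train.shape, class_counts(y_train, data.target_names))
print("Test: ", X_test.shape, class_counts(y_test, data.target_names))
```

If the class ratio in the test set drifts too far from the full dataset, passing `stratify=y` to `train_test_split` keeps the proportions the same in both parts.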
4. Confusion Matrix
A confusion matrix shows the number of correct and incorrect predictions for each class.
model = RandomForestClassifier(random_state=42)
visualizer = ConfusionMatrix(model)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()
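The confusion matrix plot is a table of counts. As a cross-check, the same kind of table can be computed directly with scikit-learn's `confusion_matrix`. This sketch retrains the same model; it is not Yellowbrick's internal code, and it does not claim the plot uses the same row/column orientation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# scikit-learn convention: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Correct predictions lie on the diagonal; off-diagonal cells are errors.
correct = cm.trace()
print(f"accuracy = {correct / cm.sum():.3f}")
```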
5. ROC-AUC Curve
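The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The area under the curve (AUC) summarizes how well the model distinguishes between classes: 1.0 is perfect separation and 0.5 is no better than chance.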
model = RandomForestClassifier(random_state=42)
visualizer = ROCAUC(model)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()
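The numbers behind one ROC curve can be reproduced with scikit-learn's `roc_curve` and `auc`. This sketch covers only the positive class (label 1, "benign" in this dataset); a Yellowbrick ROC-AUC plot may draw additional curves, which are not reproduced here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score, auc

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Predicted probability of the positive class (column 1 matches label 1).
scores = model.predict_proba(X_test)[:, 1]

# Each threshold yields one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_test, scores)

# AUC: area under those points, computed with the trapezoidal rule.
area = auc(fpr, tpr)
print(f"AUC = {area:.3f}")
```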
6. Classification Report
The classification report displays precision, recall, F1-score and support for each class.
model = RandomForestClassifier(random_state=42)
visualizer = ClassificationReport(model)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()
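The per-class scores in the report can also be read as numbers from scikit-learn's `classification_report` with `output_dict=True`. This sketch prints them and relies on the fact that F1 is the harmonic mean of precision and recall.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Plain strings so the report's keys are ordinary str values.
names = [str(n) for n in data.target_names]  # ["malignant", "benign"]
report = classification_report(
    y_test, y_pred, target_names=names, output_dict=True
)

for name in names:
    row = report[name]
    # F1 = 2 * precision * recall / (precision + recall)
    print(
        f"{name:>9}: precision={row['precision']:.3f} "
        f"recall={row['recall']:.3f} f1={row['f1-score']:.3f} "
        f"support={row['support']}"
    )
```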
Regression Visualizations
1. Import required libraries
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from yellowbrick.regressor import (
    ResidualsPlot,
    PredictionError
)
2. Load and Split Regression Dataset
- Loads the California Housing dataset.
- Splits the data for training and testing.
housing = fetch_california_housing()
X = housing.data
y = housing.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
3. Residuals Plot
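- Plots residuals (actual minus predicted values) against predicted values for the training and test sets.
- Residuals scattered randomly around zero suggest a good linear fit; curves or funnel shapes point to non-linearity or unequal error variance.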
model = LinearRegression()
visualizer = ResidualsPlot(model)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()
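The residuals themselves are just differences between actual and predicted targets. To keep this sketch runnable offline, it swaps in synthetic data from `make_regression` as a stand-in for the California Housing dataset (which `fetch_california_housing` downloads), so the exact values differ from the plot above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the California Housing data (8 features).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# A residual is the actual value minus the predicted value.
train_residuals = y_train - model.predict(X_train)
test_residuals = y_test - model.predict(X_test)

# With an intercept, least squares makes training residuals average to ~0.
print(f"mean train residual: {train_residuals.mean():.2e}")
print(f"mean test residual:  {test_residuals.mean():.3f}")
print(f"test R^2: {model.score(X_test, y_test):.3f}")
```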
4. Prediction Error Plot
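- Plots actual target values against the model's predicted values.
- Compares the points with the identity line, where every prediction would be exactly right.
- Shows how closely predictions track actual values, indicating overall regression performance.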
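model = LinearRegression()
visualizer = PredictionError(model)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()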
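The quantities a prediction error plot summarizes can be computed directly: how far the points fall from the identity line, and the R² score. This sketch uses synthetic `make_regression` data as an offline stand-in for California Housing and fits a least-squares line through the (actual, predicted) pairs with `np.polyfit`; a slope near 1 and intercept near 0 mean predictions follow the identity line.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for the California Housing data.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

# Perfect predictions would satisfy y_pred == y_test (slope 1, intercept 0).
slope, intercept = np.polyfit(y_test, y_pred, deg=1)
r2 = r2_score(y_test, y_pred)
print(f"best-fit slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```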
Clustering Visualizations
1. Import required libraries
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from yellowbrick.cluster import (
KElbowVisualizer,
SilhouetteVisualizer
)
2. Load Clustering Dataset
iris = load_iris()
X = iris.data
3. Elbow Plot
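- Fits a KMeans model for each number of clusters k in the given range.
- Plots a score for each k (distortion, the within-cluster sum of squared distances, by default).
- The point where the curve bends and flattens (the "elbow") suggests a good number of clusters.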
model = KMeans(random_state=42)
visualizer = KElbowVisualizer(
    model,
    k=(2, 10)
)
visualizer.fit(X)
visualizer.show()
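The elbow curve can be approximated by hand. KMeans exposes the within-cluster sum of squared distances as `inertia_`, which is closely related to the distortion score plotted above. This sketch records it for k from 2 through 9; `n_init=10` is set explicitly as an assumption to keep results stable across scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Within-cluster sum of squared distances to the nearest centroid, per k.
inertias = {}
for k in range(2, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(f"k={k}: {value:.1f}")
```

The values always shrink as k grows, so the goal is not the minimum but the k after which further clusters buy little improvement.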
4. Silhouette Plot
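- Plots the silhouette coefficient of every sample, grouped by cluster.
- Coefficients range from -1 to 1; values near 1 mean a sample is well matched to its own cluster and far from neighboring clusters.
- Low or negative values flag samples that may belong to another cluster, which helps validate the chosen number of clusters.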
model = KMeans(
    n_clusters=3,
    random_state=42
)
visualizer = SilhouetteVisualizer(model)
visualizer.fit(X)
visualizer.show()
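The per-sample coefficients shown in the silhouette plot are available from scikit-learn's `silhouette_samples`, and `silhouette_score` is their mean. This sketch prints the average coefficient per cluster and overall; `n_init=10` is an explicit assumption added for version-stable results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_samples, silhouette_score

X = load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Per sample: (b - a) / max(a, b), where a is the mean distance to its own
# cluster and b is the mean distance to the nearest other cluster.
per_sample = silhouette_samples(X, labels)
overall = silhouette_score(X, labels)

for c in np.unique(labels):
    print(f"cluster {c}: mean silhouette {per_sample[labels == c].mean():.3f}")
print(f"overall: {overall:.3f}")
```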
Feature Analysis
1. Rank2D Visualization
- Scores every pair of features (Pearson correlation by default) and shows the scores as a heatmap.
- Highlights strongly correlated, potentially redundant features.
- Useful for feature selection.
from yellowbrick.features import Rank2D
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
visualizer = Rank2D()
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.show()
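The Pearson scores in the heatmap form a feature-by-feature correlation matrix, which NumPy can compute with `np.corrcoef`. This sketch prints each pair once; on iris, petal length and petal width stand out as highly correlated.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# rowvar=False treats each column (feature) as a variable.
corr = np.corrcoef(X, rowvar=False)

names = iris.feature_names
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {corr[i, j]:+.2f}")
```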
2. Parallel Coordinates Plot
- Visualizes multiple features simultaneously.
- Helps identify patterns and class separation.
- Useful for high-dimensional datasets.
from sklearn.datasets import load_iris
from yellowbrick.features import ParallelCoordinates
iris = load_iris()
X = iris.data
y = iris.target
visualizer = ParallelCoordinates(
    classes=['Setosa', 'Versicolor', 'Virginica']
)
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.show()
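In a parallel coordinates plot, each sample is a line across the feature axes, and class separation shows up as classes following different paths. Because iris features have different ranges, this sketch first min-max scales each feature to [0, 1] and then averages each class, giving the typical path one class traces. It is an illustration of the idea, not Yellowbrick's drawing code.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Min-max scale each feature (column) so every axis spans [0, 1].
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Average scaled line per class: one value per feature axis.
profiles = {
    str(name): X_scaled[y == i].mean(axis=0)
    for i, name in enumerate(iris.target_names)
}

for name, profile in profiles.items():
    print(name, np.round(profile, 2))
```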