Hierarchical Classification

Hierarchical classification is a machine learning task in which each instance is assigned to one or more classes organized in a hierarchy, rather than to a label chosen from a flat set. Exploiting this structure can improve prediction accuracy and makes the outputs easier to interpret.

The labels form a structured taxonomy in which classes have parent-child relationships. Instead of treating categories as independent, a hierarchical classifier models the relationships among them so that predictions better reflect the semantics of the data.

Types of Hierarchical Structures

1) Tree Hierarchy

Each node has exactly one parent (except the root).
Each instance is typically assigned a single path from the root to a leaf (some settings allow the path to stop at an internal node).
Example: Animal → Mammal → Dog

2) DAG (Directed Acyclic Graph)

A node can have multiple parents.
Useful when concepts belong to multiple categories.
Example: "Tablet" can belong to both "Electronics" and "Computing Devices"

3) Taxonomy

A domain-specific organizational structure that can be a tree or DAG.
Adds semantic meaning to the labels (e.g., product taxonomy in retail, medical coding in healthcare).
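
The structures above can be sketched as plain parent maps. This is a minimal illustration, not a library API: the names TREE_PARENTS, DAG_PARENTS, ancestors and is_tree are hypothetical, and the labels are the ones used in the examples above. The tree check assumes the input is already acyclic.

```python
# Each label maps to the list of its direct parents (an empty list marks a root).
TREE_PARENTS = {
    "Animal": [],
    "Mammal": ["Animal"],
    "Dog": ["Mammal"],
}

DAG_PARENTS = {
    "Electronics": [],
    "Computing Devices": [],
    "Tablet": ["Electronics", "Computing Devices"],
}


def ancestors(label, parents):
    """Return every ancestor of `label`, excluding the label itself."""
    seen = set()
    stack = list(parents[label])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def is_tree(parents):
    """True if there is a single root and no node has more than one parent.

    Assumes the hierarchy is acyclic; cycles are not checked here.
    """
    roots = [node for node, ps in parents.items() if not ps]
    return len(roots) == 1 and all(len(ps) <= 1 for ps in parents.values())


print(ancestors("Dog", TREE_PARENTS))
print(ancestors("Tablet", DAG_PARENTS))
print(is_tree(TREE_PARENTS), is_tree(DAG_PARENTS))
```

The same parent-map representation covers both cases, which is why a taxonomy can be stored without deciding in advance whether it is a tree or a DAG.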

Why Use Hierarchical Classification?

Aspect	Flat Classification	Hierarchical Classification
Output	Single label	Label together with its path in the hierarchy
Error penalty	Equal for all errors	Can depend on position in the hierarchy (mistakes near the root cost more)
Interpretability	Moderate	High (provides structured output)

Use Cases and Applications

Medical diagnosis (ICD coding)
Product categorization in e-commerce
Document topic classification
Biological classification (taxonomy)
News categorization by topics and subtopics

Methods of Hierarchical Classification

1. Local Classifier per Node

A binary classifier is trained for each node to decide whether an instance belongs to that class.
Prediction proceeds top-down from the root.
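
A minimal sketch of top-down prediction with one binary classifier per node follows. The hierarchy, the feature names (has_fur, barks) and the stand-in scoring functions are all hypothetical; in practice each entry of NODE_CLASSIFIERS would be a trained binary model.

```python
# Hypothetical hierarchy: each node maps to its children.
CHILDREN = {
    "Animal": ["Mammal", "Bird"],
    "Mammal": ["Dog", "Cat"],
    "Bird": [],
    "Dog": [],
    "Cat": [],
}

# Stand-ins for trained binary classifiers: each returns a score in [0, 1]
# for "this instance belongs to the node".
NODE_CLASSIFIERS = {
    "Mammal": lambda x: 0.9 if x["has_fur"] else 0.1,
    "Bird": lambda x: 0.1 if x["has_fur"] else 0.9,
    "Dog": lambda x: 0.8 if x["barks"] else 0.2,
    "Cat": lambda x: 0.2 if x["barks"] else 0.8,
}


def predict_per_node(x, root, children, node_classifiers, threshold=0.5):
    """Walk down from the root, keeping every child whose classifier accepts x.

    A node's classifier is consulted only if its parent was accepted.
    """
    accepted = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in children[node]:
            if node_classifiers[child](x) >= threshold:
                accepted.add(child)
                frontier.append(child)
    return accepted


labels = predict_per_node(
    {"has_fur": True, "barks": True}, "Animal", CHILDREN, NODE_CLASSIFIERS
)
print(labels)
```

Because children of a rejected node are never visited, a wrong decision near the root cannot be recovered further down.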

2. Local Classifier per Parent Node

For each internal node, a multi-class classifier is trained to distinguish among its child nodes.
Compared with one classifier per node, this reduces the number of classifiers, but each classifier must solve a harder multi-class problem.
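
The per-parent scheme can be sketched as follows, assuming a tree and one multi-class scorer per internal node. The hierarchy, feature names and scoring functions are hypothetical placeholders for trained models.

```python
# Hypothetical tree: each node maps to its children (leaves have none).
CHILDREN = {
    "Animal": ["Mammal", "Bird"],
    "Mammal": ["Dog", "Cat"],
    "Bird": [],
    "Dog": [],
    "Cat": [],
}


def score_animal(x):
    """Stand-in multi-class classifier for the children of "Animal"."""
    return {"Mammal": 0.9, "Bird": 0.1} if x["has_fur"] else {"Mammal": 0.1, "Bird": 0.9}


def score_mammal(x):
    """Stand-in multi-class classifier for the children of "Mammal"."""
    return {"Dog": 0.8, "Cat": 0.2} if x["barks"] else {"Dog": 0.2, "Cat": 0.8}


# One classifier per internal (parent) node only.
PARENT_CLASSIFIERS = {"Animal": score_animal, "Mammal": score_mammal}


def predict_per_parent(x, root, children, parent_classifiers):
    """Follow the highest-scoring child at each internal node until a leaf."""
    path = [root]
    node = root
    while children[node]:
        scores = parent_classifiers[node](x)
        node = max(children[node], key=scores.get)
        path.append(node)
    return path


path = predict_per_parent({"has_fur": True, "barks": True}, "Animal", CHILDREN, PARENT_CLASSIFIERS)
print(path)
```

Here two classifiers cover the whole hierarchy, whereas the per-node scheme would need one for every non-root node.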

3. Local Classifier per Level

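One multi-class classifier is trained per hierarchy level, choosing among all classes at that depth.
Useful when the hierarchy is well balanced, but the predictions at different levels can contradict one another unless they are checked against the hierarchy.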
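
Since each level is predicted independently, it can help to verify that the per-level outputs form a valid path. The sketch below assumes a tree stored as a child-to-parent map; the labels, feature names and stand-in classifiers are hypothetical.

```python
# Hypothetical tree stored as child -> parent (the root is implicit).
PARENT = {
    "Mammal": "Animal",
    "Bird": "Animal",
    "Dog": "Mammal",
    "Sparrow": "Bird",
}

# Stand-ins for trained classifiers, one per level below the root.
LEVEL_CLASSIFIERS = [
    lambda x: "Mammal" if x["has_fur"] else "Bird",   # level 1
    lambda x: "Dog" if x["barks"] else "Sparrow",     # level 2
]


def predict_per_level(x, level_classifiers):
    """Run every level's classifier independently on the same instance."""
    return [clf(x) for clf in level_classifiers]


def is_consistent(labels, parent):
    """True if each predicted label is a child of the label one level up."""
    return all(parent[child] == upper for upper, child in zip(labels, labels[1:]))


ok = predict_per_level({"has_fur": True, "barks": True}, LEVEL_CLASSIFIERS)
clash = predict_per_level({"has_fur": True, "barks": False}, LEVEL_CLASSIFIERS)
print(ok, is_consistent(ok, PARENT))
print(clash, is_consistent(clash, PARENT))
```

How to repair an inconsistent prediction (for example, by trusting the more confident level) is a design choice the sketch leaves open.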

4. Global Classifier

A single model is trained to consider the full hierarchy.
Often requires custom loss functions to enforce structural constraints.

5. Constraint-Based Models

These models use the hierarchy during inference (and optionally during training) to enforce logical constraints on the predictions.
Example: If a child node is predicted, all its ancestors must also be predicted.
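
Two simple ways to enforce the ancestor constraint at inference time are sketched below: closing a predicted label set upward, and capping each node's probability by that of its parents. Both functions and the example numbers are illustrative assumptions, not a specific published method.

```python
# Hypothetical DAG: each label maps to its direct parents.
PARENTS = {
    "Electronics": [],
    "Computing Devices": [],
    "Tablet": ["Electronics", "Computing Devices"],
}


def close_upward(predicted, parents):
    """Return the predicted labels plus every ancestor of each of them."""
    closed = set()
    stack = list(predicted)
    while stack:
        node = stack.pop()
        if node not in closed:
            closed.add(node)
            stack.extend(parents[node])
    return closed


def cap_by_parents(probs, parents):
    """Lower each probability so that no node exceeds any of its parents."""
    fixed = {}

    def resolve(node):
        if node not in fixed:
            cap = min((resolve(p) for p in parents[node]), default=1.0)
            fixed[node] = min(probs[node], cap)
        return fixed[node]

    for node in probs:
        resolve(node)
    return fixed


print(close_upward({"Tablet"}, PARENTS))
raw = {"Electronics": 0.9, "Computing Devices": 0.4, "Tablet": 0.7}
print(cap_by_parents(raw, PARENTS))
```

The first approach trusts the most specific prediction and adds what it implies; the second trusts the ancestors and suppresses descendants they do not support.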

Hierarchical Cross-Entropy Loss

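To account for the hierarchical structure in the loss function, we can use a hierarchical cross-entropy loss. It sums the negative log-probability over the true label and all of its ancestors, so a mistake high in the hierarchy, which usually also lowers the probability of every label beneath it, is penalized more heavily than a mistake near the leaves: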

L = -\sum_{i=1}^{N} \sum_{j \in \mathcal{A}(y_i)} \log P(j \mid x_i)

where:

N is the number of training samples,
y_i is the true label for instance x_i,
\mathcal{A}(y_i) is the set of ancestors of y_i, including y_i itself.
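
The formula can be evaluated directly once each sample has a probability for every node. The sketch below assumes the model outputs one probability per label; the hierarchy and the probability values are made up for illustration.

```python
import math

# Hypothetical tree: each label maps to its direct parents.
PARENTS = {"Animal": [], "Mammal": ["Animal"], "Dog": ["Mammal"]}


def ancestors_and_self(label, parents):
    """The set A(y): the label itself plus all of its ancestors."""
    out, stack = set(), [label]
    while stack:
        node = stack.pop()
        if node not in out:
            out.add(node)
            stack.extend(parents[node])
    return out


def hierarchical_cross_entropy(batch_probs, true_labels, parents):
    """L = -sum_i sum_{j in A(y_i)} log P(j | x_i)."""
    loss = 0.0
    for probs, y in zip(batch_probs, true_labels):
        for j in ancestors_and_self(y, parents):
            loss -= math.log(probs[j])
    return loss


# Two hypothetical predictions for one sample whose true label is "Dog".
near_miss = {"Animal": 1.0, "Mammal": 0.8, "Dog": 0.5}    # unsure only at the leaf
high_error = {"Animal": 1.0, "Mammal": 0.1, "Dog": 0.05}  # wrong already at "Mammal"

loss_near = hierarchical_cross_entropy([near_miss], ["Dog"], PARENTS)
loss_high = hierarchical_cross_entropy([high_error], ["Dog"], PARENTS)
print(loss_near, loss_high)
```

The second prediction incurs a much larger loss because both the "Mammal" and the "Dog" terms are large, which is the sense in which higher-level errors are penalized more.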

Evaluation Metrics

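Hierarchical Precision / Recall: Precision and recall computed after extending both the predicted and the true label sets with all of their ancestors, so partially correct paths earn partial credit.
H-loss: Charges a mistake at a node only if all of its ancestors were predicted correctly, so an error high in the hierarchy is counted once rather than again for every descendant.
Path Accuracy: Fraction of instances whose entire predicted path matches the true path.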
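
A minimal sketch of hierarchical precision and recall and of path accuracy follows. It assumes a tree, single-label predictions, micro-averaging over instances, and that the root is excluded from the augmented sets (a common convention, but an assumption here). In a tree, two paths match exactly when their final labels match, which is what path_accuracy relies on.

```python
# Hypothetical tree: each label maps to its direct parents.
PARENTS = {
    "Animal": [],
    "Mammal": ["Animal"],
    "Dog": ["Mammal"],
    "Cat": ["Mammal"],
}


def augment(label, parents, root):
    """The label plus all of its ancestors, excluding the root."""
    out, stack = set(), [label]
    while stack:
        node = stack.pop()
        if node != root and node not in out:
            out.add(node)
            stack.extend(parents[node])
    return out


def hierarchical_precision_recall(y_true, y_pred, parents, root):
    """Micro-averaged precision and recall over ancestor-augmented label sets."""
    overlap = pred_total = true_total = 0
    for t, p in zip(y_true, y_pred):
        t_aug = augment(t, parents, root)
        p_aug = augment(p, parents, root)
        overlap += len(t_aug & p_aug)
        pred_total += len(p_aug)
        true_total += len(t_aug)
    return overlap / pred_total, overlap / true_total


def path_accuracy(y_true, y_pred):
    """Fraction of instances whose predicted path equals the true path."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["Dog", "Dog"]
y_pred = ["Dog", "Cat"]
h_precision, h_recall = hierarchical_precision_recall(y_true, y_pred, PARENTS, "Animal")
print(h_precision, h_recall, path_accuracy(y_true, y_pred))
```

Predicting "Cat" for a "Dog" still earns credit for the shared "Mammal" ancestor under the hierarchical measures, while path accuracy treats it as a plain miss.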

Tools and Libraries

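scikit-multilearn (multi-label classification; a label hierarchy generally has to be encoded by the user)
keras-han (hierarchical attention networks, which model the word and sentence structure of documents rather than a label hierarchy)
Custom architectures using PyTorch or TensorFlow
Graph neural networks, to learn label embeddings over tree or DAG hierarchies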

Challenges

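Data sparsity at deeper levels of the hierarchy, where each class has few training examples.
Error propagation in top-down models: a mistake at an upper level cannot be corrected further down.
Scalability to large taxonomies with many nodes.
Class imbalance, since some branches contain far more examples than others.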
