Traditional machine learning models require labeled examples for every class they need to classify. In many real-world scenarios, however, collecting labeled data for every possible category is impractical. Zero-shot classification (ZSC) is a technique that allows models to classify data into categories without having seen labeled examples of those categories during training.
This is achieved using semantic knowledge transfer, where the model leverages prior knowledge, typically through:
- Word embeddings (e.g., Word2Vec, GloVe)
- Pre-trained language and vision-language models (e.g., GPT, BERT, CLIP)
- Ontologies and knowledge graphs
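The sketch below is a rough illustration of how word embeddings can supply this prior knowledge. It builds a class embedding by averaging the word vectors of a short class description. The 3-dimensional vectors in `WORD_VECTORS` are made up for illustration. In practice they would come from a pretrained model such as Word2Vec or GloVe and have hundreds of dimensions.

```python
# Toy word vectors: hypothetical values chosen only for illustration.
WORD_VECTORS = {
    "large":    [0.9, 0.1, 0.0],
    "wild":     [0.8, 0.0, 0.2],
    "small":    [0.1, 0.9, 0.0],
    "domestic": [0.0, 0.8, 0.3],
    "feline":   [0.2, 0.3, 0.9],
    "canine":   [0.3, 0.2, -0.8],
}


def class_embedding(description: str) -> list[float]:
    """Average the vectors of the known words in a class description."""
    vectors = [WORD_VECTORS[w] for w in description.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {description!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


class_embeddings = {
    "cat": class_embedding("small domestic feline"),
    "dog": class_embedding("domestic canine"),
    # An unseen class needs only a description, not labeled examples.
    "lion": class_embedding("large wild feline"),
}
print(class_embeddings["lion"])
```

Averaging is the simplest way to compose word vectors. Weighted averages or a sentence encoder usually capture descriptions better.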
Example
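Consider a classifier trained only on cats and dogs. If asked to classify a lion, a traditional classifier would fail, because it can only output "cat" or "dog". A zero-shot model, by contrast, is given "lion" as a candidate label together with semantic information about it (for example, a word embedding or attributes such as "feline" and "large"). Because a lion is semantically similar to a cat, the model can relate what it learned about cats to this description and assign the label "lion".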
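The lion example can be made concrete with attribute-based zero-shot learning. The sketch below is a simplified form of Direct Attribute Prediction (DAP) that omits the attribute priors used by full DAP:

- Each class is described by binary attributes.
- Attribute classifiers, trained on seen classes, output a probability per attribute for a new image.
- Each candidate class is scored by how well its attribute signature matches those probabilities.

The attribute names, class signatures, and predicted probabilities are all hypothetical.

```python
import math

# Hypothetical binary attribute signatures, in the order (feline, large, domestic, barks).
CLASS_ATTRIBUTES = {
    "cat":  (1, 0, 1, 0),
    "dog":  (0, 0, 1, 1),
    "lion": (1, 1, 0, 0),  # unseen during training; described only by attributes
}


def dap_scores(attr_probs, class_attributes):
    """Score each class by the product of per-attribute match probabilities
    (summed in log space for numerical stability), then normalize to sum to 1."""
    log_scores = {
        label: sum(math.log(p if a == 1 else 1.0 - p) for p, a in zip(attr_probs, signature))
        for label, signature in class_attributes.items()
    }
    max_log = max(log_scores.values())
    exp_scores = {label: math.exp(s - max_log) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {label: s / total for label, s in exp_scores.items()}


# Hypothetical attribute-classifier outputs for a photo of a lion.
lion_photo_probs = (0.9, 0.8, 0.1, 0.05)
scores = dap_scores(lion_photo_probs, CLASS_ATTRIBUTES)
print(max(scores, key=scores.get))  # lion
```

The attribute classifiers never saw a lion. The new label is reachable only because its description is expressed in the same attribute vocabulary they were trained on.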
How Does Zero-Shot Learning Work?
Zero-shot classification is typically based on:
1. Semantic Embeddings
- Each class is represented by a vector embedding that captures its meaning.
- This can be derived from text descriptions, word embeddings, or knowledge graphs.
2. Similarity Matching
- When a new instance arrives, the model compares its representation with the available class embeddings.
- The class with the highest similarity score is selected.
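The two steps above can be sketched in a few lines. The sketch assumes that the instance embedding and the class embeddings already live in the same vector space. All vectors are hypothetical stand-ins for the outputs of real encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(instance_embedding, class_embeddings):
    """Rank classes by similarity to the instance; the first entry is the prediction."""
    scores = {
        label: cosine_similarity(instance_embedding, emb)
        for label, emb in class_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical class embeddings (step 1) in a shared 3-d space.
class_embeddings = {
    "cat":  [0.1, 0.9, 0.4],
    "dog":  [0.2, 0.6, -0.7],
    "lion": [0.8, 0.2, 0.5],
}
instance = [0.7, 0.3, 0.6]  # hypothetical embedding of a new image

ranking = zero_shot_predict(instance, class_embeddings)  # step 2
print(ranking[0][0])  # lion
```

Adding a new class only requires adding its embedding to `class_embeddings`. Nothing else is retrained.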
Types of Zero-Shot Learning:
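- Transductive Zero-Shot Learning (TZSL): In addition to labeled seen-class data, the model has access to unlabeled instances of the unseen (test) classes during training and uses their distribution to adapt.
- Inductive Zero-Shot Learning (IZSL): Relies solely on labeled seen-class data and class descriptions, without access to any test instances during training.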
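The sketch below contrasts the two settings on toy 2-d embeddings.

- **Inductive prediction** assigns each test point to its nearest semantic prototype and never looks at the test set as a whole.
- **Transductive variant** uses all unlabeled test points to move each prototype toward the mean of the points currently assigned to it. This is a k-means-style refinement initialized from the semantic embeddings.

This is one simple illustrative transductive scheme, not a specific published method. It uses Euclidean distance for brevity, and all vectors are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign(points, prototypes):
    """Label each point with its nearest prototype (the inductive prediction rule)."""
    return [min(prototypes, key=lambda label: squared_distance(p, prototypes[label])) for p in points]


def transductive_refine(points, prototypes, n_iters=10):
    """Refine prototypes with unlabeled test points via k-means-style mean updates."""
    prototypes = dict(prototypes)  # do not mutate the caller's semantic prototypes
    for _ in range(n_iters):
        labels = assign(points, prototypes)
        for label in prototypes:
            members = [p for p, lbl in zip(points, labels) if lbl == label]
            if members:  # keep the previous prototype if no point was assigned
                dim = len(members[0])
                prototypes[label] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return prototypes


# Hypothetical semantic prototypes g(y) for two unseen classes.
semantic_prototypes = {"A": [0.0, 0.0], "B": [4.0, 0.0]}
# Unlabeled test embeddings; their true labels are A, A, A, B, B, B, but the
# test distribution is shifted relative to the semantic prototypes.
test_points = [[1.0, 0.2], [1.5, -0.1], [2.1, 0.1], [4.5, 0.1], [5.0, -0.2], [5.5, 0.0]]

print(assign(test_points, semantic_prototypes))  # ['A', 'A', 'B', 'B', 'B', 'B']
refined = transductive_refine(test_points, semantic_prototypes)
print(assign(test_points, refined))  # ['A', 'A', 'A', 'B', 'B', 'B']
```

Here the transductive update corrects the third point, which the inductive rule misclassifies. The same kind of update can also drift if the initial prototypes are far from the true clusters.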
Mathematical Formulation
Let:
- $X$ be the feature space.
- $Y_{\text{train}}$ be the set of seen classes.
- $Y_{\text{test}}$ be the set of unseen classes, where $Y_{\text{train}} \cap Y_{\text{test}} = \emptyset$.
Given a new instance $x \in X$, the predicted label is:
$$\hat{y} = \arg\max_{y \in Y_{\text{test}}} S(f(x), g(y))$$
where:
- $f(x)$ maps an instance to an embedding space.
- $g(y)$ maps class descriptions to the same embedding space.
- $S$ measures the similarity between the two embeddings.
Cosine similarity is often used for $S$:
$$S(f(x), g(y)) = \frac{f(x) \cdot g(y)}{\| f(x) \| \, \| g(y) \|}$$
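The formulation translates almost line for line into code. In the sketch below:

- `f` is a linear projection $f(x) = Wx$ from a 4-d feature space into a 3-d semantic space.
- `g` looks up a class's attribute vector.
- The weight matrix `W`, the feature vectors, and the attribute table are all hypothetical. In practice `W` would be learned from seen-class data only.

The prediction is the arg max of the cosine similarity over $Y_{\text{test}}$, after checking that the seen and unseen label sets are disjoint.

```python
import math

# Hypothetical seen and unseen label sets; zero-shot requires them to be disjoint.
Y_TRAIN = {"cat", "dog"}
Y_TEST = {"lion", "wolf"}

# Hypothetical class descriptions as attribute vectors: (feline, wild, large).
CLASS_ATTRIBUTES = {
    "cat":  [1.0, 0.0, 0.0],
    "dog":  [0.0, 0.0, 0.5],
    "lion": [1.0, 1.0, 1.0],
    "wolf": [0.0, 1.0, 0.7],
}

# Hypothetical 3 x 4 projection, standing in for one learned on Y_TRAIN.
W = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.1, 0.0, 0.1, 0.9],
]


def f(x):
    """Map an instance's feature vector into the semantic space: f(x) = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def g(y):
    """Map a class label to its description in the same semantic space."""
    return CLASS_ATTRIBUTES[y]


def similarity(u, v):
    """Cosine similarity S(f(x), g(y))."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict(x):
    if not Y_TRAIN.isdisjoint(Y_TEST):
        raise ValueError("seen and unseen classes must not overlap")
    fx = f(x)
    # y_hat = argmax over y in Y_test of S(f(x), g(y)); sorted() makes ties deterministic.
    return max(sorted(Y_TEST), key=lambda y: similarity(fx, g(y)))


x_new = [0.8, 0.3, 0.9, 0.7]  # hypothetical feature vector of a new image
print(predict(x_new))  # lion
```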
Implementation of Zero-Shot Classification
Below is a Python implementation of zero-shot classification using Hugging Face’s transformers library and the facebook/bart-large-mnli model, a BART model fine-tuned on the MultiNLI natural language inference dataset.
from transformers import pipeline
# Load pre-trained zero-shot classifier
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# Define the text to classify
text = "The patient is experiencing severe headaches and nausea."
# Define candidate labels (new unseen categories)
candidate_labels = ["Neurology", "Cardiology", "Psychiatry"]
# Perform Zero-Shot Classification
result = classifier(text, candidate_labels)
print("Classification Results:")
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.4f}")
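Under the hood, this pipeline treats each candidate label as a natural language inference (NLI) hypothesis:

- The text is the premise.
- Each label is inserted into a hypothesis template (by default "This example is {}.").
- The model outputs contradiction, neutral, and entailment logits for each label.

The plain-Python sketch below shows how those logits are turned into the scores that the loop above prints. It covers both the default single-label mode and `multi_label=True`; the pipeline also applies the multi-label rule when only one candidate label is given. The logit values are invented, not produced by facebook/bart-large-mnli. The sketch also assumes the logits are ordered (contradiction, neutral, entailment), which matches this model's label order; check the model config for other checkpoints.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(nli_logits, multi_label=False):
    """nli_logits maps each label to the (contradiction, neutral, entailment)
    logits for the premise/hypothesis pair built from that label."""
    labels = list(nli_logits)
    if not multi_label:
        # Single-label: softmax of entailment logits across labels, so scores sum to 1.
        probs = softmax([nli_logits[lbl][2] for lbl in labels])
    else:
        # Multi-label: each label independently, softmax over (contradiction, entailment).
        probs = [softmax([nli_logits[lbl][0], nli_logits[lbl][2]])[1] for lbl in labels]
    # Sort by descending score, mirroring the pipeline's "labels"/"scores" output.
    ranked = sorted(zip(labels, probs), key=lambda item: item[1], reverse=True)
    return {"labels": [lbl for lbl, _ in ranked], "scores": [p for _, p in ranked]}


# Invented logits for hypotheses such as "This example is Neurology."
fake_logits = {
    "Neurology":  (-2.0, 0.5, 3.0),
    "Cardiology": (1.5, 0.2, -1.0),
    "Psychiatry": (0.0, 0.8, 0.5),
}
print(zero_shot_scores(fake_logits))
print(zero_shot_scores(fake_logits, multi_label=True))
```

In single-label mode the labels compete for one probability mass. In multi-label mode each score is an independent entailment probability, so several labels can score high at once.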
Applications of Zero-Shot Classification
1. Natural Language Processing (NLP):
- Sentiment analysis on unseen topics
- Document classification for new categories
2. Computer Vision:
- Identifying unseen objects (e.g., using OpenAI’s CLIP model)
- Detecting new types of road objects in autonomous driving
3. Healthcare & Bioinformatics:
- Diagnosing rare diseases based on symptoms
- Predicting new protein structures
4. Cybersecurity:
- Detecting unknown malware patterns
Advantages of Zero-Shot Classification
- No labeled data required for unseen classes
- Generalization to new tasks
- Lower annotation cost than fully supervised learning
Challenges of Zero-Shot Classification
- Dependence on the quality of the semantic embeddings and class descriptions
- Domain shift problems (performance drops when test data differs from training)
- Class imbalance issues