Zero-Shot Classification

Traditional machine learning models require labeled examples for every class they need to classify. However, in many real-world scenarios, collecting labeled data for every possible category is impractical. Zero-shot classification (ZSC) is a technique that allows models to classify data into categories without having seen labeled examples of those categories during training.

This is achieved using semantic knowledge transfer, where the model leverages prior knowledge, typically through:

Word embeddings (e.g., Word2Vec, GloVe)
Pre-trained language models (e.g., GPT, BERT) and vision-language models (e.g., CLIP)
Ontologies and knowledge graphs

Example

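Consider a classifier trained only on cats and dogs. If asked to classify a "lion," a traditional classifier would fail, because it can only answer "cat" or "dog." A zero-shot model, given a semantic description of the class "lion," could recognize that a lion is semantically similar to a cat (a large wild feline) and classify the input under the new label.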

How Does Zero-Shot Learning Work?

Zero-shot classification is typically based on:

1. Semantic Embeddings

Each class is represented by a vector embedding that captures its meaning.
These embeddings can be derived from text descriptions, word embeddings, or knowledge graphs.

2. Similarity Matching

When a new instance arrives, the model compares its representation with the available class embeddings.
The class with the highest similarity score is selected.
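The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real model: the four-dimensional attribute vectors (feline, canine, wild, domestic) and the names class_embeddings and classify are made up for the example, and a real system would obtain the vectors from a pre-trained encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical class embeddings over the attributes
# [feline, canine, wild, domestic]. The values are invented for illustration.
class_embeddings = {
    "cat":  [1.0, 0.0, 0.1, 0.9],
    "dog":  [0.0, 1.0, 0.1, 0.9],
    "lion": [1.0, 0.0, 0.9, 0.0],
}

def classify(instance, class_embeddings):
    """Return the best-matching label and the similarity score of every label."""
    scores = {label: cosine(instance, emb) for label, emb in class_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical representation of a new instance (a lion-like animal).
instance = [0.9, 0.1, 0.8, 0.1]
predicted_label, similarity_scores = classify(instance, class_embeddings)

print("Predicted:", predicted_label)
for label, score in similarity_scores.items():
    print(f"{label}: {score:.3f}")
```

With these toy vectors the instance is closest to "lion", and "cat" scores higher than "dog", which mirrors the intuition in the cat/dog/lion example.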

Types of Zero-Shot Learning:

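Transductive Zero-Shot Learning (TZSL): The model has access to unlabeled instances of the unseen classes and uses their distribution to adapt its predictions.
Inductive Zero-Shot Learning (IZSL): The model relies solely on the labeled training data and the semantic descriptions of the classes, without any access to the test data distribution.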
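The difference between the two settings can be illustrated with a small one-dimensional sketch. It assumes each class is summarized by a single prototype value, and it uses one simple transductive strategy (refining the prototypes with the unlabeled test points, in the style of k-means); this is only one of many possible approaches, and the function names and data are hypothetical.

```python
def nearest(point, prototypes):
    """Label of the prototype closest to the point."""
    return min(prototypes, key=lambda label: abs(point - prototypes[label]))

def inductive_predict(points, prototypes):
    """Classify every point independently using the fixed prototypes."""
    return [nearest(p, prototypes) for p in points]

def transductive_predict(points, prototypes, n_iter=5):
    """Refine the prototypes using the unlabeled test points, then classify."""
    current = dict(prototypes)
    for _ in range(n_iter):
        labels = [nearest(p, current) for p in points]
        for label in current:
            assigned = [p for p, l in zip(points, labels) if l == label]
            if assigned:  # keep the old prototype if no point was assigned
                current[label] = sum(assigned) / len(assigned)
    return [nearest(p, current) for p in points]

# Prototypes derived from class semantics (invented values).
prototypes = {"A": 0.0, "B": 4.0}

# Unlabeled test points, shifted relative to the prototypes.
# The first three belong to class A, the last three to class B.
test_points = [1.0, 1.5, 2.2, 5.0, 5.5, 6.0]

inductive_labels = inductive_predict(test_points, prototypes)
transductive_labels = transductive_predict(test_points, prototypes)

print("Inductive:   ", inductive_labels)
print("Transductive:", transductive_labels)
```

In this toy data the inductive prediction places the point 2.2 in class B because the fixed boundary sits at 2.0, while the transductive version moves the prototypes toward the observed clusters and assigns it to class A.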

Mathematical Formulation

Let:

X be the feature space.
Y_{\text{train}} be the set of seen classes.
Y_{\text{test}} be the set of unseen classes, where Y_{\text{train}} \cap Y_{\text{test}} = \emptyset

Given a new instance x \in X , the model assigns it to a class y \in Y_{\text{test}} using a semantic similarity function S computed between the embedding of the instance and the embedding of each class, such that:

\hat{y} = \arg\max_{y \in Y_{\text{test}}} S(f(x), g(y))

where:

f(x) maps an instance to an embedding space.
g(y) maps class descriptions to the same embedding space.

Cosine similarity is often used as the similarity function S:

S(f(x), g(y)) = \frac{f(x) \cdot g(y)}{\| f(x) \| \| g(y) \|}
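The formulation can be traced step by step in plain Python. The sketch below assumes that both f and g are implemented by averaging word vectors, which is only one simple way to map instances and class descriptions into the same space. The WORD_VECTORS table, the class descriptions in Y_test, and the function names are invented for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors. A real system would load
# pre-trained embeddings instead of this hand-written table.
WORD_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "brain":    [0.8, 0.2, 0.1],
    "nerves":   [0.7, 0.1, 0.2],
    "heart":    [0.1, 0.9, 0.1],
    "chest":    [0.2, 0.8, 0.0],
    "pain":     [0.5, 0.5, 0.1],
}

def embed(text):
    """Average the vectors of the known words in the text."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return [sum(column) / len(vectors) for column in zip(*vectors)]

f = embed  # maps an instance x to the embedding space
g = embed  # maps a class description to the same embedding space

def S(u, v):
    """Cosine similarity between two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def predict(x, unseen_classes):
    """Return the label y in unseen_classes that maximizes S(f(x), g(y))."""
    return max(unseen_classes, key=lambda y: S(f(x), g(unseen_classes[y])))

# Unseen classes, each described by a short text (invented descriptions).
Y_test = {"neurology": "brain nerves", "cardiology": "heart chest"}

y_hat = predict("severe headache pain", Y_test)
print(y_hat)
```

Words missing from the table (here "severe") are simply ignored, so the prediction depends only on the words that have a vector.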

Implementation of Zero Shot Classification

Below is a Python implementation of zero-shot classification using Hugging Face's transformers library and the facebook/bart-large-mnli model, a BART model fine-tuned on the MultiNLI natural language inference dataset.

Python

from transformers import pipeline

# Load pre-trained zero-shot classifier
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define the text to classify
text = "The patient is experiencing severe headaches and nausea."

# Define candidate labels (new unseen categories)
candidate_labels = ["Neurology", "Cardiology", "Psychiatry"]

# Perform Zero-Shot Classification
result = classifier(text, candidate_labels)

print("Classification Results:")
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.4f}")

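Output:

The script prints each candidate label with its score, ordered from the highest score to the lowest. In the default single-label mode the scores sum to 1 across the candidate labels. The exact values depend on the model version.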
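To see where scores of this kind come from, it helps to look at how an NLI-based zero-shot pipeline ranks labels. In single-label mode, each candidate label is turned into a hypothesis sentence (the pipeline's default template is "This example is {}."), the model scores how strongly the text entails each hypothesis, and a softmax over those entailment scores gives the label probabilities. The sketch below is simplified and uses made-up logits in place of a real model call, so the numbers carry no meaning beyond showing the mechanics.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Template used to turn each label into an NLI hypothesis.
hypothesis_template = "This example is {}."

candidate_labels = ["Neurology", "Cardiology", "Psychiatry"]
hypotheses = [hypothesis_template.format(label) for label in candidate_labels]

# Made-up entailment logits, one per (text, hypothesis) pair.
# A real pipeline would obtain these from the NLI model.
entailment_logits = [2.0, -1.0, 0.5]

# Single-label mode: normalize the entailment logits across labels.
scores = softmax(entailment_logits)

# Sort labels by score, highest first, as the pipeline output does.
ranked = sorted(zip(candidate_labels, scores), key=lambda pair: pair[1], reverse=True)

for label, score in ranked:
    print(f"{label}: {score:.4f}")
```

Because the softmax is taken across labels, the scores compete with each other. Adding or removing a candidate label changes every score, even though the text stays the same.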

Applications of Zero-Shot Classification

1. Natural Language Processing (NLP):

Sentiment analysis on unseen topics
Document classification for new categories

2. Computer Vision:

Identifying unseen objects (e.g., using OpenAI’s CLIP model)
Autonomous vehicle detection for new road objects

3. Healthcare & Bioinformatics:

Diagnosing rare diseases based on symptoms
Predicting new protein structures

4. Cybersecurity:

Detecting unknown malware patterns

Advantages of Zero-Shot Classification

No labeled data required for unseen classes
Generalization to new tasks
Lower annotation cost than supervised learning, since new classes do not need labeled examples

Challenges of Zero-Shot Classification

Dependence on the quality of the embeddings and class descriptions
Domain shift problems (performance drops when the test data differs from the training data)
Class imbalance issues
