LangChain Callbacks

LangChain callbacks are an event-driven system for monitoring, debugging and customizing the execution of your LLM applications, letting you "hook into" different stages like LLM calls, chain starts/ends or token generation to log data, stream responses, track costs or integrate with external tools.

LangChain callbacks are the primary mechanism for instrumenting and monitoring an application and for understanding its execution behavior.

Why LangChain Callbacks Exist

Without callbacks, the system is opaque. When something fails, slows down or retries, we cannot pinpoint where the problem originated or observe it directly. LangChain callbacks exist to solve this problem.

Callbacks provide structured observability into the runtime behavior of a LangChain application, without modifying core application logic.

Working

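All callback handlers inherit from a single base class, langchain_core.callbacks.BaseCallbackHandler. This base class defines a set of optional hook methods (for example on_llm_start, on_chain_start and on_tool_start) covering LLMs, chat models, chains, tools, agents and retrievers. A handler overrides only the hooks it cares about.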

LangChain uses a publish–subscribe (observer) model.

The runtime publishes events
Callback handlers subscribe to those events
Handlers receive structured metadata but cannot alter execution
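The publish and subscribe relationship described above can be sketched in plain Python. This is only a toy illustration of the observer pattern: ToyRuntime, subscribe and the event names are hypothetical and are not part of LangChain.

```python
from typing import Any, Callable, Dict, List

# Hypothetical handler type: receives an event name and a metadata payload.
Handler = Callable[[str, Dict[str, Any]], None]


class ToyRuntime:
    """Toy stand-in for a runtime that publishes events (not a LangChain class)."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        # Handlers register interest in events.
        self._handlers.append(handler)

    def _publish(self, event: str, payload: Dict[str, Any]) -> None:
        for handler in self._handlers:
            # Pass a copy so a handler cannot mutate the runtime's own data.
            handler(event, dict(payload))

    def run(self, prompt: str) -> str:
        self._publish("start", {"prompt": prompt})
        result = prompt.upper()  # placeholder for the real work
        self._publish("end", {"result": result})
        return result


seen: List[str] = []
runtime = ToyRuntime()
runtime.subscribe(lambda event, payload: seen.append(event))
output = runtime.run("hello")
print(seen, output)
```

The handler only receives notifications; the value returned by run() is the same whether or not anything is subscribed.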

Figure: Event-driven execution model of callbacks

Implementation

Let's implement a callback example to make this concrete.

Step 1: Install required libraries

Python

!pip install langchain
!pip install -q --upgrade langchain-core langchain-community transformers accelerate

Step 2: Implement the callback class

The class inherits from LangChain's callback base class, which allows it to override the hook methods it needs.
on_llm_start() is triggered right before the LLM starts running and receives the prompts.
on_llm_end() is triggered after the LLM call has completed.
on_llm_error() is triggered if an exception is raised during the LLM call.

Python

from langchain_core.callbacks import BaseCallbackHandler
from typing import Any, Dict, List

class MyCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs):
        print('===LLM start ===')
        print(f'Prompts are \n{prompts}')

    def on_llm_end(self, response, **kwargs):
        print("\n=== LLM END ===")

    def on_llm_error(self, error: Exception, **kwargs):
        print(f"\nLLM ERROR: {error}")

Step 3: Define the model pipeline using the Hugging Face transformers library

We load the Qwen2.5 model with 1.5B parameters. It is a lightweight option that can run comfortably on a T4 GPU.
The tokenizer converts text into model-readable tokens, and the model is loaded with device_map="auto" so that it automatically uses the available CPU/GPU resources.
pipeline("text-generation") creates a simple interface for text generation that handles tokenization, inference and decoding.
HuggingFacePipeline() adapts the Hugging Face pipeline to LangChain's LLM interface. The callback handler is attached through its callbacks argument.

Python

from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto"
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,
)

llm = HuggingFacePipeline(
    pipeline=pipe,
    callbacks=[MyCallbackHandler()],
)

Step 4: Invoke the LLM with any query

Python

llm.invoke("hello how are you?")

Output:

In the output, the lines "===LLM start ===" and "=== LLM END ===" are printed by the callback handler, which fires when the LLM run starts and when it ends.

Advantages

Reduces black-box execution: Callbacks let us see what is happening while the model is running. This visibility makes debugging less frustrating.
No interference with model flow: Callbacks only observe. You can add or remove them without worrying about breaking prompts or altering the system flow.
Modular in nature: The same callback handler can be used for an LLM, a chain, an agent or a tool. This makes the code reusable and modular.

Limitations

Cannot control execution: Callbacks are not meant to stop a run, modify outputs or change prompts. This makes them well suited to logging and observation, but not to custom control logic.
Can clutter logs: Adding too many callbacks or logging too much information can make outputs noisy and harder to interpret, especially in large systems.
Complexity and overhead: For small projects, callbacks are often unnecessary and can add extra overhead and complexity.
