LangChain Callbacks

Last Updated : 13 Feb, 2026

LangChain callbacks are an event-driven system for monitoring, debugging and customizing the execution of LLM applications. They let you hook into stages such as LLM calls, chain starts and ends, or token generation in order to log data, stream responses, track costs or integrate with external tools.

Callbacks are LangChain's primary mechanism for instrumenting, monitoring and understanding execution behavior.

Why LangChain Callbacks Exist

Without callbacks, the system is opaque: when something fails, slows down or retries, we cannot pinpoint where the problem originated or observe it ourselves. LangChain callbacks exist to solve this problem.

Callbacks provide structured observability into the runtime behavior of a LangChain application, without modifying core application logic.

[Figure: Callback flow with vs. without callbacks]

Working

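All callback handlers inherit from a single base class, langchain_core.callbacks.BaseCallbackHandler. This base class defines a set of optional hook methods, one for each kind of runtime event (for example on_llm_start, on_llm_end and on_llm_error), and a handler overrides only the hooks it cares about.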

LangChain uses a publish–subscribe (observer) model.

  • The runtime publishes events
  • Callback handlers subscribe to those events
  • Handlers receive structured metadata but cannot alter execution
[Figure: Event-driven execution model of callbacks]
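The publish-subscribe idea described above can be sketched in plain Python, without LangChain. This is only an illustration of the pattern: EventBus, run_model and the event names are hypothetical and are not part of LangChain's API.

```python
from typing import Callable, Dict, List, Tuple


class EventBus:
    """Hypothetical stand-in for the runtime that publishes events."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        # Register a handler for one event name.
        self._subscribers.setdefault(event, []).append(handler)

    def publish(self, event: str, payload: dict) -> None:
        # Each handler gets its own copy of the payload, so it can read
        # the metadata but cannot change what the publisher holds.
        for handler in self._subscribers.get(event, []):
            handler(dict(payload))


def run_model(bus: EventBus, prompt: str) -> str:
    """Toy 'model call' that publishes an event before and after it runs."""
    bus.publish("llm_start", {"prompt": prompt})
    output = prompt.upper()  # stand-in for real inference
    bus.publish("llm_end", {"output": output})
    return output


log: List[Tuple[str, str]] = []
bus = EventBus()
bus.subscribe("llm_start", lambda p: log.append(("start", p["prompt"])))
bus.subscribe("llm_end", lambda p: log.append(("end", p["output"])))

result = run_model(bus, "hello")
print(result)
print(log)
```

The subscribers never touch run_model itself, which mirrors how a callback handler observes a run without being part of the application logic.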

Implementation

Let's implement a callback example to further our understanding.

Step 1: Install required libraries

Python
!pip install langchain
!pip install -q --upgrade langchain-core langchain-community transformers accelerate

Step 2: Implementing Callback Class

  • The class inherits from LangChain's callback base class, which allows it to override the hook methods it needs.
  • on_llm_start() is triggered right before the model starts running.
  • on_llm_end() is triggered after the LLM call has completed.
  • on_llm_error() is triggered if an exception occurs during the LLM call.
Python
from langchain_core.callbacks import BaseCallbackHandler
from typing import Any, Dict, List

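class MyCallbackHandler(BaseCallbackHandler):
    # Fires just before the LLM runs; `prompts` holds the input strings.
    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs):
        print('===LLM start ===')
        print(f'Prompts are \n{prompts}')

    # Fires after the LLM call finishes; `response` holds the generations.
    def on_llm_end(self, response, **kwargs):
        print("\n=== LLM END ===")

    # Fires if the LLM call raises an exception.
    def on_llm_error(self, error: Exception, **kwargs):
        print(f"\nLLM ERROR: {error}")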

Step 3: Define the model's pipeline using the Hugging Face transformers library

  • We load the Qwen2.5 (1.5B-parameter) Instruct model, a lightweight option that can run comfortably on a T4 GPU.
  • The tokenizer converts text into model-readable tokens, and the model is loaded with device_map="auto" so that it is placed automatically on the available CPU/GPU resources.
  • The pipeline("text-generation") call creates a simple interface for text generation that handles tokenization, inference and decoding.
  • HuggingFacePipeline() adapts the Hugging Face pipeline to LangChain's LLM interface, and the callbacks argument attaches our handler to every call made through this LLM.
Python
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto"
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,
)

llm = HuggingFacePipeline(
    pipeline=pipe,
    callbacks=[MyCallbackHandler()],
)

Step 4: Invoke the LLM with any query

Python
llm.invoke("hello how are you?")

Output:

[Screenshot: notebook output showing the "LLM start" and "LLM END" messages printed by the callback]

In this output, the "LLM start" and "LLM END" lines were printed by the callback handler, which fired when the LLM call started and when it ended.

Advantages

  1. Reduces black-box execution: Callbacks let us see what is happening while the model is running. This visibility makes debugging less frustrating.
  2. No interference with model flow: Callbacks only observe. You can add or remove them without worrying about breaking prompts or altering the system's flow.
  3. Modular in nature: The same callback can be used for an LLM, a chain, an agent or a tool, which makes the code reusable and modular.

Limitations

  1. Cannot control execution: Callbacks are designed for observation. By default they do not stop a run, modify outputs or change prompts, which makes them good for logging and monitoring but not for custom logic that reacts to situations.
  2. Can clutter logs: Adding too many callbacks or logging too much information can make output noisy and harder to interpret, especially in large systems.
  3. Complexity and overhead: For small projects, callbacks are often unnecessary and may add extra overhead and complexity.