Building an AI Agent Using Google’s Agent Development Kit (ADK)

Last Updated : 28 Oct, 2025

Google’s Agent Development Kit (ADK) is a framework for building autonomous AI agents. Unlike simple chatbot frameworks, ADK lets developers build agents that accept multimodal input such as text, images and PDFs while keeping session memory across turns.

Implementation

We’ll build StudyBuddy, an AI tutor that can answer questions, analyze PDFs, describe images and explain concepts with examples. The agent is interactive and session-based, so users can ask multiple questions in a single session. The code is written for Google Colab. Let's build our agent:

Step 1: Install Dependencies

Install the packages the agent needs: google-adk, google-genai, PyPDF2 and pillow.

Python
# google.colab is preinstalled in Colab, so it does not need to be installed here
!pip install --upgrade google-adk google-genai PyPDF2 pillow
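After installing, it can help to confirm which versions were actually picked up, since ADK and google-genai change quickly. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of ADK.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


# Report the packages this tutorial relies on
for name, version in installed_versions(
    ["google-adk", "google-genai", "PyPDF2", "pillow"]
).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any entry prints NOT INSTALLED, re-run the install cell before continuing.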

Step 2: Import Libraries

Import the classes we need: LlmAgent, Runner and InMemorySessionService from ADK and types from google-genai, along with the Colab helpers and PyPDF2.

Python
from google.colab import userdata, files
import os
import asyncio
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
import PyPDF2
import base64

Step 3: Setup API Key

Set up the Gemini API key for the agent. The code reads it from a Colab secret named GOOGLE_API_KEY and exposes it through the GEMINI_API_KEY environment variable.

Python
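try:
    # Read the key from Colab Secrets and expose it as an environment variable
    os.environ['GEMINI_API_KEY'] = userdata.get('GOOGLE_API_KEY')
    LLM_MODEL = "gemini-2.5-flash"
    print("API Key set successfully; using model =", LLM_MODEL)
except Exception:
    # Surface a readable hint, then re-raise so the notebook stops here
    print("ERROR: Please set 'GOOGLE_API_KEY' in Colab Secrets before running.")
    raise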
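The snippet above only works inside Colab. As a hedged sketch, the lookup could fall back to ordinary environment variables when Colab Secrets are unavailable. The function name `resolve_api_key` and the fallback order are assumptions made for illustration, not something the original code does.

```python
import os


def resolve_api_key(secret_name="GOOGLE_API_KEY", environ=None):
    """Return an API key from Colab Secrets if possible, else from the environment."""
    environ = os.environ if environ is None else environ
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook
        key = None

    # Assumed fallback order: the same name as the secret, then GEMINI_API_KEY
    key = key or environ.get(secret_name) or environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            f"No API key found: set '{secret_name}' in Colab Secrets "
            "or as an environment variable."
        )
    return key


# Usage (commented out so the sketch does not fail when no key is configured):
# os.environ["GEMINI_API_KEY"] = resolve_api_key()
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment.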

Step 4: Create the StudyBuddy Agent

We define the agent with LlmAgent. Here:

  • name: Agent’s name.
  • model: LLM model used.
  • instruction: How the agent should behave.
  • description: Short overview of the agent’s capabilities.
Python
studybuddy_agent = LlmAgent(
    name="StudyBuddy",
    model=LLM_MODEL,
    instruction=(
        "You are StudyBuddy, a friendly AI tutor. "
        "You can answer questions, explain concepts, and analyze text, images, and PDFs. "
        "Always give helpful examples and include at least one emoji. "
        "When analyzing an image, provide a detailed description and context."
    ),
    description="An AI tutor that helps students learn with text, images, and PDFs."
)
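The agent above has no tools. ADK's documentation describes passing plain Python functions through an agent's `tools` list, so one way to extend StudyBuddy might look like the sketch below. The `count_words` function is a hypothetical example tool, and the wiring shown in the trailing comment is an assumption rather than part of the original tutorial.

```python
def count_words(text: str) -> dict:
    """Count the words in a piece of text.

    Args:
        text: The text to analyze.

    Returns:
        A dict with a "status" key and, on success, a "word_count" key.
    """
    if not text or not text.strip():
        return {"status": "error", "message": "No text provided."}
    # Split on any whitespace, so repeated spaces and newlines are handled
    return {"status": "success", "word_count": len(text.split())}


print(count_words("Photosynthesis converts light into chemical energy"))

# Assumed wiring (requires google-adk, not executed here):
# studybuddy_agent = LlmAgent(
#     name="StudyBuddy",
#     model=LLM_MODEL,
#     instruction="...",
#     tools=[count_words],
# )
```

The docstring and type hints matter here because the model relies on them to decide when and how to call the function.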

Step 5: Setup Session

We will:

  • Create an in-memory session so the agent can remember previous interactions. The history lasts only while the current runtime is alive.
  • Reuse the same session ID for every query, which gives the conversation continuity.
Python
APP_NAME = "studybuddy_app"
USER_ID = "colab_user"
SESSION_ID = "studybuddy_session"

session_service = InMemorySessionService()
await session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)
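To make the "in-memory" behavior concrete, here is a toy model of a session store keyed by app, user and session ID. It is an illustration only: `ToySessionStore` is a hypothetical class and does not reflect how InMemorySessionService is implemented internally.

```python
class ToySessionStore:
    """Toy illustration only; NOT ADK's InMemorySessionService."""

    def __init__(self):
        # Everything lives in a plain dict, so it disappears with the process
        self._sessions = {}

    def create_session(self, app_name, user_id, session_id):
        key = (app_name, user_id, session_id)
        self._sessions.setdefault(key, [])
        return key

    def append(self, key, role, text):
        self._sessions[key].append((role, text))

    def history(self, key):
        # Return a copy so callers cannot mutate the stored history
        return list(self._sessions[key])


store = ToySessionStore()
key = store.create_session("studybuddy_app", "colab_user", "studybuddy_session")
store.append(key, "user", "What is photosynthesis?")
store.append(key, "model", "It is how plants make food from light.")
print(store.history(key))

# A new store stands in for a restarted runtime: the old history is gone
fresh = ToySessionStore()
fresh_key = fresh.create_session("studybuddy_app", "colab_user", "studybuddy_session")
print(fresh.history(fresh_key))
```

The same idea explains why restarting the Colab runtime resets StudyBuddy's memory.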

Step 6: Create Runner

The Runner acts as a bridge between the user and the agent. It takes each new message, runs the agent against the given session and asynchronously yields the resulting events, including the final response.

Python
runner = Runner(
    agent=studybuddy_agent,
    app_name=APP_NAME,
    session_service=session_service
)
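The event-stream pattern can be tried offline before calling the real API. The sketch below uses hypothetical stand-ins (`FakeEvent`, `fake_run_async`) that mimic only the `is_final_response()` check used later in this tutorial; they are not ADK classes. It uses `asyncio.run`, which suits a plain script; in a notebook you would `await collect_final(...)` instead.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Hypothetical stand-in for an ADK event; not the real class."""
    text: str
    final: bool = False

    def is_final_response(self):
        return self.final


async def fake_run_async(message):
    """Yield a couple of intermediate events, then a final one."""
    yield FakeEvent("thinking...")
    yield FakeEvent("calling a tool...")
    yield FakeEvent(f"Answer to: {message}", final=True)


async def collect_final(message):
    """Consume the stream and keep only the final response text."""
    final_text = "Agent did not produce a final response."
    async for event in fake_run_async(message):
        if event.is_final_response():
            final_text = event.text
            break
    return final_text


result = asyncio.run(collect_final("What is 2 + 2?"))
print(result)
```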

Step 7: Define Query Handling Function

The run_query function:

  • Accepts text, PDF or image input.
  • Wraps the input in a google-genai Content object (PDFs are first converted to text with PyPDF2).
  • Sends it to the agent and collects the final response.
Python
async def run_query(query_text=None, pdf_path=None, image_data=None):
    parts = []

    if query_text:
        parts.append(types.Part(text=query_text))

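    if image_data:
        # NOTE: the MIME type is hard-coded, so non-JPEG uploads (e.g. PNG) are mislabeled
        parts.append(types.Part(
            inline_data=types.Blob(
                mime_type="image/jpeg",
                data=image_data
            )
        ))

    if pdf_path:
        # Extract plain text from every page and send it as a single text part
        pdf_text = ""
        with open(pdf_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            for page in reader.pages:
                # Guard against pages that yield no extractable text
                pdf_text += (page.extract_text() or "") + "\n"
        parts.append(types.Part(text=pdf_text))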

    content = types.Content(role="user", parts=parts)
    final_response_text = "Agent did not produce a final response."

    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response_text = "".join(
                p.text for p in event.content.parts if p.text)
            break

    return final_response_text
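run_query labels every image as "image/jpeg". One possible refinement is to infer the MIME type from the uploaded file's name with the standard library. `guess_image_mime` is a hypothetical helper, and using it would require passing the filename into run_query, which the original function does not do.

```python
import mimetypes


def guess_image_mime(filename):
    """Infer an image MIME type from a file name, or raise ValueError."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"'{filename}' does not look like an image file")
    return mime


for name in ["diagram.png", "animation.gif"]:
    print(name, "->", guess_image_mime(name))

# Assumed usage inside run_query (not executed here):
# types.Blob(mime_type=guess_image_mime(image_filename), data=image_data)
```

Guessing from the extension is a heuristic; it does not inspect the file's bytes, so a misnamed file would still be mislabeled.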

Step 8: Create Interactive Loop

The interactive loop:

  • Presents a menu for text, image and PDF queries.
  • Skips the query and shows the menu again when no file is uploaded or the option is invalid.
  • Lets users exit at any time by typing 'exit' or 'quit'.
Python
async def interactive_studybuddy():
    print("Welcome to StudyBuddy! You can ask questions, upload images or PDFs. Type 'exit' to quit.\n")

    while True:
        print("Options:")
        print("1. Text question")
        print("2. Upload an image")
        print("3. Upload a PDF")
        user_choice = input("Select option (1/2/3) or type 'exit': ").strip()

        if user_choice.lower() in ["exit", "quit"]:
            print("Goodbye! Happy studying!")
            break

        if user_choice == "1":
            query_text = input("Your Question: ").strip()
            response = await run_query(query_text=query_text)

        elif user_choice == "2":
            uploaded = files.upload()
            if not uploaded:
                print("No image uploaded. Please try again.\n")
                continue
            image_filename = next(iter(uploaded))
            image_data = uploaded[image_filename]
            follow_up_question = input(
                f"Your question about '{image_filename}': ").strip()
            query_with_filename = f"Regarding the image '{image_filename}', {follow_up_question}"
            response = await run_query(query_text=query_with_filename, image_data=image_data)

        elif user_choice == "3":
            uploaded = files.upload()
            if not uploaded:
                print("No PDF uploaded. Please try again.\n")
                continue
            pdf_path = next(iter(uploaded))
            response = await run_query(pdf_path=pdf_path)

        else:
            print("Invalid option. Try again.\n")
            continue

        print(f"StudyBuddy: {response}\n")
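In the loop above, the PDF option sends the extracted text without any question attached. A small helper could combine a user question with the document text and cap its length. The sketch below assumes a hypothetical `build_pdf_prompt` function; the default question and the 20,000-character limit are arbitrary choices for illustration.

```python
def build_pdf_prompt(question, pdf_text, max_chars=20000):
    """Combine a question with extracted PDF text, truncating long documents."""
    # Fall back to a generic request when the user enters nothing
    question = question.strip() or "Summarize the key points of this document."
    excerpt = pdf_text.strip()
    truncated = len(excerpt) > max_chars
    if truncated:
        excerpt = excerpt[:max_chars]
    note = "\n[Document truncated]" if truncated else ""
    return f"{question}\n\n--- PDF content ---\n{excerpt}{note}"


print(build_pdf_prompt("What is the main topic?", "Chapter 1: Cell biology basics."))

# Assumed usage in option 3 (not executed here):
# response = await run_query(query_text=build_pdf_prompt(question, pdf_text))
```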

Step 9: Run the Agent

Finally, start the interactive tutor loop. Colab supports top-level await, so the coroutine can be awaited directly in a cell.

Python
await interactive_studybuddy()
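Top-level await works in notebooks but not in a regular Python script. Outside Colab, the usual approach is to hand the coroutine to `asyncio.run`. The sketch below uses a hypothetical stand-in coroutine, `interactive_demo`, in place of interactive_studybuddy() so that it runs without ADK.

```python
import asyncio


async def interactive_demo():
    """Hypothetical stand-in for interactive_studybuddy()."""
    await asyncio.sleep(0)  # yield control once, as a real async call would
    return "session finished"


def main():
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop
    return asyncio.run(interactive_demo())


if __name__ == "__main__":
    print(main())
```

Calling `asyncio.run` inside a notebook cell raises an error because a loop is already running there, which is why the tutorial uses await instead.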

a. Text Question:

b. Image:

The sample image used can be downloaded from here.

[Screenshot: output for the image query]

c. PDF:

The sample PDF used can be downloaded from here.

[Screenshot: output for the PDF query]

The complete code can be downloaded from here.

Advantages

  • Multimodal Support: Accepts text and images directly, and PDFs through text extraction.
  • Session Memory: Maintains context across multiple queries in the same session.
  • Asynchronous Execution: Non-blocking, efficient handling of queries.
  • Extensible: Easy to add new tools or capabilities to the agent.
  • Developer-friendly: Structured like a real software project rather than a simple prompt.