Building an AI Agent Using Google’s Agent Development Kit (ADK)

Google’s Agent Development Kit (ADK) is a framework for building autonomous AI agents. Beyond plain chat, ADK lets developers build agents that work with text, images and PDFs while keeping session memory across turns.

Implementation

We’ll build StudyBuddy, an AI tutor that can answer questions, analyze PDFs, describe images and explain concepts with examples. The agent is interactive and session-based, so users can ask several questions in a single session. Let's build our agent:

Step 1: Install Dependencies

We need to install the packages the agent depends on: google-adk, google-genai, PyPDF2 and pillow.

Python

!pip install --upgrade google-adk google-genai google-colab PyPDF2 pillow
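After installing, it can help to confirm which versions the runtime actually picked up. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names are taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command in this step.
REQUIRED = ["google-adk", "google-genai", "PyPDF2", "pillow"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, re-run the install cell before continuing.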

Step 2: Import Libraries

We need to import the classes and modules the agent uses, including LlmAgent, Runner, InMemorySessionService and types.

Python

from google.colab import userdata, files
import os
import asyncio
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
import PyPDF2
import base64

Step 3: Setup API Key

We need to set up the API key for the agent. We will be using a Gemini API key stored in Colab Secrets.

Python

try:
    os.environ['GEMINI_API_KEY'] = userdata.get('GOOGLE_API_KEY')
    LLM_MODEL = "gemini-2.5-flash"
    print("API Key set successfully; using model =", LLM_MODEL)
except Exception:
    print("ERROR: Please set 'GOOGLE_API_KEY' in Colab Secrets before running.")
    raise
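The snippet above relies on Colab Secrets. Outside Colab, one option is to read the key from an environment variable instead. The sketch below assumes the key has been exported under one of two variable names; the helper resolve_api_key and the choice of names to check are illustrative assumptions, not part of ADK.

```python
import os


def resolve_api_key(env=os.environ, names=("GOOGLE_API_KEY", "GEMINI_API_KEY")):
    """Return the first non-empty key found under the given variable names."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(f"No API key found; set one of: {', '.join(names)}")


# Demonstration with a plain dict standing in for the real environment.
print(resolve_api_key({"GOOGLE_API_KEY": "demo-key"}))
```

In a real run, the returned value would be assigned to os.environ['GEMINI_API_KEY'] in place of the userdata.get(...) call.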

Step 4: Create the StudyBuddy Agent

Here:

name: Agent’s name.
model: LLM model used.
instruction: How the agent should behave.
description: Short overview of the agent’s capabilities.

Python

studybuddy_agent = LlmAgent(
    name="StudyBuddy",
    model=LLM_MODEL,
    instruction=(
        "You are StudyBuddy, a friendly AI tutor. "
        "You can answer questions, explain concepts, and analyze text, images, and PDFs. "
        "Always give helpful examples and include at least one emoji. "
        "When analyzing an image, provide a detailed description and context."
    ),
    description="An AI tutor that helps students learn with text, images, and PDFs."
)

Step 5: Setup Session

We will:

Create a session so the agent can remember previous interactions, which gives the conversation continuity.
Store it with InMemorySessionService, which keeps the history in memory only, so it does not survive a runtime restart.

Python

APP_NAME = "studybuddy_app"
USER_ID = "colab_user"
SESSION_ID = "studybuddy_session"

session_service = InMemorySessionService()
await session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)
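To make "in-memory session" concrete, the toy class below mimics the idea with a plain dictionary keyed by app, user and session. It is not ADK's implementation; ToySessionStore and its methods are hypothetical names used only for illustration.

```python
class ToySessionStore:
    """Illustrative only: keeps conversation turns in a dict in process memory."""

    def __init__(self):
        self._sessions = {}

    def create_session(self, app_name, user_id, session_id):
        key = (app_name, user_id, session_id)
        self._sessions.setdefault(key, [])
        return key

    def append(self, key, role, text):
        self._sessions[key].append((role, text))

    def history(self, key):
        # Return a copy so callers cannot mutate the stored turns.
        return list(self._sessions.get(key, []))


store = ToySessionStore()
key = store.create_session("studybuddy_app", "colab_user", "studybuddy_session")
store.append(key, "user", "What is photosynthesis?")
store.append(key, "model", "It is how plants turn light into chemical energy.")
print(len(store.history(key)), "turns remembered")

# A new store, like a restarted runtime, starts with no history.
print(len(ToySessionStore().history(key)), "turns after restart")
```

The same reasoning explains why the session created above must be recreated whenever the Colab runtime restarts.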

Step 6: Create Runner

The Runner acts as a bridge between the user and the agent. It takes each user message, runs the agent asynchronously against the session, and yields the resulting events.

Python

runner = Runner(
    agent=studybuddy_agent,
    app_name=APP_NAME,
    session_service=session_service
)

Step 7: Define Query Handling Function

Accepts text, PDF or image input.
Converts the input into a Content object (from google.genai.types) that ADK can pass to the agent.
Sends it to the agent and collects the final response.

Python

async def run_query(query_text=None, pdf_path=None, image_data=None):
    parts = []

    if query_text:
        parts.append(types.Part(text=query_text))

    if image_data:
        parts.append(types.Part(
            inline_data=types.Blob(
                mime_type="image/jpeg",
                data=image_data
            )
        ))

    if pdf_path:
        pdf_text = ""
        with open(pdf_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            for page in reader.pages:
                # Guard against pages where extract_text() yields nothing
                pdf_text += (page.extract_text() or "") + "\n"
        parts.append(types.Part(text=pdf_text))

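    # Avoid sending an empty message when no input was supplied
    if not parts:
        return "Please provide a question, an image or a PDF."

    content = types.Content(role="user", parts=parts)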
    final_response_text = "Agent did not produce a final response."

    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response_text = "".join(
                p.text for p in event.content.parts if p.text)
            break

    return final_response_text
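The function above labels every image as image/jpeg, which is inaccurate for PNG or GIF uploads. A minimal sketch of one way to pick the MIME type from the filename follows; it uses only the standard library, and the helper name guess_image_mime is hypothetical.

```python
import mimetypes


def guess_image_mime(filename, default="image/jpeg"):
    """Guess an image MIME type from the file extension, falling back to default."""
    mime, _ = mimetypes.guess_type(filename)
    if mime and mime.startswith("image/"):
        return mime
    return default


for name in ["diagram.png", "photo.jpeg", "notes.txt"]:
    print(name, "->", guess_image_mime(name))
```

To use it, run_query would need an extra parameter (for example, a mime_type argument, which is an assumption and not in the original code) whose value is passed to types.Blob instead of the hard-coded string.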

Step 8: Create Interactive Loop

Provides an interactive menu for text, image and PDF queries.
Prompts again instead of querying the agent when no file is uploaded or the option is invalid.
Users can exit anytime.

Python

async def interactive_studybuddy():
    print("Welcome to StudyBuddy! You can ask questions, upload images or PDFs. Type 'exit' to quit.\n")

    while True:
        print("Options:")
        print("1. Text question")
        print("2. Upload an image")
        print("3. Upload a PDF")
        user_choice = input("Select option (1/2/3) or type 'exit': ").strip()

        if user_choice.lower() in ["exit", "quit"]:
            print("Goodbye! Happy studying!")
            break

        if user_choice == "1":
            query_text = input("Your Question: ").strip()
            response = await run_query(query_text=query_text)

        elif user_choice == "2":
            uploaded = files.upload()
            if not uploaded:
                print("No image uploaded. Please try again.\n")
                continue
            image_filename = next(iter(uploaded))
            image_data = uploaded[image_filename]
            follow_up_question = input(
                f" Your question about '{image_filename}': ").strip()
            query_with_filename = f"Regarding the image '{image_filename}', {follow_up_question}"
            response = await run_query(query_text=query_with_filename, image_data=image_data)

        elif user_choice == "3":
            uploaded = files.upload()
            if not uploaded:
                print("No PDF uploaded. Please try again.\n")
                continue
            pdf_path = next(iter(uploaded))
            response = await run_query(pdf_path=pdf_path)

        else:
            print("Invalid option. Try again.\n")
            continue

        print(f"StudyBuddy: {response}\n")

Step 9: Run the Agent

Starts the interactive AI tutor loop, using the session created in Step 5. The top-level await works here because the code runs in a notebook cell.

Python

await interactive_studybuddy()
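A bare top-level await is only valid in a notebook cell. In a regular Python script, the coroutine has to be handed to an event loop, typically with asyncio.run. The sketch below shows that pattern with a hypothetical stand-in coroutine, fake_tutor_loop, so it runs without ADK or an API key.

```python
import asyncio


async def fake_tutor_loop():
    """Hypothetical stand-in for interactive_studybuddy(); no model is called."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return "session finished"


def main():
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
    return asyncio.run(fake_tutor_loop())


if __name__ == "__main__":
    print(main())
```

In a script version of this tutorial, the session creation from Step 5 would also need to move inside a coroutine, since it uses await as well.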

a. Text Question:

b. Image:

The sample image used can be downloaded from here.

c. PDF:

The sample PDF used can be downloaded from here.

The complete code can be downloaded from here.

Advantages

Multimodal Support: Handles text, images and PDFs in one agent; in this example, PDFs are passed to the model as extracted text.
Session Memory: Maintains context across multiple queries.
Asynchronous Execution: Non-blocking, efficient handling of queries.
Extensible: Easy to add new tools or capabilities to the agent.
Developer-friendly: Structured like a real software project rather than a simple prompt.
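As a sketch of the extensibility point, ADK agents can be given plain Python functions as tools through the tools argument of LlmAgent. The function below, word_count, is a hypothetical example tool and not part of the original StudyBuddy code; the wiring is shown only as a comment because it needs google-adk and an API key to run.

```python
def word_count(text: str) -> dict:
    """Count the words in a passage so the tutor can report its length.

    Args:
        text: The passage to measure.

    Returns:
        A dict with a status flag and the number of whitespace-separated words.
    """
    words = text.split()
    return {"status": "success", "word_count": len(words)}


# Assumed wiring (not executed here):
# studybuddy_agent = LlmAgent(..., tools=[word_count])

print(word_count("Photosynthesis converts light into chemical energy"))
```

The docstring matters in this setup, since the model relies on the function's name and description to decide when to call it.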
