Google’s Agent Development Kit (ADK) is a framework for building autonomous AI agents. Unlike simple chatbot frameworks, ADK lets developers build agents that accept multimodal input, such as text, images and PDFs, while keeping session memory across turns.
Implementation
We’ll build StudyBuddy, an AI tutor that can answer questions, analyze PDFs, describe images and explain concepts with examples. The agent is interactive and session-based, so users can ask multiple questions in a single session. Let's build our agent:
Step 1: Install Dependencies
We need to install the packages our agent depends on: google-adk, google-genai, PyPDF2 and pillow.
!pip install --upgrade google-adk google-genai google-colab PyPDF2 pillow
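Before moving on, it can help to confirm which versions were actually installed. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is an assumption made for illustration and is not part of ADK.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package_name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["google-adk", "google-genai", "PyPDF2", "pillow"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

If any entry prints "not installed", re-run the install cell before continuing.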
Step 2: Import Libraries
We need to import the building blocks for our agent: LlmAgent, Runner, InMemorySessionService and types, along with the Colab and PDF helpers.
from google.colab import userdata, files
import os
import asyncio
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
import PyPDF2
import base64
Step 3: Setup API Key
We need to set up the API key for our agent. We will use a Gemini API key stored in Colab Secrets under the name GOOGLE_API_KEY.
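try:
    # Read the key from Colab Secrets and expose it as GOOGLE_API_KEY,
    # the environment variable name used in ADK's own setup guides.
    os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')
    LLM_MODEL = "gemini-2.5-flash"
    print("API Key set successfully; using model =", LLM_MODEL)
except Exception:
    print("ERROR: Please set 'GOOGLE_API_KEY' in Colab Secrets before running.")
    raise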
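Outside Colab, `google.colab.userdata` is not available. One possible fallback, sketched here under assumptions, is to read the key from environment variables instead. The helper name `load_api_key` and the lookup order are illustrative choices, not part of ADK.

```python
import os


def load_api_key(env=None, names=("GOOGLE_API_KEY", "GEMINI_API_KEY")):
    """Return the first non-empty key found under the given variable names."""
    env = os.environ if env is None else env
    for name in names:
        value = (env.get(name) or "").strip()
        if value:
            return value
    raise RuntimeError("No API key found; set one of: " + ", ".join(names))
```

Passing a plain dict as `env` makes the lookup easy to check without touching the real environment.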
Step 4: Create the StudyBuddy Agent
The agent is configured with four parameters:
- name: Agent’s name.
- model: LLM model used.
- instruction: How the agent should behave.
- description: Short overview of the agent’s capabilities.
studybuddy_agent = LlmAgent(
    name="StudyBuddy",
    model=LLM_MODEL,
    instruction=(
        "You are StudyBuddy, a friendly AI tutor. "
        "You can answer questions, explain concepts, and analyze text, images, and PDFs. "
        "Always give helpful examples and include at least one emoji. "
        "When analyzing an image, provide a detailed description and context."
    ),
    description="An AI tutor that helps students learn with text, images, and PDFs."
)
Step 5: Setup Session
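We will:
- Create a session so the agent can remember previous interactions. InMemorySessionService keeps this history in memory only, so it is lost when the runtime restarts.
- Reuse the same session ID for every query, which gives the conversation continuity.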
APP_NAME = "studybuddy_app"
USER_ID = "colab_user"
SESSION_ID = "studybuddy_session"
session_service = InMemorySessionService()
# Top-level await works in Colab and Jupyter notebooks.
await session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)
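To make the idea of session memory concrete, here is a toy stand-in. It is not ADK's implementation; the class `ToySessionStore` and its methods are hypothetical and only illustrate how turns can be kept under an (app, user, session) key in memory.

```python
class ToySessionStore:
    """Illustrative only: keeps conversation turns in a plain dict."""

    def __init__(self):
        self._sessions = {}

    def create_session(self, app_name, user_id, session_id):
        key = (app_name, user_id, session_id)
        self._sessions.setdefault(key, [])
        return key

    def append_turn(self, key, role, text):
        self._sessions[key].append({"role": role, "text": text})

    def history(self, key):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._sessions[key])


store = ToySessionStore()
key = store.create_session("studybuddy_app", "colab_user", "studybuddy_session")
store.append_turn(key, "user", "What is photosynthesis?")
store.append_turn(key, "model", "It is how plants turn light into chemical energy.")
store.append_turn(key, "user", "Give me an example.")
# The follow-up question only makes sense because the earlier turns are kept.
print(len(store.history(key)))
```

Because everything lives in a dict, the history disappears when the process ends, which is the same trade-off an in-memory session service makes.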
Step 6: Create Runner
The Runner acts as a bridge between the user and the agent. It runs the agent against a session and streams back events, including the final response.
runner = Runner(
    agent=studybuddy_agent,
    app_name=APP_NAME,
    session_service=session_service
)
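The event-stream pattern can be illustrated without any ADK dependency. In this sketch, `ToyEvent` and `toy_event_stream` are hypothetical stand-ins for the runner's events; only the `async for` consumption pattern is the point.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyEvent:
    """Hypothetical stand-in for a runner event."""
    text: str
    final: bool = False


async def toy_event_stream():
    """Yield two intermediate events, then one final event."""
    yield ToyEvent("thinking")
    yield ToyEvent("calling a tool")
    yield ToyEvent("Here is the answer.", final=True)


async def collect_final_text(stream):
    """Consume events until one is marked final and return its text."""
    result = "No final response."
    async for event in stream:
        if event.final:
            result = event.text
            break
    return result


if __name__ == "__main__":
    print(asyncio.run(collect_final_text(toy_event_stream())))
```

The default string acts as a fallback when the stream ends without a final event.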
Step 7: Define Query Handling Function
- Accepts text, PDF or image input.
- Converts the input into a Content object made of Part objects (from google.genai.types).
- Sends it to the agent and collects the final response.
async def run_query(query_text=None, pdf_path=None, image_data=None):
    parts = []
    if query_text:
        parts.append(types.Part(text=query_text))
    if image_data:
        # Raw image bytes are sent inline; the MIME type is assumed to be JPEG.
        parts.append(types.Part(
            inline_data=types.Blob(
                mime_type="image/jpeg",
                data=image_data
            )
        ))
    if pdf_path:
        pdf_text = ""
        with open(pdf_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            for page in reader.pages:
                # Guard against pages that yield no text (e.g. scanned pages).
                pdf_text += (page.extract_text() or "") + "\n"
        parts.append(types.Part(text=pdf_text))
    content = types.Content(role="user", parts=parts)
    final_response_text = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content
    ):
        # Stop at the first event marked as the final response.
        if event.is_final_response() and event.content and event.content.parts:
            final_response_text = "".join(
                p.text for p in event.content.parts if p.text)
            break
    return final_response_text
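Note that `run_query` labels every image as `image/jpeg`, so a PNG upload would be sent with the wrong MIME type. A possible refinement, sketched with the standard `mimetypes` module, is to guess the type from the file name. The helper name `guess_image_mime_type` and the JPEG fallback are assumptions for illustration.

```python
import mimetypes


def guess_image_mime_type(filename, default="image/jpeg"):
    """Guess an image MIME type from the file name, falling back to a default."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type and mime_type.startswith("image/"):
        return mime_type
    return default


if __name__ == "__main__":
    for name in ["diagram.png", "photo.jpg", "notes.txt"]:
        print(name, guess_image_mime_type(name))
```

The result could then be passed as `mime_type` when building the `types.Blob`, which would require adding a file name or MIME type parameter to `run_query`.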
Step 8: Create Interactive Loop
- Provides an interactive menu for text, image and PDF queries.
- Re-prompts when no file is uploaded or an invalid option is entered.
- Users can exit anytime.
async def interactive_studybuddy():
    print("Welcome to StudyBuddy! You can ask questions, upload images or PDFs. Type 'exit' to quit.\n")
    while True:
        print("Options:")
        print("1. Text question")
        print("2. Upload an image")
        print("3. Upload a PDF")
        user_choice = input("Select option (1/2/3) or type 'exit': ").strip()
        if user_choice.lower() in ["exit", "quit"]:
            print("Goodbye! Happy studying!")
            break
        if user_choice == "1":
            query_text = input("Your Question: ").strip()
            response = await run_query(query_text=query_text)
        elif user_choice == "2":
            uploaded = files.upload()
            if not uploaded:
                print("No image uploaded. Please try again.\n")
                continue
            image_filename = next(iter(uploaded))
            image_data = uploaded[image_filename]
            follow_up_question = input(
                f" Your question about '{image_filename}': ").strip()
            query_with_filename = f"Regarding the image '{image_filename}', {follow_up_question}"
            response = await run_query(query_text=query_with_filename, image_data=image_data)
        elif user_choice == "3":
            uploaded = files.upload()
            if not uploaded:
                print("No PDF uploaded. Please try again.\n")
                continue
            pdf_path = next(iter(uploaded))
            response = await run_query(pdf_path=pdf_path)
        else:
            print("Invalid option. Try again.\n")
            continue
        print(f" StudyBuddy: {response}\n")
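In the loop above, option 3 sends the extracted PDF text with no accompanying question, and very long PDFs are sent in full. One possible refinement is to pair the text with a question and cap its length. This is a sketch under assumptions: the helper name `build_pdf_prompt`, the default question and the 20000-character cap are arbitrary illustrative choices, not limits documented by ADK or Gemini.

```python
def build_pdf_prompt(question, pdf_text, max_chars=20000):
    """Combine a question with extracted PDF text, capping the text length."""
    question = (question or "").strip() or "Summarize this document."
    excerpt = pdf_text[:max_chars]
    note = ""
    if len(pdf_text) > max_chars:
        note = f"\n[Text truncated to the first {max_chars} characters.]"
    return f"{question}\n\n--- PDF text ---\n{excerpt}{note}"


if __name__ == "__main__":
    print(build_pdf_prompt("What is the main idea?", "Cells are the basic unit of life."))
```

The returned string could be passed to `run_query` as `query_text` in place of the `pdf_path` argument.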
Step 9: Run the Agent
Start the interactive AI tutor loop. The session created in Step 5 is reused for every query.
await interactive_studybuddy()
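Top-level `await` works in notebooks such as Colab, but not in a plain Python script. A minimal sketch of the script form follows; `interactive_studybuddy_stub` is a hypothetical stand-in for the real coroutine so that the example runs on its own.

```python
import asyncio


async def interactive_studybuddy_stub():
    """Hypothetical stand-in for the tutorial's interactive_studybuddy()."""
    await asyncio.sleep(0)  # placeholder for awaiting agent responses
    return "session finished"


async def main():
    # In a real script, the awaited session creation would also go here.
    return await interactive_studybuddy_stub()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.run` creates the event loop, runs `main()` to completion and closes the loop afterwards.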
Sample runs:
a. Text Question
b. Image (the sample image used was provided as a download)
c. PDF (the sample PDF used was provided as a download)
The complete code was also provided as a download.
Advantages
- Multimodal Support: Handles text and images directly, and PDFs through extracted text, in a single query function.
- Session Memory: Maintains context across multiple queries.
- Asynchronous Execution: Non-blocking, efficient handling of queries.
- Extensible: Easy to add new tools or capabilities to the agent.
- Developer-friendly: Structured like a real software project rather than a simple prompt.
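As an illustration of the extensibility point, ADK agents accept plain Python functions as tools through the `tools` parameter. The function below is a hypothetical example tool, and the agent wiring is left as a comment so the block runs without ADK installed.

```python
def count_words(text: str) -> dict:
    """Count the words in a passage of text.

    Args:
        text: The passage to measure.

    Returns:
        A dict with a "status" field and the "word_count".
    """
    return {"status": "success", "word_count": len(text.split())}


# Assumed wiring (requires google-adk; shown as a comment so this block runs alone):
# studybuddy_agent = LlmAgent(
#     name="StudyBuddy",
#     model=LLM_MODEL,
#     instruction="(same instruction as in Step 4)",
#     tools=[count_words],
# )

if __name__ == "__main__":
    print(count_words("Plants convert light into chemical energy"))
```

The docstring and type hints matter here, since they are what the model sees when deciding whether to call the tool.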