Skip to content

feat: Add support for image function tools #654

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vaydingul
Copy link

This PR adds support for image function tools to the OpenAI Agents Python SDK.

It is inspired by the #341 .

Current function_tool implementation only allows output to be strictly string, which creates a problem when we want to pass image input in the request data. This PR tackles this problem by both providing a standart function_call_output and additional image-related arguments back-to-back.

What's included

  • Added ImageFunctionTool class and image_function_tool decorator
  • Implemented necessary support in the run implementation, models, and item handling
  • Added example showing usage of image function tools (examples/tools/image_function_tool.py)

Usage

Use the @image_function_tool decorator to create tools that work with images:

@image_function_tool
def image_to_base64(path: str) -> str:
    """
    This function takes a path to an image and returns a base64 encoded string of the image.
    It is used to convert the image to a base64 encoded string so that it can be sent to the LLM.
    """
    with open(path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded_string}"

The tool can then be used to allow agents to process and analyze images.

Supporting example script is located at examples/tools/image_function_tool.py

This commit introduces the ImageFunctionTool and ImageFunctionToolResult classes, enabling the creation and execution of image-generating tools. The necessary modifications include updates to the tool execution logic, new data classes for handling image function calls, and adjustments to the response processing to accommodate image outputs. Additionally, the input handling in the Runner class has been refined to support the new image function items.
Changes include:
- New classes: ImageFunctionTool, ImageFunctionToolResult, ToolRunImageFunction
- Updated tool execution methods to handle image functions
- Modifications to the ProcessedResponse class to include image function results
- Enhancements to ItemHelpers for image function output formatting
- Adjustments in the Runner class for input item processing

These changes enhance the SDK's capabilities for handling image generation tasks alongside existing function tools.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant