Structured output for open models

Structured outputs enable a model to generate output that always adheres to a specific schema. For example, a model may be provided with a response schema to ensure that its response is valid JSON. All open models available through Vertex AI Model as a Service (MaaS) support structured outputs.

For more conceptual information about the structured output capability, see Introduction to structured output.

Use structured outputs

The following example sets a response schema that ensures the model output is a JSON object with three properties: name, date, and participants. The Python code uses the OpenAI SDK and Pydantic objects to generate the JSON schema.

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="MODEL_NAME",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)

The model output will adhere to the following JSON schema:

{ "name": STRING, "date": STRING, "participants": [STRING] }

Given the prompt "Alice and Bob are going to a science fair on Friday", the model could produce the following response:

{
  "name": "science fair",
  "date": "Friday",
  "participants": [
    "Alice",
    "Bob"
  ]
}
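
Because response_format was set to the CalendarEvent class, the parsed field of the message is a CalendarEvent instance (or None if the output could not be parsed), so you can access its fields with ordinary attribute access. For the sample response above, a sketch like the following could print the extracted values:

event = completion.choices[0].message.parsed
if event is not None:
    print(event.name)          # "science fair"
    print(event.date)          # "Friday"
    print(event.participants)  # ["Alice", "Bob"]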

Detailed example

The following code is an example of a recursive schema. The UI class contains a list of children, each of which can also be a UI instance.

from pydantic import BaseModel
from openai import OpenAI
from enum import Enum
from typing import List

client = OpenAI()

class UIType(str, Enum):
    div = "div"
    button = "button"
    header = "header"
    section = "section"
    field = "field"
    form = "form"

class Attribute(BaseModel):
    name: str
    value: str

class UI(BaseModel):
    type: UIType
    label: str
    children: List["UI"]
    attributes: List[Attribute]

UI.model_rebuild() # This is required to enable recursive types

class Response(BaseModel):
    ui: UI

completion = client.beta.chat.completions.parse(
    model="MODEL_NAME",
    messages=[
        {"role": "system", "content": "You are a UI generator AI. Convert the user input into a UI."},
        {"role": "user", "content": "Make a User Profile Form"}
    ],
    response_format=Response,
)

print(completion.choices[0].message.parsed)

The model output will adhere to the schema of the Pydantic object specified in the previous snippet. In this example, the model could generate the following UI form:

Form
  Input
    Name
    Email
    Age

A response could look like the following:

ui = UI(
    type=UIType.div,
    label='Form',
    children=[
        UI(
            type=UIType.div,
            label='Input',
            children=[],
            attributes=[
                Attribute(name='label', value='Name')
            ]
        ),
        UI(
            type=UIType.div,
            label='Input',
            children=[],
            attributes=[
                Attribute(name='label', value='Email')
            ]
        ),
        UI(
            type=UIType.div,
            label='Input',
            children=[],
            attributes=[
                Attribute(name='label', value='Age')
            ]
        )
    ],
    attributes=[
        Attribute(name='name', value='John Doe'),
        Attribute(name='email', value='[email protected]'),
        Attribute(name='age', value='30')
    ]
)
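
Because the UI model is recursive, you typically traverse the parsed result recursively as well. The following is a minimal, illustrative sketch (the render_ui helper is not part of any SDK) that walks the tree returned in completion.choices[0].message.parsed and prints one line per node:

# Illustrative helper: recursively print each node of the generated UI tree.
def render_ui(node: UI, indent: int = 0) -> None:
    pad = "  " * indent
    attrs = ", ".join(f"{a.name}={a.value}" for a in node.attributes)
    print(f"{pad}{node.type.value}: {node.label} ({attrs})")
    for child in node.children:
        render_ui(child, indent + 1)

parsed = completion.choices[0].message.parsed
if parsed is not None:
    render_ui(parsed.ui)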

Get JSON object responses

You can constrain the model to output only syntactically valid JSON objects by setting the response_format field to { "type": "json_object" }. This is often called JSON mode. JSON mode is useful when generating JSON for use with function calling or other downstream tasks that require JSON input.

When JSON mode is enabled, the model is constrained to only generate strings that parse into valid JSON objects. While this mode ensures the output is syntactically correct JSON, it does not enforce any specific schema. To ensure the model outputs JSON that follows a specific schema, you must include instructions in the prompt, as shown in the following example.

The following samples show how to enable JSON mode and instruct the model to return a JSON object with a specific structure:

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Before running this sample, make sure to set the OPENAI_BASE_URL environment variable. For more information, see Authentication and credentials.

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="MODEL",
    response_format={ "type": "json_object" },
    messages=[
        {"role": "user", "content": "List 5 rivers in South America. Your response must be a JSON object with a single key \"rivers\", which has a list of strings as its value."},
    ]
)
print(response.choices[0].message.content)

Replace MODEL with the model name you want to use, for example meta/llama3-405b-instruct-maas.
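
Because JSON mode guarantees only that the output parses as JSON, not that it matches the structure you asked for in the prompt, it is a good idea to validate the response before passing it to downstream code. The following minimal sketch assumes a hypothetical RiverList Pydantic model that mirrors the structure requested in the prompt:

import json
from pydantic import BaseModel, ValidationError

# Hypothetical model mirroring the structure requested in the prompt.
class RiverList(BaseModel):
    rivers: list[str]

raw = response.choices[0].message.content
try:
    # json.loads checks syntax; Pydantic checks the expected structure.
    parsed = RiverList.model_validate(json.loads(raw))
    print(parsed.rivers)
except (json.JSONDecodeError, ValidationError) as err:
    print(f"Response did not match the expected structure: {err}")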

REST

After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: A region that supports open models.
  • MODEL: The model name you want to use, for example meta/llama3-405b-instruct-maas.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions

Request JSON body:

{
  "model": "MODEL",
  "response_format": {
    "type": "json_object"
  },
  "messages": [
    {
      "role": "user",
      "content": "List 5 rivers in South America. Your response must be a JSON object with a single key \"rivers\", which has a list of strings as its value."
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions" | Select-Object -Expand Content

You should receive a JSON response similar to the following.
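
Because the endpoint is OpenAI-compatible, the response follows the chat completions format, with the requested JSON object returned as a string in the message content. The following example is illustrative only; the exact fields and values depend on the model:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"rivers\": [\"Amazon\", \"Paraná\", \"Orinoco\", \"Madeira\", \"São Francisco\"]}"
      }
    }
  ],
  "model": "MODEL",
  "object": "chat.completion"
}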
