OpenAI models on Vertex AI are offered as fully managed, serverless APIs. To use an OpenAI model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because OpenAI models use a managed API, there's no need to provision or manage infrastructure.
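As a sketch of what such a request looks like, the snippet below builds a chat-completions request body for a managed open model. The endpoint path and the model ID are assumptions for illustration; check the model's Model Garden card for the current values for your project and region.

```python
import json

# Hypothetical values -- substitute your own project and region.
PROJECT_ID = "my-project"
REGION = "us-central1"

# Managed open models are assumed here to be served through an
# OpenAI-compatible chat-completions endpoint; verify the exact path
# on the model card before use.
endpoint = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapi/chat/completions"
)

# The request body follows the OpenAI chat-completions schema.
payload = {
    "model": "openai/gpt-oss-120b-maas",  # assumed model ID
    "messages": [
        {"role": "user", "content": "Summarize server-sent events in one sentence."}
    ],
}

body = json.dumps(payload)
```

Send `body` as an authenticated POST request (for example, with a bearer token from Application Default Credentials) to the endpoint above.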
You can stream your responses to reduce perceived latency for end users. A streamed response uses server-sent events (SSE) to return the response incrementally.
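To make the SSE mechanism concrete, the sketch below accumulates the text deltas from a simulated stream. The chunk shape assumed here follows the OpenAI chat-completions streaming schema, where each event line is `data: {json chunk}` and the stream ends with `data: [DONE]`.

```python
import json

def collect_sse_text(lines):
    """Accumulate text deltas from OpenAI-style SSE chat chunks.

    Assumes each event line looks like 'data: {...json...}' and the
    stream terminates with 'data: [DONE]'.
    """
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Simulated stream, shaped like OpenAI streaming chunks:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_sse_text(sample))  # prints "Hello"
```

In a real call, the lines would come from an HTTP response opened with `"stream": true` in the request body rather than from a hard-coded list.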
Available OpenAI models
The following models are available from OpenAI to use in Vertex AI. To access an OpenAI model, go to its Model Garden model card.
gpt-oss 120B
OpenAI gpt-oss 120B is a 120B-parameter open-weight language model released under the Apache 2.0 license. It is well suited for reasoning and function calling use cases.
The 120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running on a single 80 GB GPU.
Go to the gpt-oss 120B model card
gpt-oss 20B
OpenAI gpt-oss 20B is a 20B-parameter open-weight language model released under the Apache 2.0 license. It is well suited for reasoning and function calling use cases. The model is optimized for deployment on consumer hardware.
The 20B model delivers results similar to OpenAI o3-mini on common benchmarks and can run on edge devices with 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.
Go to the gpt-oss 20B model card
Use OpenAI models
To learn how to make streaming and non-streaming calls to OpenAI models, see Call open model APIs.
What's next
- Learn how to Call open model APIs.