Live API overview

The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses. This creates a natural conversational experience for your users.

Key features

The Live API offers a comprehensive set of features for building robust voice agents, including native audio, audio transcriptions, voice activity detection, affective dialog, proactive audio, and tool use. The Supported models table below shows which features each model provides.

Technical specifications

The following table outlines the technical specifications for the Live API:

| Category | Details |
| --- | --- |
| Input modalities | Audio (16 kHz PCM), video (1 FPS), text |
| Output modalities | Audio (24 kHz PCM), text |
| Protocol | Stateful WebSocket connection (WSS) |
| Latency | Real-time streaming for immediate feedback |
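
As a concrete illustration of the input format, the following sketch uses only the Python standard library to confirm that a local WAV file matches the 16 kHz, 16-bit mono PCM input format and to extract the raw frames. The file name is illustrative, and resampling non-conforming audio is left to your own audio tooling.

```python
import wave

# Illustrative file name; any local WAV recording works here.
INPUT_PATH = "user_utterance.wav"

def load_pcm_16k_mono(path: str) -> bytes:
    """Return raw 16 kHz, 16-bit mono PCM bytes from a WAV file.

    The Live API expects 16 kHz PCM audio as input; this helper only
    validates the format and extracts the frames, it does not resample.
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16 kHz audio, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit (2-byte) samples")
        return wav.readframes(wav.getnframes())

pcm_bytes = load_pcm_16k_mono(INPUT_PATH)
print(f"Loaded {len(pcm_bytes)} bytes of PCM audio")
```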

Supported models

The following models support the Live API. Select the appropriate model based on your interaction requirements.

| Model ID | Availability | Use case | Key features |
| --- | --- | --- | --- |
| gemini-live-2.5-flash-preview-native-audio-09-2025 | Public preview | Cost-efficiency in real-time voice agents | Native audio, audio transcriptions, voice activity detection, affective dialog, proactive audio, tool use |
| gemini-2.5-flash-s2st-exp-11-2025 | Public experimental | Speech-to-Speech Translation (experimental); optimized for translation tasks | Native audio, audio transcriptions, tool use, speech-to-speech translation |

Architecture and integration

There are two primary ways to integrate the Live API into your application: server-to-server and client-to-server. Choose the one that fits your security and platform requirements.

Server-to-server

Server-to-server architecture is recommended for production environments such as mobile apps, secure enterprise tools, and telephony integration. Your client application streams audio to your secure backend server. Your server then manages the WebSocket connection to Google.

This method keeps your API keys secure and lets you modify audio or add logic before sending it to Gemini. However, it adds a small amount of network latency.
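
A minimal sketch of this pattern using the third-party `websockets` package is shown below: your backend accepts a WebSocket connection from the client app and relays frames in both directions to the Live API. The endpoint URL and the API-key query parameter are assumptions for illustration; verify them against the Live API reference, and in production you would also authenticate your own clients before relaying anything.

```python
import asyncio
import os

import websockets  # third-party: pip install websockets

# Assumed Live API WebSocket endpoint; confirm the URL and auth scheme
# in the Live API reference before relying on it.
GEMINI_WS_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
    f"?key={os.environ['GEMINI_API_KEY']}"
)

async def relay(source, destination):
    # Forward every frame unchanged; this is where you could add logging,
    # redaction, or other custom logic before the audio reaches Gemini.
    async for message in source:
        await destination.send(message)

async def handle_client(client_ws):
    # One upstream Live API connection per client connection.
    async with websockets.connect(GEMINI_WS_URL) as gemini_ws:
        await asyncio.gather(
            relay(client_ws, gemini_ws),   # client -> Gemini
            relay(gemini_ws, client_ws),   # Gemini -> client
        )

async def main():
    # Client apps connect here instead of talking to Google directly,
    # so the API key never leaves your backend.
    async with websockets.serve(handle_client, "0.0.0.0", 8080):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```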

Client-to-server

Client-to-server architecture is suitable for web apps, quick demos, and internal tools. The web browser connects directly to the Live API using WebSockets.

This method provides the lowest possible latency and a simpler architecture for demos. Be aware that this approach exposes API keys to the frontend user, which creates a security risk. For production, you must use careful proxying or ephemeral token management.
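
The sketch below shows one way to structure such a token endpoint with Flask. The `client.auth_tokens.create(...)` call and the returned token's `name` field reflect the ephemeral-token support in the google-genai SDK, but treat the exact method name and config fields as assumptions to confirm against the current SDK reference.

```python
from flask import Flask, jsonify
from google import genai

app = Flask(__name__)
client = genai.Client()  # reads the Gemini API key from the environment

@app.route("/live-token", methods=["POST"])
def live_token():
    # Assumed SDK call: recent google-genai releases expose an ephemeral
    # auth-token helper for the Live API; verify the method name and the
    # available config fields in the SDK reference before using this.
    token = client.auth_tokens.create(
        config={"uses": 1}  # single-use token for one Live API session
    )
    # The browser uses this short-lived token (instead of your API key)
    # when it opens its WebSocket connection to the Live API.
    return jsonify({"token": token.name})

if __name__ == "__main__":
    app.run(port=5000)
```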

Get started

Select the guide that matches your development environment:

Recommended for ease of use

Connect to the Live API using the Gen AI SDK, then send an audio file to Gemini and receive audio in response.
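
A minimal sketch of that flow with the google-genai Python SDK might look like the following. It assumes a 16 kHz, 16-bit mono PCM recording on disk (the file name is illustrative) and the preview model from the table above; confirm the config fields and types against the SDK reference for your version.

```python
import asyncio
import wave

from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment
MODEL_ID = "gemini-live-2.5-flash-preview-native-audio-09-2025"

async def main():
    config = {"response_modalities": ["AUDIO"]}

    # Illustrative input file: 16 kHz, 16-bit mono PCM audio.
    with wave.open("user_utterance.wav", "rb") as wav:
        pcm_bytes = wav.readframes(wav.getnframes())

    async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
        # Stream the recording to the model as realtime input.
        await session.send_realtime_input(
            audio=types.Blob(data=pcm_bytes, mime_type="audio/pcm;rate=16000")
        )

        # Collect the 24 kHz PCM audio the model sends back for this turn.
        audio_out = bytearray()
        async for message in session.receive():
            if message.data is not None:
                audio_out.extend(message.data)

        # Write the reply as a playable WAV file.
        with wave.open("model_reply.wav", "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(24000)
            out.writeframes(bytes(audio_out))

if __name__ == "__main__":
    asyncio.run(main())
```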

Raw protocol control

Connect to the Live API using WebSockets, then send an audio file to Gemini and receive audio in response.
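
At the protocol level, the exchange might look like the sketch below, which uses the third-party `websockets` package: open the WSS connection, send a setup message naming the model, stream base64-encoded PCM as realtime input, and read server messages until the turn completes. The endpoint URL and JSON field names are assumptions to verify against the WebSockets API reference.

```python
import asyncio
import base64
import json
import os
import wave

import websockets  # third-party: pip install websockets

# Assumed endpoint and message field names; confirm both against the
# Live API WebSockets reference.
GEMINI_WS_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
    f"?key={os.environ['GEMINI_API_KEY']}"
)
MODEL_ID = "models/gemini-live-2.5-flash-preview-native-audio-09-2025"

async def main():
    # Illustrative 16 kHz, 16-bit mono PCM recording.
    with wave.open("user_utterance.wav", "rb") as wav:
        pcm_bytes = wav.readframes(wav.getnframes())

    async with websockets.connect(GEMINI_WS_URL) as ws:
        # The first message on the connection configures the session.
        await ws.send(json.dumps({
            "setup": {
                "model": MODEL_ID,
                "generationConfig": {"responseModalities": ["AUDIO"]},
            }
        }))
        await ws.recv()  # setup acknowledgement

        # Stream the recording as a realtime input chunk.
        await ws.send(json.dumps({
            "realtimeInput": {
                "mediaChunks": [{
                    "mimeType": "audio/pcm;rate=16000",
                    "data": base64.b64encode(pcm_bytes).decode("ascii"),
                }]
            }
        }))

        # Read server messages until the model finishes its turn.
        async for raw in ws:
            message = json.loads(raw)
            print(json.dumps(message)[:200])  # inspect audio/text parts here
            if message.get("serverContent", {}).get("turnComplete"):
                break

if __name__ == "__main__":
    asyncio.run(main())
```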

Agent Development Kit

Create an agent and use the Agent Development Kit (ADK) Streaming to enable voice and video communication.
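
As a starting point, a minimal ADK agent on a Live-capable model might be defined as in the sketch below. The import path and `Agent` fields follow the ADK quickstart pattern, so confirm them against the ADK documentation, which also describes how ADK Streaming (for example, through the ADK web UI) adds the voice and video transport on top of the agent.

```python
# Minimal ADK agent definition; the module layout and field names follow
# the ADK quickstart pattern and should be confirmed against the ADK docs.
from google.adk.agents import Agent

root_agent = Agent(
    name="live_assistant",  # illustrative agent name
    model="gemini-live-2.5-flash-preview-native-audio-09-2025",
    description="Voice assistant that answers questions in real time.",
    instruction="Answer the user's spoken questions briefly and clearly.",
)
```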

React/JS integration

Set up and run a web application that enables you to use your voice and camera to talk to Gemini through the Live API.

Partner integrations

If you prefer a simpler development process, you can use Daily, LiveKit, or Voximplant. These third-party partner platforms have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications.