The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses. This creates a natural conversational experience for your users.
Key features
The Live API offers a comprehensive set of features for building robust voice agents:
- Native audio: Provides natural, realistic-sounding speech and improved multilingual performance.
- Multilingual support: Converse in 24 supported languages.
- Voice activity detection (VAD): Automatically handles interruptions and turn-taking.
- Affective dialog: Adapts response style and tone to match the user's input expression.
- Proactive audio: Lets you control when the model responds and in what contexts.
- Thinking: Uses hidden reasoning tokens to "think" before speaking for complex queries.
- Tool use: Integrates tools like function calling and Google Search for dynamic interactions.
- Audio transcriptions: Provides text transcripts of both user input and model output.
- Speech-to-speech translation: Optimized for low-latency translation between languages.
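Most of these features are enabled through the session configuration you pass when opening a connection. The following is a minimal sketch assuming the google-genai Python SDK's LiveConnectConfig; the exact field names (for example, the transcription and voice settings shown) can vary between SDK versions, so treat them as assumptions and check the SDK reference.

```python
# Minimal sketch, assuming the google-genai Python SDK. Field names such as
# input_audio_transcription and output_audio_transcription are assumptions
# that may differ between SDK versions.
from google.genai import types

live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],  # spoken replies (native audio)
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    ),
    tools=[types.Tool(google_search=types.GoogleSearch())],       # tool use
    input_audio_transcription=types.AudioTranscriptionConfig(),   # transcribe user audio
    output_audio_transcription=types.AudioTranscriptionConfig(),  # transcribe model audio
)
```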
Technical specifications
The following table outlines the technical specifications for the Live API:
| Category | Details |
|---|---|
| Input modalities | Audio (16 kHz PCM), video (1 FPS), text |
| Output modalities | Audio (24 kHz PCM), text |
| Protocol | Stateful WebSocket connection (WSS) |
| Latency | Real-time streaming for immediate feedback |
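As a sketch of what the stateful WebSocket protocol looks like on the wire, the following snippet opens a connection and sends the initial setup message. It assumes the Python websockets package and the v1beta BidiGenerateContent endpoint; the URL and message schema shown here are illustrative assumptions, so follow the WebSocket tutorial below for the authoritative flow.

```python
# Minimal sketch of the stateful WebSocket (WSS) handshake. The endpoint URI
# and message fields are assumptions and may change.
import asyncio
import json
import os

import websockets

URI = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
    f"?key={os.environ['GEMINI_API_KEY']}"
)

async def main() -> None:
    async with websockets.connect(URI) as ws:
        # The first message on the connection configures the session.
        await ws.send(json.dumps({
            "setup": {
                "model": "models/gemini-live-2.5-flash-preview-native-audio-09-2025",
                "generationConfig": {"responseModalities": ["AUDIO"]},
            }
        }))
        # The server acknowledges setup before audio streaming begins
        # (16 kHz PCM in, 24 kHz PCM out).
        print(await ws.recv())

asyncio.run(main())
```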
Supported models
The following models support the Live API. Select the appropriate model based on your interaction requirements.
| Model ID | Availability | Use case | Key features |
|---|---|---|---|
| gemini-live-2.5-flash-preview-native-audio-09-2025 | Public preview | Cost-efficiency in real-time voice agents. | Native audio, Audio transcriptions, Voice activity detection, Affective dialog, Proactive audio, Tool use |
| gemini-2.5-flash-s2st-exp-11-2025 | Public experimental | Speech-to-Speech Translation (experimental). Optimized for translation tasks. | Native audio, Audio transcriptions, Tool use, Speech-to-speech translation |
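If you switch between the two models programmatically, a small hypothetical helper like the following keeps the choice in one place; the model IDs are the preview and experimental versions listed above and are expected to change as new releases ship.

```python
# Hypothetical helper for choosing a Live API model ID based on the use case.
def pick_live_model(translation: bool) -> str:
    if translation:
        return "gemini-2.5-flash-s2st-exp-11-2025"  # speech-to-speech translation
    return "gemini-live-2.5-flash-preview-native-audio-09-2025"  # general voice agent
```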
Architecture and integration
There are two primary ways to integrate the Live API into your application: server-to-server and client-to-server. Choose the one that fits your security and platform requirements.
Server-to-server
Server-to-server architecture is recommended for production deployments such as mobile apps, secure enterprise tools, and telephony integrations. Your client application streams audio to your secure backend server, which then manages the WebSocket connection to Google.
This method keeps your API keys secure and lets you modify audio or add logic before sending it to Gemini. However, it adds a small amount of network latency.
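A minimal relay along these lines, assuming the Python websockets package, might look like the following; the Gemini endpoint URI, port, and message handling are simplified assumptions, and a production server would add authentication, session setup, reconnection, and error handling.

```python
# Sketch of a server-to-server relay: your backend holds the API key and
# forwards traffic between the client app and the Live API. Assumes a recent
# version of the websockets package; details are simplified.
import asyncio
import os

import websockets

GEMINI_URI = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
    f"?key={os.environ['GEMINI_API_KEY']}"
)

async def handle_client(client_ws) -> None:
    # One upstream Live API connection per client session.
    async with websockets.connect(GEMINI_URI) as gemini_ws:
        async def upstream():
            async for msg in client_ws:    # audio/text from your app
                await gemini_ws.send(msg)  # inspect or modify here if needed
        async def downstream():
            async for msg in gemini_ws:    # audio/text from Gemini
                await client_ws.send(msg)
        await asyncio.gather(upstream(), downstream())

async def main() -> None:
    async with websockets.serve(handle_client, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())
```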
Client-to-server
Client-to-server architecture is suitable for web apps, quick demos, and internal tools. The web browser connects directly to the Live API using WebSockets.
This method provides the lowest possible latency and a simpler architecture for demos. Be aware that this approach exposes API keys to the frontend user, which creates a security risk. For production, you must use careful proxying or ephemeral token management.
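One common pattern is a small backend endpoint that mints a short-lived credential for the browser, so the long-lived API key never leaves your server. The sketch below assumes Flask; create_ephemeral_token() is a hypothetical placeholder for whatever ephemeral-token mechanism you use.

```python
# Sketch of a token endpoint for the client-to-server pattern, assuming Flask.
# create_ephemeral_token() is a hypothetical placeholder.
from flask import Flask, jsonify

app = Flask(__name__)

def create_ephemeral_token() -> str:
    # Hypothetical placeholder: mint a short-lived, single-use credential here,
    # server-side, using your real API key. Never return the API key itself.
    raise NotImplementedError("wire up your token provider here")

@app.post("/live-token")
def live_token():
    # The browser calls this endpoint, then opens its own WebSocket to the
    # Live API using the returned short-lived token instead of an API key.
    return jsonify({"token": create_ephemeral_token()})
```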
Get started
Select the guide that matches your development environment:
Gen AI SDK tutorial
Connect to the Live API using the Gen AI SDK, send an audio file to Gemini, and receive audio in response.
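As a rough sketch of that flow, assuming the google-genai Python package, the following sends a PCM audio file and collects the audio reply; method and field names may differ slightly between SDK versions, so follow the tutorial for the exact API.

```python
# Sketch of the Gen AI SDK flow: send an audio file, receive audio back.
# Assumes the google-genai Python package; names may vary by SDK version.
import asyncio
from pathlib import Path

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment
MODEL = "gemini-live-2.5-flash-preview-native-audio-09-2025"
CONFIG = types.LiveConnectConfig(response_modalities=["AUDIO"])

async def main() -> None:
    # Assumes input.pcm holds 16-bit, 16 kHz, mono PCM audio.
    pcm_bytes = Path("input.pcm").read_bytes()
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        await session.send_realtime_input(
            audio=types.Blob(data=pcm_bytes, mime_type="audio/pcm;rate=16000")
        )
        audio_out = bytearray()
        async for message in session.receive():  # messages for one model turn
            if message.data:                      # 24 kHz PCM from the model
                audio_out.extend(message.data)
        Path("output.pcm").write_bytes(bytes(audio_out))

asyncio.run(main())
```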
WebSocket tutorial
Connect to the Live API using WebSockets, send an audio file to Gemini, and receive audio in response.
ADK tutorial
Create an agent and use Agent Development Kit (ADK) streaming to enable voice and video communication.
Run a demo web app
Set up and run a web application that enables you to use your voice and camera to talk to Gemini through the Live API.
Partner integrations
If you prefer a simpler development process, you can use Daily, LiveKit, or Voximplant. These third-party partner platforms have already integrated the Gemini Live API over the WebRTC protocol, streamlining the development of real-time audio and video applications.
