This document describes how to configure synthesized speech responses and voice activity detection in Gemini Live API. You can configure responses in a variety of HD voices and languages, and also configure voice activity detection settings to allow users to interrupt the model.
## Set the language and voice
Native audio models like gemini-live-2.5-flash-native-audio can switch between languages naturally during conversation. You can also restrict the languages the model speaks by specifying them in the system instructions.
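As a minimal sketch of that restriction, the dict-style config below passes a system instruction alongside the response modality. The wording of the instruction is illustrative, not an official recommendation:

```python
# Hedged sketch: restrict a native audio model to one spoken language
# via system instructions. The instruction text is an illustrative
# assumption; adjust it to your use case.
config = {
    "response_modalities": ["AUDIO"],
    "system_instruction": (
        "Always respond in English, even if the user speaks "
        "another language."
    ),
}

print(config["system_instruction"])
```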
For non-native-audio models like gemini-live-2.5-flash, set the language in the speech_config.language_code field. The voice is configured in the voice_name field for all models.
The following code sample shows you how to configure language and voice.
```python
from google.genai.types import (
    LiveConnectConfig,
    PrebuiltVoiceConfig,
    SpeechConfig,
    VoiceConfig,
)

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=SpeechConfig(
        voice_config=VoiceConfig(
            prebuilt_voice_config=PrebuiltVoiceConfig(
                voice_name=voice_name,
            )
        ),
        language_code="en-US",
    ),
)
```
## Voices supported
Gemini Live API supports the following 30 voice options in the
voice_name field:
| Voice -- Style | Voice -- Style | Voice -- Style |
|---|---|---|
| Zephyr -- Bright | Puck -- Upbeat | Charon -- Informative |
| Kore -- Firm | Fenrir -- Excitable | Leda -- Youthful |
| Orus -- Firm | Aoede -- Breezy | Callirrhoe -- Easy-going |
| Autonoe -- Bright | Enceladus -- Breathy | Iapetus -- Clear |
| Umbriel -- Easy-going | Algieba -- Smooth | Despina -- Smooth |
| Erinome -- Clear | Algenib -- Gravelly | Rasalgethi -- Informative |
| Laomedeia -- Upbeat | Achernar -- Soft | Alnilam -- Firm |
| Schedar -- Even | Gacrux -- Mature | Pulcherrima -- Forward |
| Achird -- Friendly | Zubenelgenubi -- Casual | Vindemiatrix -- Gentle |
| Sadachbia -- Lively | Sadaltager -- Knowledgeable | Sulafat -- Warm |
## Languages supported
Gemini Live API supports the following 24 languages:
| Language | BCP-47 Code | Language | BCP-47 Code |
|---|---|---|---|
| Arabic (Egyptian) | ar-EG | German (Germany) | de-DE |
| English (US) | en-US | Spanish (US) | es-US |
| French (France) | fr-FR | Hindi (India) | hi-IN |
| Indonesian (Indonesia) | id-ID | Italian (Italy) | it-IT |
| Japanese (Japan) | ja-JP | Korean (Korea) | ko-KR |
| Portuguese (Brazil) | pt-BR | Russian (Russia) | ru-RU |
| Dutch (Netherlands) | nl-NL | Polish (Poland) | pl-PL |
| Thai (Thailand) | th-TH | Turkish (Turkey) | tr-TR |
| Vietnamese (Vietnam) | vi-VN | Romanian (Romania) | ro-RO |
| Ukrainian (Ukraine) | uk-UA | Bengali (Bangladesh) | bn-BD |
| English (India) | en-IN & hi-IN bundle | Marathi (India) | mr-IN |
| Tamil (India) | ta-IN | Telugu (India) | te-IN |
## Configure voice activity detection
Voice activity detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, because it allows a user to interrupt the model at any time.
When VAD detects an interruption, the ongoing generation is canceled and
discarded. Only the information already sent to the client is retained in the
session history. The server then sends a BidiGenerateContentServerContent
message to report the interruption. In addition, the server discards any
pending function calls and sends a BidiGenerateContentServerContent message
with the IDs of the canceled calls.
```python
config = {
    "response_modalities": ["AUDIO"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False,  # default
            "start_of_speech_sensitivity": "low",
            "end_of_speech_sensitivity": "low",
            "prefix_padding_ms": 20,
            "silence_duration_ms": 100,
        }
    },
}
```
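To make the interruption behavior concrete, the sketch below simulates a client's receive loop using plain dictionaries in place of server messages. The real SDK delivers objects whose server content carries an interrupted flag, so the message shape here is an assumption for illustration only:

```python
def handle_stream(messages):
    """Buffer audio chunks, dropping unplayed audio on interruption.

    `messages` is a list of dicts standing in for server messages;
    this shape is a simplification of the SDK's server content objects.
    """
    playback_buffer = []
    for msg in messages:
        if msg.get("interrupted"):
            # VAD detected the user speaking: the model's pending audio
            # is canceled, so clear anything not yet played.
            playback_buffer.clear()
            continue
        if "audio" in msg:
            playback_buffer.append(msg["audio"])
    return playback_buffer

# Only the chunk arriving after the interruption survives.
print(handle_stream([
    {"audio": b"a"},
    {"audio": b"b"},
    {"interrupted": True},
    {"audio": b"c"},
]))  # → [b'c']
```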
## What's next
- Start and manage live sessions
- Send audio and video streams
- Use speech-to-speech translation
- Best practices with Gemini Live API