# Gemini Live API Examples

The Live API enables low-latency, real-time voice and video interactions with
Gemini. It processes continuous streams of audio, video, or text to deliver
immediate, human-like spoken responses, creating a natural conversational
experience for your users.

[Try the Live API in Google AI Studio](https://aistudio.google.com/live)

## Example use cases

The Live API can be used to build real-time voice and video agents for a
variety of industries, including:

* **E-commerce and retail:** Shopping assistants that offer personalized
  recommendations and support agents that resolve customer issues.
* **Gaming:** Interactive non-player characters (NPCs), in-game help
  assistants, and real-time translation of in-game content.
* **Next-gen interfaces:** Voice- and video-enabled experiences in robotics,
  smart glasses, and vehicles.
* **Healthcare:** Health companions for patient support and education.
* **Financial services:** AI advisors for wealth management and investment
  guidance.
* **Education:** AI mentors and learner companions that provide personalized
  instruction and feedback.

## Key features

The Live API offers a comprehensive set of features for building
robust voice and video agents:

* [**Multilingual support**](https://ai.google.dev/gemini-api/docs/live-guide#supported-languages):
  Converse in any of 70 supported languages.
* [**Barge-in**](https://ai.google.dev/gemini-api/docs/live-guide#interruptions):
  Users can interrupt the model at any time for responsive interactions.
* [**Tool use**](https://ai.google.dev/gemini-api/docs/live-tools):
  Integrates tools like function calling and Google Search for dynamic
  interactions.
* [**Audio transcriptions**](https://ai.google.dev/gemini-api/docs/live-guide#audio-transcription):
  Provides text transcripts of both user input and model output.
* [**Proactive audio**](https://ai.google.dev/gemini-api/docs/live-guide#proactive-audio):
  Lets you control when the model responds and in what contexts.
* [**Affective dialog**](https://ai.google.dev/gemini-api/docs/live-guide#affective-dialog):
  Adapts response style and tone to match the user's input expression.

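Most of these features are switched on through the session configuration you pass when connecting. The sketch below assembles such a config as a plain dictionary; the exact field names (`output_audio_transcription`, `enable_affective_dialog`, `proactivity`) are assumptions drawn from the feature guides linked above, so verify them against the current API reference:

```python
def build_live_config(transcribe=True, affective=False, proactive=False):
    """Assemble a Live API session config as a plain dict.

    Field names here are assumptions based on the linked feature guides;
    check them against the current API reference before use.
    """
    config = {
        # A live session returns either AUDIO or TEXT, chosen up front.
        "response_modalities": ["AUDIO"],
    }
    if transcribe:
        # Empty objects request transcripts for both directions.
        config["input_audio_transcription"] = {}
        config["output_audio_transcription"] = {}
    if affective:
        config["enable_affective_dialog"] = True
    if proactive:
        config["proactivity"] = {"proactive_audio": True}
    return config

config = build_live_config(transcribe=True, proactive=True)
```

The resulting dictionary can be passed as the `config` argument when opening a live session with the SDK of your choice.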
## Technical specifications

The following table outlines the technical specifications for the
Live API:

| Category          | Details                                                                        |
| :---------------- | :----------------------------------------------------------------------------- |
| Input modalities  | Audio (raw 16-bit PCM, 16 kHz, little-endian), images/video (JPEG, <= 1 FPS), text |
| Output modalities | Audio (raw 16-bit PCM, 24 kHz, little-endian), text                            |
| Protocol          | Stateful WebSocket connection (WSS)                                            |

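The audio format in the table is easy to get wrong. As a minimal, SDK-independent illustration, the following sketch converts floating-point samples into the raw 16-bit little-endian PCM the API expects for input:

```python
import struct

def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to raw 16-bit little-endian PCM.

    The Live API expects input audio in this format at 16 kHz; output
    audio arrives in the same format at 24 kHz.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(s * 32767) for s in clipped]
    # "<" = little-endian byte order, "h" = signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)

# Each sample becomes exactly 2 bytes.
chunk = floats_to_pcm16([0.0, 0.5, -0.5, 1.0])
```

In a real application you would stream chunks like this continuously from a microphone capture loop rather than converting a fixed list.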
## Examples

* **[Gen AI SDK Python example](./gemini-live-genai-python-sdk/README.md)**: Recommended for ease of use. Connect to the Gemini Live API using the Gen AI SDK to build a real-time multimodal application with a Python backend.
* **[Ephemeral tokens and raw WebSocket example](./gemini-live-ephemeral-tokens-websocket/README.md)**: Raw protocol control. Connect to the Gemini Live API using WebSockets to build a real-time multimodal application with a JavaScript frontend and a Python backend.

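On the raw WebSocket path, the first frame sent after the WSS handshake is a JSON `setup` message naming the model and generation config. The sketch below builds one; the endpoint URL and model name are assumptions for illustration, so check them against the ephemeral-tokens example and the current docs:

```python
import json

# Assumed endpoint for illustration; verify against the current docs.
LIVE_WS_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
)

def build_setup_frame(model: str) -> str:
    """Serialize the initial 'setup' message sent over the WebSocket."""
    return json.dumps({
        "setup": {
            "model": f"models/{model}",
            "generation_config": {"response_modalities": ["AUDIO"]},
        }
    })

# Hypothetical model name, shown only to make the sketch concrete.
frame = build_setup_frame("gemini-2.0-flash-live-001")
```

After sending this frame, the client streams audio/video/text chunks and reads server messages on the same stateful connection, as described in the table above.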
## Partner integrations

To streamline the development of real-time audio and video apps, you can use
a third-party integration that supports the Gemini Live
API over WebRTC or WebSockets.

* [LiveKit](https://docs.livekit.io/agents/models/realtime/plugins/gemini/): Use the Gemini Live API with LiveKit Agents.
* [Pipecat by Daily](https://docs.pipecat.ai/guides/features/gemini-live): Create a real-time AI chatbot using Gemini Live and Pipecat.
* [Fishjam by Software Mansion](https://docs.fishjam.io/tutorials/gemini-live-integration): Create live video and audio streaming applications with Fishjam.
* [Vision Agents by Stream](https://visionagents.ai/integrations/gemini): Build real-time voice and video AI applications with Vision Agents.
* [Voximplant](https://voximplant.com/products/gemini-client): Connect inbound and outbound calls to the Live API with Voximplant.
* [Agent Development Kit (ADK)](https://google.github.io/adk-docs/streaming/): Create an agent and use Agent Development Kit (ADK) Streaming to enable voice and video communication.
* [Firebase AI SDK](https://firebase.google.com/docs/ai-logic/live-api?api=dev): Get started with the Gemini Live API using Firebase AI Logic.