Commit 239adea

Initial examples.
1 parent 7c1c5ca commit 239adea

27 files changed

Lines changed: 4751 additions & 0 deletions

README.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# Gemini Live API Examples

The Live API enables low-latency, real-time voice and video interactions with
Gemini. It processes continuous streams of audio, video, or text to deliver
immediate, human-like spoken responses, creating a natural conversational
experience for your users.

![Live API Overview](https://ai.google.dev/gemini-api/docs/images/live-api-overview.png)

[Try the Live API in Google AI Studio](https://aistudio.google.com/live)

## Example use cases

Live API can be used to build real-time voice and video agents for a
variety of industries, including:

* **E-commerce and retail:** Shopping assistants that offer personalized
  recommendations and support agents that resolve customer issues.
* **Gaming:** Interactive non-player characters (NPCs), in-game help
  assistants, and real-time translation of in-game content.
* **Next-gen interfaces:** Voice- and video-enabled experiences in robotics,
  smart glasses, and vehicles.
* **Healthcare:** Health companions for patient support and education.
* **Financial services:** AI advisors for wealth management and investment
  guidance.
* **Education:** AI mentors and learner companions that provide personalized
  instruction and feedback.

## Key features

Live API offers a comprehensive set of features for building
robust voice and video agents:

* [**Multilingual support**](https://ai.google.dev/gemini-api/docs/live-guide#supported-languages):
  Converse in 70 supported languages.
* [**Barge-in**](https://ai.google.dev/gemini-api/docs/live-guide#interruptions):
  Users can interrupt the model at any time for responsive interactions.
* [**Tool use**](https://ai.google.dev/gemini-api/docs/live-tools):
  Integrates tools like function calling and Google Search for dynamic
  interactions.
* [**Audio transcriptions**](https://ai.google.dev/gemini-api/docs/live-guide#audio-transcription):
  Provides text transcripts of both user input and model output.
* [**Proactive audio**](https://ai.google.dev/gemini-api/docs/live-guide#proactive-audio):
  Lets you control when the model responds and in what contexts.
* [**Affective dialog**](https://ai.google.dev/gemini-api/docs/live-guide#affective-dialog):
  Adapts response style and tone to match the user's input expression.

## Technical specifications

The following table outlines the technical specifications for the
Live API:

| Category          | Details                                                                               |
| :---------------- | :------------------------------------------------------------------------------------ |
| Input modalities  | Audio (raw 16-bit PCM audio, 16kHz, little-endian), images/video (JPEG <= 1FPS), text |
| Output modalities | Audio (raw 16-bit PCM audio, 24kHz, little-endian), text                              |
| Protocol          | Stateful WebSocket connection (WSS)                                                   |

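The input audio format drives most of the client-side work: browser capture APIs produce Float32 samples in the range -1..1, while the API expects raw 16-bit little-endian PCM at 16 kHz. A minimal conversion sketch (the function name is illustrative; resampling to 16 kHz is assumed to happen upstream):

```javascript
// Convert Float32 samples (-1..1) into 16-bit little-endian PCM bytes,
// the audio input format listed in the table above. Assumes the samples
// are already at 16 kHz; resampling is out of scope here.
function floatTo16BitPCM(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return new Uint8Array(buffer);
}
```

Base64-encoding the resulting bytes produces a chunk ready to send over the WebSocket as realtime audio input.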
## Examples

* **[Gen AI SDK Python example](./gemini-live-genai-python-sdk/README.md)**: Recommended for ease of use. Connect to the Gemini Live API using the Gen AI SDK to build a real-time multimodal application with a Python backend.
* **[Ephemeral tokens and raw WebSocket example](./gemini-live-ephemeral-tokens-websocket/README.md)**: Raw protocol control. Connect to the Gemini Live API using WebSockets to build a real-time multimodal application with a JavaScript frontend and a Python backend.

## Partner integrations

To streamline the development of real-time audio and video apps, you can use
a third-party integration that supports the Gemini Live
API over WebRTC or WebSockets.

* [LiveKit](https://docs.livekit.io/agents/models/realtime/plugins/gemini/): Use the Gemini Live API with LiveKit Agents.
* [Pipecat by Daily](https://docs.pipecat.ai/guides/features/gemini-live): Create a real-time AI chatbot using Gemini Live and Pipecat.
* [Fishjam by Software Mansion](https://docs.fishjam.io/tutorials/gemini-live-integration): Create live video and audio streaming applications with Fishjam.
* [Vision Agents by Stream](https://visionagents.ai/integrations/gemini): Build real-time voice and video AI applications with Vision Agents.
* [Voximplant](https://voximplant.com/products/gemini-client): Connect inbound and outbound calls to Live API with Voximplant.
* [Agent Development Kit (ADK)](https://google.github.io/adk-docs/streaming/): Create an agent and use the Agent Development Kit (ADK) Streaming to enable voice and video communication.
* [Firebase AI SDK](https://firebase.google.com/docs/ai-logic/live-api?api=dev): Get started with the Gemini Live API using Firebase AI Logic.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
GEMINI_API_KEY=
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
__pycache__
.venv
.env
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# Gemini Live API - Vanilla JS

WebSocket client for Google's Gemini Live API with audio/video streaming support using **Ephemeral Tokens**. No frameworks, just vanilla JavaScript.

## Quick Start

```bash
# 1. Install uv (if not already installed)
# curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create a virtual environment and sync dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# 3. Set your API key in a .env file
echo "GEMINI_API_KEY=your_actual_api_key_here" > .env

# 4. Start the server
uv run server.py

# 5. Open the app in a browser
open http://localhost:8000
```

## Features

- **Direct client-to-server connection**: Low-latency WebSocket connection directly to the Gemini Live API.
- **Ephemeral Tokens**: Improved security through short-lived tokens generated by the backend.
- **Real-time audio/video streaming**: High-performance streaming using standard Web APIs.
- **Custom tools**: Example implementations for browser alerts and CSS injection.
- **Device selection**: Full control over microphone and camera inputs.

## Project Structure

```
/
├── server.py            # Token provisioning + HTTP server
├── .env                 # API key configuration
├── requirements.txt     # Python dependencies
└── frontend/
    ├── index.html       # UI
    ├── geminilive.js    # Gemini API client (direct connection)
    ├── mediaUtils.js    # Audio/video streaming logic
    ├── tools.js         # Custom tool definitions
    └── script.js        # Application workflow
```

## Core APIs

### GeminiLive Client

```javascript
// Connect using an ephemeral token
const client = new GeminiLiveAPI(token, model);
client.addFunction(toolInstance);  // Add custom tools
await client.connect();            // Establish a direct WebSocket connection
client.sendTextMessage("Hello");   // Send text
client.sendAudioMessage(base64);   // Send audio
client.sendImageMessage(base64);   // Send an image
```

### Media Streaming

```javascript
// Audio streaming
const audioStreamer = new AudioStreamer(client);
await audioStreamer.start(deviceId);

// Video streaming
const videoStreamer = new VideoStreamer(client);
await videoStreamer.start({ fps: 1, width: 640, height: 480 });

// Audio playback
const player = new AudioPlayer();
await player.play(base64PCM);
```

### Custom Tools

```javascript
class MyTool extends FunctionCallDefinition {
  constructor() {
    super("tool_name", "description", parameters, required);
  }

  functionToCall(params) {
    // Tool implementation logic
  }
}
```
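As a concrete (hypothetical) instance of this pattern, here is a browser-alert tool like the one mentioned in the features list. The small `FunctionCallDefinition` stub is an assumption standing in for the real base class defined in `frontend/geminilive.js`:

```javascript
// Minimal stand-in for the real base class in frontend/geminilive.js
// (assumed shape: it simply records the declaration fields).
class FunctionCallDefinition {
  constructor(name, description, parameters, required) {
    this.name = name;
    this.description = description;
    this.parameters = parameters;
    this.required = required;
  }
}

// Hypothetical concrete tool: surfaces a message via a browser alert.
class ShowAlertTool extends FunctionCallDefinition {
  constructor() {
    super(
      "show_alert",
      "Show a message to the user in a browser alert box.",
      { message: { type: "string", description: "Text to display" } },
      ["message"],
    );
  }

  functionToCall(params) {
    // In the browser this would call alert(params.message);
    // return a result the model can read back.
    return { status: "shown", message: params.message };
  }
}
```

Registered via `client.addFunction(new ShowAlertTool())`, the model can then trigger `show_alert` during a live session.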
## Configuration Options

- **Model**: `gemini-2.5-flash-native-audio-preview-12-2025` (default)
- **Voice**: Puck, Charon, Kore, Fenrir, Aoede
- **Response**: Audio, text, or both
- **Tools**: Custom functions or Google Search grounding

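These options typically end up in the session setup message sent as the first frame over the WebSocket. A sketch of that mapping — the field names follow the public BidiGenerateContent schema, but treat the exact shape as an assumption and cross-check against `frontend/geminilive.js`:

```javascript
// Build a Live API setup message from the configuration options above.
// Field names follow the BidiGenerateContent WebSocket schema (assumed).
function buildSetupMessage({ model, voice, modalities }) {
  return {
    setup: {
      model: `models/${model}`,
      generationConfig: {
        responseModalities: modalities, // e.g. ["AUDIO"] or ["TEXT"]
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName: voice } },
        },
      },
    },
  };
}

// Serialized form of the first frame sent after the WebSocket opens.
const setupJSON = JSON.stringify(
  buildSetupMessage({
    model: "gemini-2.5-flash-native-audio-preview-12-2025",
    voice: "Puck",
    modalities: ["AUDIO"],
  }),
);
```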
## Security & Architecture

This demo uses the **Ephemeral Token** approach:

1. **Backend**: Uses `GEMINI_API_KEY` to request a short-lived (ephemeral) token via the `google-genai` SDK.
2. **Frontend**: Fetches this token from the backend `/api/token` endpoint.
3. **Direct Connection**: The browser establishes a WebSocket connection directly to `generativelanguage.googleapis.com` using the token, bypassing the proxy for data streaming.
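Step 2 of this flow is small enough to sketch. The `/api/token` endpoint comes from the description above; the `{ token: ... }` response shape is an assumption about what `server.py` returns:

```javascript
// Fetch a short-lived token from the backend before opening the direct
// WebSocket connection. The fetch implementation is injectable for testing.
async function fetchEphemeralToken(fetchImpl = globalThis.fetch) {
  const res = await fetchImpl("/api/token");
  if (!res.ok) {
    throw new Error(`Token request failed: ${res.status}`);
  }
  const body = await res.json(); // assumed shape: { token: "..." }
  return body.token;
}
```

The ephemeral token then replaces the long-lived API key in the browser, so `GEMINI_API_KEY` never leaves the backend.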
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
/**
 * Audio Worklet Processor for capturing and processing audio
 */

class AudioCaptureProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.bufferSize = 4096;
    this.buffer = new Float32Array(this.bufferSize);
    this.bufferIndex = 0;
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];

    if (input && input.length > 0) {
      const inputChannel = input[0];

      // Buffer the incoming audio
      for (let i = 0; i < inputChannel.length; i++) {
        this.buffer[this.bufferIndex++] = inputChannel[i];

        // When the buffer is full, send it to the main thread
        if (this.bufferIndex >= this.bufferSize) {
          this.port.postMessage({
            type: "audio",
            data: this.buffer.slice(),
          });

          // Reset the buffer
          this.bufferIndex = 0;
        }
      }
    }

    // Return true to keep the processor alive
    return true;
  }
}

// Register the processor
registerProcessor("audio-capture-processor", AudioCaptureProcessor);
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
/**
 * Audio Playback Worklet Processor for playing PCM audio
 */

class PCMProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.audioQueue = [];

    this.port.onmessage = (event) => {
      if (event.data === "interrupt") {
        // Clear the queue on interrupt
        this.audioQueue = [];
      } else if (event.data instanceof Float32Array) {
        // Add audio data to the queue
        this.audioQueue.push(event.data);
      }
    };
  }

  process(inputs, outputs, parameters) {
    const output = outputs[0];
    if (output.length === 0) return true;

    const channel = output[0];
    let outputIndex = 0;

    // Fill the output buffer from the queue
    while (outputIndex < channel.length && this.audioQueue.length > 0) {
      const currentBuffer = this.audioQueue[0];

      if (!currentBuffer || currentBuffer.length === 0) {
        this.audioQueue.shift();
        continue;
      }

      const remainingOutput = channel.length - outputIndex;
      const remainingBuffer = currentBuffer.length;
      const copyLength = Math.min(remainingOutput, remainingBuffer);

      // Copy audio data to the output
      for (let i = 0; i < copyLength; i++) {
        channel[outputIndex++] = currentBuffer[i];
      }

      // Update or remove the current buffer
      if (copyLength < remainingBuffer) {
        this.audioQueue[0] = currentBuffer.slice(copyLength);
      } else {
        this.audioQueue.shift();
      }
    }

    // Fill the remaining output with silence
    while (outputIndex < channel.length) {
      channel[outputIndex++] = 0;
    }

    return true;
  }
}

registerProcessor("pcm-processor", PCMProcessor);
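The processor above consumes `Float32Array` chunks, but the Live API delivers its 24 kHz audio output as base64-encoded 16-bit little-endian PCM. A decoding sketch bridging the two (the function name is illustrative):

```javascript
// Decode base64 16-bit little-endian PCM into Float32 samples (-1..1),
// ready to post to the pcm-processor's playback queue.
function base64PCM16ToFloat32(base64) {
  const binary = atob(base64); // one character per byte
  const out = new Float32Array(binary.length / 2);
  for (let i = 0; i < out.length; i++) {
    // Reassemble a little-endian signed 16-bit sample from two bytes.
    let sample = binary.charCodeAt(i * 2) | (binary.charCodeAt(i * 2 + 1) << 8);
    if (sample >= 0x8000) sample -= 0x10000;
    out[i] = sample / 0x8000; // normalize to -1..1
  }
  return out;
}
```

Posting the result to the worklet (e.g. `workletNode.port.postMessage(base64PCM16ToFloat32(chunk))`, where `workletNode` is a hypothetical `AudioWorkletNode` wired to `pcm-processor`) queues the audio for playback.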
