Translation via the /v4/listen API is the largest translation cost driver. Currently, every user with a primary language set triggers real-time Google Cloud Translation API calls on every conversation, monolingual speakers included. This issue proposes five changes to cut costs while keeping translation seamless for users who need it.
Current Behavior
- Default ON: single_language_mode defaults to false. Any user who sets a primary language during onboarding gets auto-translation active on every listen session.
- Global toggle only: Translation is all-or-nothing via Settings → Language → Automatic Translation. No per-conversation control.
- Real-time per-segment: TranslationCoordinator processes every segment through language detection → classification → GCP Translation API ($15/M chars). The monolingual gate skips most calls after 4 consecutive same-language detections, but the coordinator is still instantiated for every session.
- No screen awareness: Translation happens even when the phone screen is off and the user isn't viewing the transcript.
- No time limit: No restriction on translating old segments retroactively.
- Desktop waste: The desktop app sends the language param to the backend, which translates and persists the result, but the desktop app never displays translations, so those API calls are wasted.
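To make the per-segment cost concrete, here is a rough arithmetic sketch using the $15/M chars rate quoted above. The session size (600 segments of about 80 characters) is a made-up figure for illustration, not measured data, and `translation_cost_usd` is a hypothetical helper.

```python
PRICE_PER_MILLION_CHARS_USD = 15.0  # rate quoted in this issue


def translation_cost_usd(char_count: int) -> float:
    """Estimated GCP Translation spend for a number of translated characters."""
    return char_count / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


# Hypothetical session, for illustration only: 600 segments of ~80 characters.
session_chars = 600 * 80
session_cost = translation_cost_usd(session_chars)
print(f"{session_chars} chars -> ${session_cost:.2f}")
```

Under these assumed numbers a single fully translated session costs well under a dollar, so the spend comes from multiplying that figure across every session of every user.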
Expected Behavior
Smart, cost-efficient translation that's seamless when users need it but doesn't burn API calls when they don't.
Affected Areas
| File | Description |
| --- | --- |
| backend/routers/transcribe.py:318-328 | Translation language determination; currently enabled whenever single_language_mode=false and language preference exists |
| backend/utils/translation_coordinator.py | Real-time coordinator; instantiated every session even for monolingual users |
| backend/utils/translation.py | GCP Translation API client (_client.translate_text() calls) |
| backend/utils/translation_cache.py | Monolingual gate + caching (already good, but gate activates too late) |
| backend/routers/users.py:525-534 | Language preference endpoint; auto-sets single_language_mode based on language support, not user choice |
| app/lib/pages/settings/language_settings_page.dart | Global translation toggle UI |
| app/lib/services/sockets/transcription_service.dart | WebSocket language param |
| app/lib/providers/capture_provider.dart | TranslationEvent handler |
| app/lib/widgets/transcript.dart | Translation display in conversation view |
Solution
1. Default auto-translate OFF
- Change the single_language_mode default to true for new users
- Existing users with translation enabled keep their setting (no migration needed)
- Users who want translation explicitly enable it in Settings
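A minimal sketch of how the new default could be resolved without a migration. It assumes a user who has never stored a value is represented as `None`; the issue does not say how the setting is persisted, and `effective_single_language_mode` is a hypothetical helper.

```python
from typing import Optional

# Proposed default: single_language_mode on, i.e. auto-translate off.
NEW_USER_DEFAULT = True


def effective_single_language_mode(stored_value: Optional[bool]) -> bool:
    """Return the mode to use for a session.

    An explicitly stored value always wins, so existing users keep their
    setting. Only users with no stored value (assumed to be None here)
    fall through to the new default.
    """
    if stored_value is None:
        return NEW_USER_DEFAULT
    return stored_value
```

Users who relied on the old default without ever storing a value would fall through to the new one, which is worth checking against real user records before shipping.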
2. Per-conversation toggle in live conversation UI
- Add a translate button/toggle in the live conversation capturing screen
- When user turns it on mid-conversation, backend starts translating from that point
- Once enabled for a conversation, keep auto-translate on for that conversation permanently (sticky per-conversation)
- Settings → Language still has the global option to turn it off entirely
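The sticky behavior could be modeled roughly as follows. This is a sketch only: `TranslationPreferences`, `globally_disabled`, and the in-memory set are hypothetical, and it assumes the global option acts as a hard off switch that overrides per-conversation state, which the issue does not spell out.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TranslationPreferences:
    # Assumption: the Settings -> Language option is a hard off switch.
    globally_disabled: bool = False
    # Conversations where the user has tapped the translate toggle.
    sticky_conversations: Set[str] = field(default_factory=set)

    def enable_for(self, conversation_id: str) -> None:
        """Mark a conversation as translated from now on (sticky)."""
        self.sticky_conversations.add(conversation_id)

    def should_translate(self, conversation_id: str) -> bool:
        """Translate only opted-in conversations, unless globally disabled."""
        if self.globally_disabled:
            return False
        return conversation_id in self.sticky_conversations
```

Keeping the opt-in set separate from the global flag means turning the global option back on restores each conversation's earlier choice.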
3. Batch translation for on-demand activation
- When user enables translation mid-conversation, batch-translate existing segments (only segments created in the last 24h)
- Use translate_units_batch() (already exists) for efficient batching instead of per-segment real-time calls
- Cap retroactive translation to 24h window to prevent unbounded cost on long histories
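The 24h cap plus a single batch call might look like the sketch below. The `Segment` shape and `backfill_translations` are hypothetical, and `batch_translate` is a stand-in callable: the real signature of translate_units_batch() is not shown in this issue, so it is not called directly here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Sequence

RETROACTIVE_WINDOW = timedelta(hours=24)  # cap proposed in this issue


@dataclass
class Segment:  # hypothetical shape, for illustration only
    id: str
    text: str
    created_at: datetime


def backfill_translations(
    segments: Sequence[Segment],
    now: datetime,
    batch_translate: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Translate only segments from the last 24h, using one batch call."""
    cutoff = now - RETROACTIVE_WINDOW
    recent = [s for s in segments if s.created_at >= cutoff]
    if not recent:
        return {}  # nothing in the window, so no API call is made
    translated = batch_translate([s.text for s in recent])
    return {s.id: t for s, t in zip(recent, translated)}
```

Passing the translator in as a parameter keeps the window logic testable without touching the GCP client.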
4. Screen-aware deferral
- Track screen on/off state from the mobile app (send as WebSocket metadata or periodic signal)
- When screen is off: defer translation, accumulate segments
- When screen turns on: batch-translate accumulated segments in one API call
- Net effect: same UX (translations appear when user looks), far fewer API calls (1 batch vs N real-time)
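A rough sketch of the deferral logic, assuming the backend receives screen on/off signals from the app. `ScreenAwareBuffer` and its methods are hypothetical names, and `batch_translate` again stands in for whatever batch function the backend exposes.

```python
from typing import Callable, List


class ScreenAwareBuffer:
    """Translate immediately while the screen is on, defer while it is off."""

    def __init__(self, batch_translate: Callable[[List[str]], List[str]]):
        self._batch_translate = batch_translate
        self._screen_on = True
        self._pending: List[str] = []

    def add_segment(self, text: str) -> List[str]:
        """Queue a segment; returns translations if the screen is on."""
        self._pending.append(text)
        return self._flush() if self._screen_on else []

    def set_screen_state(self, screen_on: bool) -> List[str]:
        """Record the screen state; flushes the backlog when it turns on."""
        self._screen_on = screen_on
        return self._flush() if screen_on else []

    def _flush(self) -> List[str]:
        if not self._pending:
            return []
        batch, self._pending = self._pending, []
        return self._batch_translate(batch)  # one call for the whole backlog
```

A real implementation would also need to bound the backlog and decide what happens if the session ends while the screen is still off; both are left out of this sketch.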
5. Desktop: skip translation entirely
- The desktop app doesn't display translations, so it shouldn't request them
- Either the desktop app sends a single_language_mode=true override in WebSocket params, or the backend checks source=desktop and skips translation
- Remove the unused google-cloud-translate from pusher/requirements.txt (dead dependency)
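The backend-side variant could reduce to a single gate like the one below. This is a sketch: `should_translate` is a hypothetical function, and the parameter names mirror the WebSocket params mentioned in this issue rather than the actual code in transcribe.py.

```python
from typing import Optional


def should_translate(
    source: Optional[str],
    single_language_mode: bool,
    language: Optional[str],
) -> bool:
    """Decide whether a listen session should get a translation coordinator."""
    if source == "desktop":
        return False  # desktop never displays translations
    if single_language_mode:
        return False  # user has not opted in to translation
    return bool(language)  # need a target language to translate into
```

Evaluating this gate before the coordinator is created would also avoid instantiating it for sessions that will never translate.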
Impact
- Cost: Major reduction in GCP Translation API spend — most users are monolingual and currently trigger the coordinator unnecessarily
- UX: No degradation — users who need translation get it on-demand with the same seamless experience
- Risk: Users who currently rely on always-on translation will need to re-enable it after the default change (one-time)
by AI for @beastoin