
fix(backend): handle Google Calendar OAuth token failures with retry and structured errors #6777

Open
DanKunleLove wants to merge 2 commits into BasedHardware:main from DanKunleLove:fix/calendar-token-failures

Conversation

@DanKunleLove

Summary

Addresses issue #6556 — chronic Google Calendar integration failures (58 errors/week since Dec 2025).

Root cause

The error handling in calendar_tools.py relied on fragile string matching ("Authentication failed" in error_msg or "401" in error_msg), missing:

  • Token revocation with non-401 codes (invalid_grant)
  • Rate limits (429)
  • Network errors / timeouts
  • Server errors (5xx)
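
To make the gap concrete, here is a minimal sketch that applies the old string check quoted above to a few error strings. The sample messages are invented for illustration and are not taken from production logs.

```python
def legacy_is_auth_error(error_msg: str) -> bool:
    """The string check quoted in the root-cause description."""
    return "Authentication failed" in error_msg or "401" in error_msg


# Hypothetical error strings, roughly one per failure mode listed above.
samples = {
    "expired token": "Google API error: 401 Unauthorized",
    "revoked token": "Token refresh failed: 400 invalid_grant",
    "rate limit": "Google API error: 429 Too Many Requests",
    "false positive": "Event 401k-planning-review not found (404)",
}

# Maps each sample name to whether the old check treats it as an auth failure.
results = {name: legacy_is_auth_error(msg) for name, msg in samples.items()}
```

Under these assumed messages the check misses the revoked token and the rate limit, and it also misfires on an unrelated message that merely contains the digits 401.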

Changes

google_utils.py:

  • Added GoogleAPIError exception class with status_code property and helpers (is_auth_error, is_rate_limit, is_permission_error, is_retryable)
  • Added automatic retry with exponential backoff for transient failures (429, 5xx, network timeouts)
  • Respects Retry-After header on 429 responses
  • refresh_google_token() now detects invalid_grant in refresh response body (token revoked by user in Google account settings)
  • Added specific error logging with sanitize() per repo logging rules
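
For readers without the diff open, a structured exception along these lines might look like the sketch below. It is not the code from this PR: the helpers are assumed to be properties, and the exact status codes each helper matches are a guess based on the list above.

```python
class GoogleAPIError(Exception):
    """Illustrative structured Google API error (not the PR's actual class)."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(f"Google API error {status_code}: {message}")
        self._status_code = status_code
        self.message = message

    @property
    def status_code(self) -> int:
        return self._status_code

    @property
    def is_auth_error(self) -> bool:
        # Assumption: 401, or any error body that mentions invalid_grant.
        return self._status_code == 401 or "invalid_grant" in self.message

    @property
    def is_rate_limit(self) -> bool:
        return self._status_code == 429

    @property
    def is_permission_error(self) -> bool:
        return self._status_code == 403

    @property
    def is_retryable(self) -> bool:
        # Transient failures named in the description: 429 and 5xx.
        return self.is_rate_limit or 500 <= self._status_code <= 599
```

Callers can then branch on `exc.is_auth_error` or `exc.is_retryable` instead of searching the error text.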

calendar_tools.py:

  • Replaced all 5 string-matching error handlers (get, create, delete-by-id, delete-search, update) with structured GoogleAPIError catching
  • Added explicit httpx.TimeoutException / httpx.ConnectError handling with user-friendly messages
  • Added 403 permission error handling consistently across all tools
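
A rough sketch of how a tool handler could map these failures to user-facing text follows. The exception classes are local stand-ins so the snippet runs without httpx, `user_message` is a hypothetical helper, and every message except "Please reconnect your Google Calendar" is invented. The token-refresh attempt that precedes the 401 message in the real tools is left out.

```python
class GoogleAPIError(Exception):
    """Minimal stand-in carrying only the HTTP status code."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message)
        self.status_code = status_code


class TimeoutException(Exception):
    """Stand-in for httpx.TimeoutException."""


class ConnectError(Exception):
    """Stand-in for httpx.ConnectError."""


def user_message(exc: Exception) -> str:
    """Translate a failure into a message the user can act on."""
    if isinstance(exc, (TimeoutException, ConnectError)):
        return "Could not reach Google Calendar. Please try again in a moment."
    if isinstance(exc, GoogleAPIError):
        if exc.status_code in (401, 403):
            return "Please reconnect your Google Calendar."
        if exc.status_code == 429:
            return "Google Calendar is rate limiting requests. Please try again shortly."
        return f"Google Calendar returned an error ({exc.status_code})."
    return "Unexpected error while talking to Google Calendar."
```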

Impact

  • Transient failures (rate limits, server hiccups) are automatically retried instead of immediately failing
  • Token revocation is properly detected regardless of HTTP status code
  • Users get actionable error messages ("Please reconnect your Google Calendar") instead of generic errors
  • Error logging includes status codes for easier debugging in GCP Error Reporting
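
The retry timing can be sketched as a small pure function. The base delay, the cap, and the name `retry_delay` are assumptions for illustration; the PR's actual constants are not shown in this conversation, and jitter is omitted.

```python
from typing import Optional

_BASE_DELAY = 1.0   # assumption: seconds to wait before the first retry
_MAX_DELAY = 30.0   # assumption: upper bound on any single wait


def retry_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait after failed attempt `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After may be a number of seconds; honour it, within the cap.
            return min(max(float(retry_after), 0.0), _MAX_DELAY)
        except ValueError:
            # It may also be an HTTP date; this sketch ignores that form
            # and falls back to exponential backoff.
            pass
    return min(_BASE_DELAY * (2 ** attempt), _MAX_DELAY)
```

With these assumed constants the waits are 1 s, 2 s, and 4 s for attempts 0 to 2, unless a numeric Retry-After header overrides them.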

Test plan

  • Verify existing calendar tool tests still pass (bash test.sh)
  • Confirm GoogleAPIError correctly classifies 401, 403, 429, 5xx status codes
  • Verify retry logic respects _MAX_RETRIES and exponential backoff
  • Confirm refresh_google_token detects invalid_grant in response body
  • Manual test: calendar operations work normally on the happy path
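
For the invalid_grant item, a check on the refresh response body could look like the sketch below. `is_revoked_grant` is a hypothetical helper, not a function from this PR. It relies on the standard OAuth 2.0 error format, in which the token endpoint returns a JSON object whose `error` field is `invalid_grant`.

```python
import json


def is_revoked_grant(status_code: int, body_text: str) -> bool:
    """True when a token-refresh response reports the grant as invalid."""
    if status_code == 200:
        return False
    try:
        body = json.loads(body_text)
    except ValueError:
        # Non-JSON error body: nothing to inspect.
        return False
    return isinstance(body, dict) and body.get("error") == "invalid_grant"
```

A test for `refresh_google_token` could feed it a mocked 400 response with that body and assert that the revocation path is taken.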

Fixes #6556

…and structured errors

Addresses chronic 58 errors/week in calendar_tools.py (since Dec 2025):

- Add GoogleAPIError exception class with status_code for structured
  error handling instead of fragile string matching on "401"/"Authentication failed"
- Add automatic retry with exponential backoff for transient failures
  (429 rate limits, 5xx server errors, network timeouts)
- Respect Retry-After header on 429 responses
- Detect token revocation (invalid_grant) in refresh_google_token(),
  not just 401 status codes
- Handle httpx.TimeoutException and httpx.ConnectError explicitly
  with user-friendly messages
- Add specific logging per error type for observability
- Use sanitize() from log_sanitizer for error body logging per repo rules
- Apply same fixes to all 5 calendar tool error handlers (get, create,
  delete, delete-search, update)

Fixes BasedHardware#6556
@greptile-apps
Contributor

greptile-apps bot commented Apr 17, 2026

Greptile Summary

This PR replaces fragile string-matching error handling across all 5 Google Calendar tools with a structured GoogleAPIError exception class and adds automatic exponential-backoff retry for transient failures (429, 5xx, network timeouts). The refresh_google_token function gains invalid_grant detection and improved diagnostics aligned with the repo's sanitize() logging rules.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/cleanup issues with no runtime impact on the current configuration.

The core logic is correct: error classification is accurate, retry paths are sound, and the structured exception integrates cleanly with all 5 tool handlers. The three findings are cosmetic (corrupted emoji), dead code (unreachable final raise), and an overlooked in-function import — none affect runtime behaviour with the current _MAX_RETRIES = 3 constant.

The post-loop raise GoogleAPIError(r.status_code, ...) in google_utils.py and the remaining import traceback in calendar_tools.py outer handler are the two lines worth a quick follow-up.

Important Files Changed

| Filename | Overview |
|---|---|
| backend/utils/retrieval/tools/google_utils.py | New GoogleAPIError class + retry logic with exponential backoff; dead code at end of google_api_request references potentially unbound r, and a corrupted emoji appears in the server-error log message. |
| backend/utils/retrieval/tools/calendar_tools.py | All 5 error handlers migrated from fragile string-matching to structured GoogleAPIError catching with explicit network-error paths; one stale import traceback / traceback.print_exc() remains in the outer handler of get_calendar_events_tool. |

Sequence Diagram

sequenceDiagram
    participant CT as calendar_tools
    participant GU as google_api_request
    participant GA as Google API

    CT->>GU: request (attempt 0)
    GU->>GA: HTTP request
    alt 200 OK
        GA-->>GU: 200
        GU-->>CT: response JSON
    else 429 / 5xx retryable attempt < MAX
        GA-->>GU: 429/5xx
        GU->>GU: exponential backoff sleep
        GU->>GA: HTTP request (retry)
        GA-->>GU: 200
        GU-->>CT: response JSON
    else 429 / 5xx all retries exhausted
        GA-->>GU: 429/5xx
        GU-->>CT: raise GoogleAPIError(429/5xx)
        CT->>CT: else branch → user-facing error msg
    else 401 / invalid_grant
        GA-->>GU: 401
        GU-->>CT: raise GoogleAPIError(401)
        CT->>CT: is_auth_error → refresh_google_token
        CT->>GA: retry with new token
        GA-->>CT: 200 or error
    else 403
        GA-->>GU: 403
        GU-->>CT: raise GoogleAPIError(403)
        CT->>CT: is_permission_error → reconnect message
    else Network timeout / ConnectError
        GU->>GU: retry up to MAX_RETRIES
        GU-->>CT: raise httpx.TimeoutException / ConnectError
        CT->>CT: network error handler → try again message
    end

Comments Outside Diff (1)

  1. backend/utils/retrieval/tools/calendar_tools.py, line 725-730 (link)

    P2 In-function import traceback not removed

    The PR correctly removed import traceback / traceback.print_exc() from all inner handlers, but the outer except block of get_calendar_events_tool still has one. The repo rule requires all imports at module top level, not inside functions.

    Context Used: Backend Python import rules - no in-function impor... (source)

Reviews (1): Last reviewed commit: "fix(backend): handle Google Calendar OAu..."

Comment on lines +189 to +192
# All retries exhausted
if last_error and isinstance(last_error, (httpx.TimeoutException, httpx.ConnectError)):
    raise last_error
raise GoogleAPIError(r.status_code, sanitize(r.text[:200]) if r.text else "No error body")

P2 Dead code with potentially unbound r

The final raise GoogleAPIError(r.status_code, ...) on line 192 is unreachable: the loop can only exit without raising internally if every iteration took a continue path. The last attempt (attempt 2) can only continue via the network-error handlers — which always set last_error — so the if last_error guard above will always fire first. As written, r is never assigned by a network-error branch, making this line both dead and referencing a potentially unbound name. If _MAX_RETRIES is ever reduced to 0 (empty loop), last_error is also unbound and a NameError surfaces instead of a GoogleAPIError.

Suggested change
- # All retries exhausted
- if last_error and isinstance(last_error, (httpx.TimeoutException, httpx.ConnectError)):
-     raise last_error
- raise GoogleAPIError(r.status_code, sanitize(r.text[:200]) if r.text else "No error body")
+ # All retries exhausted
+ if last_error and isinstance(last_error, (httpx.TimeoutException, httpx.ConnectError)):
+     raise last_error
+ # Unreachable with _MAX_RETRIES >= 1, but kept as a safety net
+ raise GoogleAPIError(0, "All retries exhausted with no response")
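
The control flow behind this comment can be reproduced with a simplified loop. This is a sketch, not the PR's google_api_request: `send` is a hypothetical callable that returns `(status, text)` or raises, `NetworkError` stands in for the two httpx exceptions, and the backoff sleep is omitted.

```python
class GoogleAPIError(Exception):
    """Minimal stand-in carrying only the HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message)
        self.status_code = status_code


class NetworkError(Exception):
    """Stand-in for httpx.TimeoutException / httpx.ConnectError."""


def request_with_retries(send, max_retries=3):
    """Call send() up to max_retries times and return the body on 200."""
    last_error = None  # bound before the loop, so it exists even if the loop is empty
    for attempt in range(max_retries):
        try:
            status, text = send()
        except NetworkError as exc:
            last_error = exc
            continue  # no response was assigned on this path
        if status == 200:
            return text
        retryable = status == 429 or 500 <= status <= 599
        if not retryable or attempt == max_retries - 1:
            raise GoogleAPIError(status, text[:200] or "No error body")
        # Retryable with attempts left: loop again.
    if last_error is not None:
        raise last_error
    # Only reachable when max_retries == 0; no response variable is read here.
    raise GoogleAPIError(0, "All retries exhausted with no response")
```

Because the HTTP path on the final attempt always returns or raises inside the loop, the only way to fall out of the loop with `max_retries >= 1` is through the network-error branch, which is why the post-loop code must not read the response.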

Comment on lines +180 to +182
logger.warning(
    f"�� Server error {r.status_code}, retrying in {delay}s (attempt {attempt + 1}/{_MAX_RETRIES})"
)

P2 Corrupted emoji in log message

The emoji has been mangled to �� in the server-error retry warning, which will appear as garbled characters in GCP logs.

Suggested change
- logger.warning(
-     f"�� Server error {r.status_code}, retrying in {delay}s (attempt {attempt + 1}/{_MAX_RETRIES})"
- )
+ logger.warning(
+     f"🌐 Server error {r.status_code}, retrying in {delay}s (attempt {attempt + 1}/{_MAX_RETRIES})"
+ )

- Fix corrupted emoji in server-error retry log message
- Replace potentially unbound `r` reference in post-loop raise with
  safe fallback `GoogleAPIError(0, "All retries exhausted with no response")`
- Remove remaining in-function `import traceback` / `traceback.print_exc()`
  from outer exception handlers per repo import rules
Development

Successfully merging this pull request may close these issues.

Google Calendar integration: chronic token failures (58 errors/week since Dec 2025)
