fix(backend): handle Google Calendar OAuth token failures with retry and structured errors #6777
DanKunleLove wants to merge 2 commits into BasedHardware:main from …
…and structured errors

Addresses chronic 58 errors/week in calendar_tools.py (since Dec 2025):

- Add GoogleAPIError exception class with status_code for structured error handling instead of fragile string matching on "401"/"Authentication failed"
- Add automatic retry with exponential backoff for transient failures (429 rate limits, 5xx server errors, network timeouts)
- Respect Retry-After header on 429 responses
- Detect token revocation (invalid_grant) in refresh_google_token(), not just 401 status codes
- Handle httpx.TimeoutException and httpx.ConnectError explicitly with user-friendly messages
- Add specific logging per error type for observability
- Use sanitize() from log_sanitizer for error body logging per repo rules
- Apply same fixes to all 5 calendar tool error handlers (get, create, delete, delete-search, update)

Fixes BasedHardware#6556
Greptile Summary

This PR replaces fragile string-matching error handling across all 5 Google Calendar tools with a structured …

Confidence Score: 5/5

Safe to merge: all findings are P2 style/cleanup issues with no runtime impact on the current configuration.

The core logic is correct: error classification is accurate, retry paths are sound, and the structured exception integrates cleanly with all 5 tool handlers. The three findings are cosmetic (corrupted emoji), dead code (unreachable final raise), and an overlooked in-function import; none affect runtime behaviour with the current `_MAX_RETRIES = 3` constant.

The post-loop `raise GoogleAPIError(r.status_code, ...)` in google_utils.py and the remaining `import traceback` in the calendar_tools.py outer handler are the two lines worth a quick follow-up.
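As a rough illustration of the structured exception this summary refers to, here is a minimal Python sketch. The class name and the helper names (`is_auth_error`, `is_permission_error`, `is_rate_limit`, `is_retryable`) come from the PR description; their exact definitions, the choice of properties over methods, and the `next_action` dispatcher are assumptions made for illustration, not the PR's actual code.

```python
class GoogleAPIError(Exception):
    """Sketch of an exception that carries the HTTP status code (assumed shape)."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(f"Google API error {status_code}: {message}")
        self.status_code = status_code
        self.message = message

    @property
    def is_auth_error(self) -> bool:
        # 401: access token missing, expired, or rejected
        return self.status_code == 401

    @property
    def is_permission_error(self) -> bool:
        # 403: authenticated, but not allowed to perform the operation
        return self.status_code == 403

    @property
    def is_rate_limit(self) -> bool:
        return self.status_code == 429

    @property
    def is_retryable(self) -> bool:
        # Rate limits and 5xx server errors are treated as transient
        return self.is_rate_limit or 500 <= self.status_code < 600


def next_action(err: GoogleAPIError) -> str:
    """Hypothetical dispatcher mirroring the caller-side branches in the review."""
    if err.is_auth_error:
        return "refresh_token_and_retry"
    if err.is_permission_error:
        return "ask_user_to_reconnect"
    return "show_error_message"
```

Branching on `err.status_code` avoids the false matches that a substring test such as `"401" in error_msg` can produce when those digits happen to appear elsewhere in a message.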
Sequence Diagram

sequenceDiagram
participant CT as calendar_tools
participant GU as google_api_request
participant GA as Google API
CT->>GU: request (attempt 0)
GU->>GA: HTTP request
alt 200 OK
GA-->>GU: 200
GU-->>CT: response JSON
else 429 / 5xx retryable attempt < MAX
GA-->>GU: 429/5xx
GU->>GU: exponential backoff sleep
GU->>GA: HTTP request (retry)
GA-->>GU: 200
GU-->>CT: response JSON
else 429 / 5xx all retries exhausted
GA-->>GU: 429/5xx
GU-->>CT: raise GoogleAPIError(429/5xx)
CT->>CT: else branch → user-facing error msg
else 401 / invalid_grant
GA-->>GU: 401
GU-->>CT: raise GoogleAPIError(401)
CT->>CT: is_auth_error → refresh_google_token
CT->>GA: retry with new token
GA-->>CT: 200 or error
else 403
GA-->>GU: 403
GU-->>CT: raise GoogleAPIError(403)
CT->>CT: is_permission_error → reconnect message
else Network timeout / ConnectError
GU->>GU: retry up to MAX_RETRIES
GU-->>CT: raise httpx.TimeoutException / ConnectError
CT->>CT: network error handler → try again message
end
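The retry branches in the diagram can be sketched roughly as follows. This is a stdlib-only approximation, not the PR's code: `Response`, `backoff_delay`, `request_with_retry`, and the 1-second base delay are hypothetical, and the built-in `TimeoutError` / `ConnectionError` stand in for `httpx.TimeoutException` / `httpx.ConnectError`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

_MAX_RETRIES = 3           # value quoted in the review summary
_BASE_DELAY_SECONDS = 1.0  # hypothetical base delay


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (plain, case-sensitive headers dict)."""
    status_code: int
    text: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


class GoogleAPIError(Exception):
    def __init__(self, status_code: int, message: str):
        super().__init__(f"Google API error {status_code}: {message}")
        self.status_code = status_code


def backoff_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Exponential backoff (1s, 2s, 4s, ...), overridden by a numeric Retry-After."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    return _BASE_DELAY_SECONDS * (2 ** attempt)


def request_with_retry(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    last_error: Optional[Exception] = None
    for attempt in range(_MAX_RETRIES):
        is_last = attempt == _MAX_RETRIES - 1
        try:
            r = send()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if is_last:
                break
            sleep(backoff_delay(attempt))
            continue
        if r.status_code < 400:
            return r
        retryable = r.status_code == 429 or 500 <= r.status_code < 600
        if retryable and not is_last:
            # Retry-After is only consulted for 429 responses
            hint = r.headers.get("Retry-After") if r.status_code == 429 else None
            sleep(backoff_delay(attempt, hint))
            continue
        # Non-retryable status (401, 403, ...) or retries exhausted
        raise GoogleAPIError(r.status_code, r.text[:200] or "No error body")
    if last_error is not None:
        raise last_error
    raise GoogleAPIError(0, "All retries exhausted with no response")
```

Passing `sleep` as a parameter is a design choice for this sketch only; it lets the backoff schedule be checked without real waiting.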
    # All retries exhausted
    if last_error and isinstance(last_error, (httpx.TimeoutException, httpx.ConnectError)):
        raise last_error
    raise GoogleAPIError(r.status_code, sanitize(r.text[:200]) if r.text else "No error body")
Dead code with potentially unbound `r`

The final `raise GoogleAPIError(r.status_code, ...)` on line 192 is unreachable: the loop can only exit without raising internally if every iteration took a `continue` path. The last attempt (attempt 2) can only `continue` via the network-error handlers, which always set `last_error`, so the `if last_error` guard above will always fire first. As written, `r` is never assigned by a network-error branch, making this line both dead and a reference to a potentially unbound name. If `_MAX_RETRIES` is ever reduced to 0 (empty loop), `last_error` is also unbound and a `NameError` surfaces instead of a `GoogleAPIError`.
Suggested change:

    - # All retries exhausted
    - if last_error and isinstance(last_error, (httpx.TimeoutException, httpx.ConnectError)):
    -     raise last_error
    - raise GoogleAPIError(r.status_code, sanitize(r.text[:200]) if r.text else "No error body")
    + # All retries exhausted
    + if last_error and isinstance(last_error, (httpx.TimeoutException, httpx.ConnectError)):
    +     raise last_error
    + # Unreachable with _MAX_RETRIES >= 1, but kept as a safety net
    + raise GoogleAPIError(0, "All retries exhausted with no response")
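To make the unbound-name concern concrete, here is a small standalone sketch, unrelated to the PR's actual function. The names `last_response_status` and `describe` are hypothetical. A local that is only assigned inside a `try` within a loop stays unbound when every iteration raises, or when the loop body never runs.

```python
def last_response_status(attempts: int, always_fail: bool) -> int:
    for _ in range(attempts):
        try:
            if always_fail:
                raise TimeoutError("simulated network timeout")
            r = 200  # only bound when the simulated request succeeds
        except TimeoutError:
            continue
    return r  # raises UnboundLocalError if `r` was never assigned


def describe(attempts: int, always_fail: bool) -> str:
    try:
        return f"status {last_response_status(attempts, always_fail)}"
    except UnboundLocalError as exc:
        # UnboundLocalError is a subclass of NameError
        return type(exc).__name__
```

Both `describe(3, True)` (every attempt fails) and `describe(0, False)` (empty loop) end in `UnboundLocalError`, which is why a fallback that does not touch `r` is the safer post-loop statement.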
    logger.warning(
        f"�� Server error {r.status_code}, retrying in {delay}s (attempt {attempt + 1}/{_MAX_RETRIES})"
    )
Corrupted emoji in log message
The emoji has been mangled to �� in the server-error retry warning, which will appear as garbled characters in GCP logs.
Suggested change:

    - logger.warning(
    -     f"�� Server error {r.status_code}, retrying in {delay}s (attempt {attempt + 1}/{_MAX_RETRIES})"
    - )
    + logger.warning(
    +     f"🌐 Server error {r.status_code}, retrying in {delay}s (attempt {attempt + 1}/{_MAX_RETRIES})"
    + )
- Fix corrupted emoji in server-error retry log message
- Replace potentially unbound `r` reference in post-loop raise with safe fallback `GoogleAPIError(0, "All retries exhausted with no response")`
- Remove remaining in-function `import traceback` / `traceback.print_exc()` from outer exception handlers per repo import rules
Summary
Addresses issue #6556 — chronic Google Calendar integration failures (58 errors/week since Dec 2025).
Root cause
The error handling in `calendar_tools.py` relied on fragile string matching (`"Authentication failed" in error_msg or "401" in error_msg`), missing:

- … (`invalid_grant`)

Changes
google_utils.py:

- `GoogleAPIError` exception class with `status_code` property and helpers (`is_auth_error`, `is_rate_limit`, `is_permission_error`, `is_retryable`)
- … `Retry-After` header on 429 responses
- `refresh_google_token()` now detects `invalid_grant` in refresh response body (token revoked by user in Google account settings)
- … `sanitize()` per repo logging rules

calendar_tools.py:

- … (`get`, `create`, `delete-by-id`, `delete-search`, `update`) with structured `GoogleAPIError` catching
- … `httpx.TimeoutException` / `httpx.ConnectError` handling with user-friendly messages
- … `403` permission error handling consistently across all tools

Impact
Test plan
- … `bash test.sh`)
- `GoogleAPIError` correctly classifies 401, 403, 429, 5xx status codes
- … `_MAX_RETRIES` and exponential backoff
- `refresh_google_token` detects `invalid_grant` in response body

Fixes #6556
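For the `invalid_grant` item in the test plan, the check could look something like the sketch below. It assumes the standard OAuth 2.0 error format, in which a failed token request returns a JSON body with an `error` field (typically with HTTP 400 when a refresh token has been revoked or has expired). The function name `is_token_revoked` is hypothetical and does not appear in the PR.

```python
import json


def is_token_revoked(status_code: int, body: str) -> bool:
    """Return True when a token-refresh response reports `invalid_grant`."""
    if status_code < 400:
        return False  # successful refresh, nothing to detect
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return False  # non-JSON error body, e.g. an HTML error page
    return isinstance(payload, dict) and payload.get("error") == "invalid_grant"
```

Inspecting the parsed `error` field rather than the status code alone is what separates a revoked grant from other 4xx refresh failures such as `invalid_client`.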