
Honor enable_endpoint_discovery=False in _GetDatabaseAccount fallback #46890

Open

andrewmathew1 wants to merge 5 commits into Azure:main from andrewmathew1:andrewmathew1/cosmos-strict-endpoint-discovery-disable

Conversation

@andrewmathew1
Contributor

When the initial database-account read against the user-supplied endpoint fails, the global endpoint manager (sync and async) would fall back to trying synthesized public regional endpoints derived from PreferredLocations -- regardless of whether the caller had set enable_endpoint_discovery=False. For Cosmos accounts reachable only via a private endpoint, those synthesized regional FQDNs are not always present in the privatelink.documents.azure.com private DNS zone, so they resolve via public DNS to the public IP of the account and are rejected by the firewall with a 403. Because _GetDatabaseAccount is called on startup and every 5 minutes by the background refresh task, this surfaced as intermittent 403s.

This change gates the fallback with EnableEndpointDiscovery: when discovery is disabled, the original exception is re-raised and no locational endpoints are tried. With discovery enabled the existing behavior is preserved.
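A minimal sketch of the gated fallback in plain Python, so it runs without the SDK. get_database_account, locational_endpoint, and the injected read_account callable are hypothetical stand-ins for the global endpoint manager's internals. The regional host format (<account>-<region>.<domain>, region lowercased with spaces removed) is an assumption based on the FQDN shape described in this PR, and so is the behavior after every regional attempt fails (the sketch re-raises the original error).

```python
from typing import Callable, Sequence
from urllib.parse import urlsplit, urlunsplit


def locational_endpoint(default_endpoint: str, location: str) -> str:
    """Build <account>-<region>.<domain> from the global endpoint.

    Assumption: "West US" -> "westus". This mirrors the FQDN shape in the
    PR description, not necessarily the SDK's exact helper.
    """
    parts = urlsplit(default_endpoint)
    account, _, domain = (parts.hostname or "").partition(".")
    region = location.replace(" ", "").lower()
    netloc = f"{account}-{region}.{domain}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def get_database_account(
    default_endpoint: str,
    preferred_locations: Sequence[str],
    enable_endpoint_discovery: bool,
    read_account: Callable[[str], dict],
) -> dict:
    """Hypothetical stand-in for the gated _GetDatabaseAccount fallback."""
    try:
        # Always try the endpoint the caller supplied first.
        return read_account(default_endpoint)
    except Exception as original_error:
        if not enable_endpoint_discovery:
            # Discovery disabled: surface the original error and never
            # contact a synthesized regional endpoint.
            raise
        # Discovery enabled: keep the pre-existing regional fallback.
        for location in preferred_locations:
            try:
                return read_account(locational_endpoint(default_endpoint, location))
            except Exception:
                continue
        # Every regional attempt failed too (assumed): report the first failure.
        raise original_error
```

With discovery disabled, a failing read makes exactly one attempt, against the supplied URL, and the caller sees the original exception.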

Fixes #46219

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings May 14, 2026 16:18
@andrewmathew1 andrewmathew1 requested a review from a team as a code owner May 14, 2026 16:18
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes azure-cosmos endpoint discovery behavior so enable_endpoint_discovery=False prevents _GetDatabaseAccount from falling back to synthesized regional endpoints, aligning private-endpoint behavior with the documented contract.

Changes:

  • Gates sync and async _GetDatabaseAccount regional fallback behind EnableEndpointDiscovery.
  • Updates public/client documentation and changelog to describe the private-endpoint fix.
  • Adds sync and async regression tests for disabled and enabled endpoint discovery behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Summary per file
  • sdk/cosmos/azure-cosmos/azure/cosmos/_global_endpoint_manager.py: Skips locational fallback when endpoint discovery is disabled.
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_global_endpoint_manager_async.py: Applies the same fallback guard for async clients.
  • sdk/cosmos/azure-cosmos/azure/cosmos/cosmos_client.py: Documents enable_endpoint_discovery=False endpoint pinning behavior.
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client.py: Documents the async client behavior.
  • sdk/cosmos/azure-cosmos/azure/cosmos/documents.py: Expands ConnectionPolicy.EnableEndpointDiscovery documentation.
  • sdk/cosmos/azure-cosmos/CHANGELOG.md: Adds an unreleased bug-fix entry for issue #46219.
  • sdk/cosmos/azure-cosmos/tests/test_endpoint_discovery_disabled.py: Adds sync regression coverage.
  • sdk/cosmos/azure-cosmos/tests/test_endpoint_discovery_disabled_async.py: Adds async regression coverage.

… fallback

Fixes Azure#46219

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@andrewmathew1 andrewmathew1 force-pushed the andrewmathew1/cosmos-strict-endpoint-discovery-disable branch from 6f96fc8 to fb863f8 May 14, 2026 16:53
andrewmathew1 and others added 4 commits May 14, 2026 15:37
Replaces 'myaccount' with 'contoso' in the new endpoint-discovery-disabled tests so they pass cspell. 'contoso' is the canonical Microsoft sample name (already used by _location_cache.py's docstring for the same endpoint-format example) and is in the repo's cspell allow-list, while 'myaccount' is not. No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cenario

Constructs a real CosmosClient and mocks CosmosClientConnection.GetDatabaseAccount to record every URL the SDK attempts. Asserts that with enable_endpoint_discovery=False the SDK only contacts the user-supplied URL, and with discovery enabled it still falls back to synthesized regional FQDNs (regression check).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removes test_private_endpoint_repro.py (which mocked the network) and adds test_endpoint_discovery_disabled_live.py which exercises the issue Azure#46219 contract against a real Cosmos account using the standard ACCOUNT_HOST and ACCOUNT_KEY env vars. A RecordingTransport (subclass of RequestsTransport) records every URL the SDK actually issues over the network.

On a real geo-replicated account the test demonstrates that with enable_endpoint_discovery=True the SDK contacts the synthesized regional FQDN (<account>-<region>.documents.azure.com) -- the exact host that produces the 403 in private-endpoint deployments whose private DNS zone is missing the regional entry. With discovery=False the SDK only contacts the supplied host.
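To make that check concrete: given the URLs a recording transport captured, the test only needs to find hosts other than the one the caller supplied. The helper below is a hypothetical sketch, not the PR's test code. With discovery disabled it should return an empty list; with discovery enabled on a geo-replicated account it may return a regional FQDN.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def unexpected_hosts(recorded_urls: Iterable[str], supplied_endpoint: str) -> List[str]:
    """Return every host contacted other than the one the caller supplied.

    Hypothetical helper for illustration only.
    """
    supplied_host = urlsplit(supplied_endpoint).hostname
    extra: List[str] = []
    for url in recorded_urls:
        host = urlsplit(url).hostname
        # Keep first-seen order and drop duplicates.
        if host != supplied_host and host not in extra:
            extra.append(host)
    return extra


recorded = [
    "https://contoso.documents.azure.com:443/",
    "https://contoso-westus.documents.azure.com:443/",
    "https://contoso.documents.azure.com:443/dbs/db1",
]
print(unexpected_hosts(recorded, "https://contoso.documents.azure.com:443/"))
# ['contoso-westus.documents.azure.com']
```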

Auto-skipped when the env points at the local emulator.
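A sketch of the emulator auto-skip, assuming the emulator is detected by a loopback hostname in ACCOUNT_HOST (for example https://localhost:8081/). points_at_emulator, EMULATOR_HOSTS, and the test class name are hypothetical; the live test's actual detection logic may differ.

```python
import os
import unittest
from urllib.parse import urlsplit

# Assumption: the local emulator is addressed through a loopback host.
EMULATOR_HOSTS = {"localhost", "127.0.0.1"}


def points_at_emulator(host_url: str) -> bool:
    """Return True when the URL targets a loopback (emulator) host."""
    return (urlsplit(host_url).hostname or "") in EMULATOR_HOSTS


ACCOUNT_HOST = os.environ.get("ACCOUNT_HOST", "")


@unittest.skipIf(
    not ACCOUNT_HOST or points_at_emulator(ACCOUNT_HOST),
    "needs a real geo-replicated Cosmos account, not the emulator",
)
class EndpointDiscoveryLiveTest(unittest.TestCase):  # hypothetical name
    def test_account_host_is_configured(self):
        # Placeholder: the live assertions against the real account go here.
        self.assertTrue(ACCOUNT_HOST)
```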

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test_endpoint_discovery_disabled.py and test_endpoint_discovery_disabled_async.py asserted the same contract against a stubbed _GetDatabaseAccountStub. The live test (test_endpoint_discovery_disabled_live.py) exercises the same behavior against a real Cosmos account using ACCOUNT_HOST/ACCOUNT_KEY, which is a stronger signal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

azure-cosmos: SDK routes requests through public IP even with private endpoint and enable_endpoint_discovery=False
