Skip to content

FEAT: Add ActiveDirectoryServicePrincipal support for bulk copy#576

Draft
bewithgaurav wants to merge 1 commit into
mainfrom
bewithgaurav/gh534-bulkcopy-service-principal
Draft

FEAT: Add ActiveDirectoryServicePrincipal support for bulk copy#576
bewithgaurav wants to merge 1 commit into
mainfrom
bewithgaurav/gh534-bulkcopy-service-principal

Conversation

@bewithgaurav
Copy link
Copy Markdown
Collaborator

Summary

Adds Authentication=ActiveDirectoryServicePrincipal support for bulk copy. Partial fix for #534 — addresses the exact error in the issue report ("Authentication method 'ActiveDirectoryServicePrincipal' is not supported. No token provider was registered for this method.").

Why a callback rather than pre-acquire

The other Entra ID modes already supported in bulk copy (Default, DeviceCode, Interactive) pre-acquire a JWT in Python and pass it as access_token. ServicePrincipal can't work that way:

  • ClientSecretCredential(tenant_id, client_id, client_secret) requires tenant_id as a constructor argument.
  • The connection string doesn't carry tenant_id; azure-identity doesn't discover it from anywhere.
  • The only place tenant is available client-side is the STS URL the server hands back during the FedAuth pre-login (this is what ODBC does internally — that's why ODBC has never needed tenant_id on the connection string).

So bulk copy must register a callback that's invoked mid-handshake. mssql-py-core hands the callback (spn, sts_url, auth_method) and we parse the tenant from the STS URL on every call.

Saurabh approved the callback approach in the EXP--Drivers--Python thread (May 13 2026). The matching mssql-py-core change (Rust-side wiring of the callback into auth_method_map) is in mssql-rs PR and ships in mssql-py-core 0.1.5+.

What changes

File Change
mssql_python/constants.py Add AuthType.SERVICE_PRINCIPAL = "activedirectoryserviceprincipal".
mssql_python/auth.py Add ServicePrincipalAuth.make_token_factory(client_id, client_secret) that returns the callable, plus _parse_tenant_id(). Update process_auth_parameters() to detect SP (and leave ODBC's native query path alone), update extract_auth_type() to propagate "serviceprincipal" for bulkcopy.
mssql_python/cursor.py Bulkcopy: when _auth_type == "serviceprincipal", build the factory from connection-string UID/PWD and register it via the new entra_id_token_factory dict key. Keep authentication/user_name/password in pycore_context (the auth validator + transformer in py-core need them to resolve the method before the factory is dispatched). Existing Model A flow for Default/DeviceCode/Interactive untouched.
tests/test_008_auth.py 17 new tests.
CHANGELOG.md Unreleased entry.

Flow

mssql-python                          mssql-py-core (Rust)             mssql-tds (Rust)
────────────                          ──────────────────────           ────────────────
cursor.bulkcopy()
  ├─ parses conn string (UID/PWD)
  ├─ ServicePrincipalAuth
  │   .make_token_factory(uid, pwd)
  │      → returns _factory closure
  │
  └─ pycore_context = {
       "authentication": "AD-SP",
       "user_name":  client_id,
       "password":   client_secret,
       "entra_id_token_factory": _factory,
       ...
     }
                                      PyCoreConnection(ctx)
                                        ├─ validate_auth/transform_auth
                                        ├─ register factory in
                                        │  auth_method_map[SP]
                                        └─ open TCP/TLS ──────────────►  TDS pre-login
                                                                          ├─ server: FedAuthInfo
                                                                          │  (spn, sts_url)
                                                                          ├─ context.entra_id_
                                                                          │     token_factory()
                                      ◄────────── invoked ──────────────  ├─ factory.create_token
_factory(spn, sts_url, auth_method)                                       │     (spn, sts_url, m)
  ├─ _parse_tenant_id(sts_url)
  ├─ ClientSecretCredential(
  │     tenant_id, client_id,
  │     client_secret)
  ├─ .get_token(scope=spn + "/.default")
  └─ return jwt.encode("utf-16-le")
                                                                          └─ FedAuth token packet
                                                                                       ──────────►
                                                                                                  SQL Server
                                                                                                  validates

Test

$ python -m pytest tests/test_008_auth.py -q
65 passed in 0.77s

17 new tests cover:

  • TestAuthType::test_auth_type_constants extended for SERVICE_PRINCIPAL
  • TestProcessAuthParameters::test_service_principal_auth_leaves_odbc_path_alone, test_service_principal_auth_case_insensitive
  • TestExtractAuthType::test_serviceprincipal
  • TestParseTenantId: GUID, GUID without trailing slash, domain, query string, extra path segments, empty string, no path
  • TestServicePrincipalAuth: factory returns callable, requires non-empty client_id and client_secret, returns UTF-16LE bytes, forwards credentials to ClientSecretCredential, builds scope from SPN (and keeps /.default suffix when already present), errors on unparseable STS URL, propagates ClientAuthenticationError as RuntimeError

Connection string example

import mssql_python

conn = mssql_python.connect(
    "Server=tcp:myserver.database.windows.net;"
    "Database=mydb;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    "UID=<client-id-guid>;"
    "PWD=<client-secret>;"
    "Encrypt=yes;"
)
cur = conn.cursor()
cur.bulkcopy("mytable", [(1, "a"), (2, "b")])  # uses the callback under the hood

Same connection string keeps working for regular queries (ODBC handles ServicePrincipal natively in the non-bulk-copy path).

Out of scope

  • ActiveDirectoryPassword — needs the same callback shape but a different azure-identity credential class. Separate PR; py-core still rejects it today with a clear error.
  • ActiveDirectoryIntegrated — needs SSPI/Kerberos wiring at the Rust layer, not a Python callback. Separate work.

Dependencies

  • Requires mssql-py-core 0.1.5+ (mssql-rs PR dev/gaurav/entra-id-token-factory-py-callback). This Python PR will fail at import or runtime against earlier py-core versions because the entra_id_token_factory dict key is silently ignored there. Will update the dependency pin once the mssql-rs PR merges and a new wheel is published.

Copilot AI review requested due to automatic review settings May 13, 2026 12:14
@bewithgaurav bewithgaurav changed the title Add ActiveDirectoryServicePrincipal support for bulk copy FEAT: Add ActiveDirectoryServicePrincipal support for bulk copy May 13, 2026
@bewithgaurav bewithgaurav marked this pull request as draft May 13, 2026 12:16
Wires a Python token-factory callback into the mssql-py-core connection
context so bulk copy can authenticate with `Authentication=
ActiveDirectoryServicePrincipal`. The callback is invoked by mssql-tds
mid-handshake (FedAuth workflow 0x02), receives the STS URL from the
server, parses tenant_id from it, and uses azure-identity's
ClientSecretCredential to acquire a JWT — matching what ODBC does for
the query path.

Why a callback rather than pre-acquire (Model A):
- ServicePrincipal needs `tenant_id` to build ClientSecretCredential.
- The connection string does not carry it; azure-identity does not
  discover it. The only place we can learn tenant_id client-side is
  from the STS URL the server hands back during pre-login (which is
  exactly what ODBC does internally).
- A pre-acquire flow therefore can't work. Saurabh approved the
  callback model on EXP--Drivers--Python (May 13 2026).

Scope: ServicePrincipal only. ActiveDirectoryPassword and
ActiveDirectoryIntegrated remain on their existing code paths (still
rejected in py-core today with a clear error). Separate follow-ups.

Changes:
- constants.py: Add AuthType.SERVICE_PRINCIPAL.
- auth.py:
  - Add ServicePrincipalAuth.make_token_factory(client_id, secret)
    that returns a (spn, sts_url, auth_method) -> bytes callable
    matching the entra_id_token_factory contract py-core expects.
  - Add _parse_tenant_id() helper.
  - process_auth_parameters: detect SP and return auth_type=None
    (let ODBC handle the query path natively, msodbcsql 17.3+ has
    native SP support).
  - extract_auth_type: propagate "serviceprincipal" so bulkcopy can
    distinguish the SP path.
- cursor.py bulkcopy: when _auth_type=="serviceprincipal", build the
  factory from connection-string UID/PWD and register it via the new
  entra_id_token_factory dict key. Keep authentication/user_name/
  password in pycore_context (py-core's auth validator + transformer
  need them to resolve the method to ActiveDirectoryServicePrincipal
  before the factory is dispatched). Existing Default/DeviceCode/
  Interactive (Model A) flow unchanged.

Requires mssql-py-core 0.1.5+ which wires the entra_id_token_factory
dict key into ClientContext.auth_method_map.

Tests: 17 new in test_008_auth.py covering tenant parsing
(GUID/domain/query-string/trailing-slash/empty/etc), credential kwarg
forwarding, scope construction, UTF-16LE encoding, and error paths
(missing client_id/secret, bad STS URL, auth failure propagation).

Partial fix for #534.
@bewithgaurav bewithgaurav force-pushed the bewithgaurav/gh534-bulkcopy-service-principal branch from 2e81b7f to cd80d9c Compare May 13, 2026 12:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds bulk copy support for Authentication=ActiveDirectoryServicePrincipal by registering an Entra ID token-factory callback (invoked mid-handshake) so the tenant can be derived from the STS URL returned by the server, aligning bulk copy behavior with ODBC’s native Service Principal handling for the normal query path.

Changes:

  • Added AuthType.SERVICE_PRINCIPAL and updated auth-type extraction/processing to recognize Service Principal mode.
  • Implemented ServicePrincipalAuth.make_token_factory() and _parse_tenant_id() to generate UTF-16LE JWTs during FedAuth handshake.
  • Updated cursor.bulkcopy() to register entra_id_token_factory for Service Principal auth; added tests and a changelog entry.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
mssql_python/constants.py Adds SERVICE_PRINCIPAL enum value for connection-string auth detection.
mssql_python/auth.py Adds tenant parsing + token-factory callback for Service Principal; updates auth detection/mapping.
mssql_python/cursor.py Registers the callback for bulkcopy when _auth_type == "serviceprincipal"; preserves credentials for py-core auth resolution.
tests/test_008_auth.py Adds unit tests for tenant parsing, factory behavior, and new auth-type detection.
CHANGELOG.md Documents new bulk copy Service Principal support and notes py-core requirement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mssql_python/cursor.py
Comment on lines +2951 to +2964
try:
factory = ServicePrincipalAuth.make_token_factory(
client_id, client_secret
)
except (RuntimeError, ValueError) as e:
raise RuntimeError(
f"Bulk copy failed: unable to build ServicePrincipal "
f"token factory: {e}"
) from e
pycore_context["entra_id_token_factory"] = factory
# Keep authentication/user_name/password in pycore_context —
# py-core's auth validator + transformer need them to resolve
# the auth method to ActiveDirectoryServicePrincipal before
# the factory is dispatched at handshake time.
Comment thread mssql_python/cursor.py
Comment on lines +2966 to +2967
"Bulk copy: registered ServicePrincipal token factory for client_id=%s",
client_id,
Comment thread mssql_python/auth.py
Comment on lines +147 to +155
try:
parsed = urlparse(sts_url)
except (ValueError, AttributeError):
return None
path = (parsed.path or "").strip("/")
if not path:
return None
first_segment = path.split("/", 1)[0]
return first_segment or None
Comment thread mssql_python/cursor.py
Comment on lines +2960 to +2964
pycore_context["entra_id_token_factory"] = factory
# Keep authentication/user_name/password in pycore_context —
# py-core's auth validator + transformer need them to resolve
# the auth method to ActiveDirectoryServicePrincipal before
# the factory is dispatched at handshake time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants