fix: S3 HTTP client ignores HTTP_PROXY/HTTPS_PROXY environment variables #6638

@alexeigor

Problem

Daft's S3 HTTP client silently ignores standard proxy environment variables (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, NO_PROXY). This prevents users behind corporate proxies from accessing S3-compatible storage through Daft.

Affected file: src/daft-io/src/s3_like.rs (lines 564–572)

Root Cause

The default_client() function uses aws_smithy_http_client::Builder::build_https() — the high-level API. Internally, this creates a ConnectorBuilder but never sets proxy_config, which defaults to ProxyConfig::disabled():

// Current code in Daft
fn default_client() -> SharedHttpClient {
    Builder::new()
        .tls_provider(Provider::Rustls(CryptoMode::AwsLc))
        .build_https()
}

Inside aws-smithy-http-client, ConnectorBuilder::build_https() does:

let proxy_config = self.proxy_config.clone()
    .unwrap_or_else(proxy::ProxyConfig::disabled);  // explicitly disabled!

The lower-level ConnectorBuilder API does support proxies via .proxy_config(ProxyConfig::from_env()), which reads the standard env vars — but Builder never wires it up.

Call chain

  1. Daft calls Builder::new().tls_provider(...).build_https()
  2. Builder::build_https() internally creates a ConnectorBuilder via new_conn_builder()
  3. new_conn_builder() sets TLS provider and timeout settings, but never calls .proxy_config()
  4. ConnectorBuilder::build_https() defaults proxy_config to ProxyConfig::disabled()
  5. All proxy env vars are ignored
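The defaulting pattern in steps 3 and 4 can be illustrated with a toy model. This is only a sketch: `ProxySetting`, `ToyConnectorBuilder`, `high_level_build`, and `low_level_build` are hypothetical stand-ins invented for illustration and are not the aws-smithy-http-client types. It shows why a builder that never receives a proxy setting ends up with proxies disabled.

```rust
// Stand-in types for illustration only; NOT the aws-smithy-http-client API.
#[derive(Clone, Debug, PartialEq)]
enum ProxySetting {
    Disabled,
    FromEnv,
}

#[derive(Default)]
struct ToyConnectorBuilder {
    // `None` means "the caller never configured a proxy".
    proxy: Option<ProxySetting>,
}

impl ToyConnectorBuilder {
    fn proxy_config(mut self, proxy: ProxySetting) -> Self {
        self.proxy = Some(proxy);
        self
    }

    // Mirrors the `unwrap_or_else(... disabled)` default quoted above:
    // an unset proxy silently becomes `Disabled`.
    fn effective_proxy(&self) -> ProxySetting {
        self.proxy.clone().unwrap_or(ProxySetting::Disabled)
    }
}

// Models the high-level path: nothing ever calls `proxy_config`.
fn high_level_build() -> ProxySetting {
    ToyConnectorBuilder::default().effective_proxy()
}

// Models the proposed path: the caller opts in explicitly.
fn low_level_build() -> ProxySetting {
    ToyConnectorBuilder::default()
        .proxy_config(ProxySetting::FromEnv)
        .effective_proxy()
}
```

In this model, `high_level_build()` yields `ProxySetting::Disabled` and `low_level_build()` yields `ProxySetting::FromEnv`, which is the difference the proposed fix relies on.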

Proposed Fix

Replace Builder with the lower-level Connector::builder() (i.e., ConnectorBuilder) and add .proxy_config(ProxyConfig::from_env()). Use http_client_fn from aws_smithy_runtime_api to wrap the result into a SharedHttpClient:

fn default_client() -> SharedHttpClient {
    use std::sync::OnceLock;
    use aws_smithy_http_client::{
        Connector,
        proxy::ProxyConfig,
        tls::{Provider, rustls_provider::CryptoMode},
    };
    use aws_smithy_runtime_api::client::http::{
        http_client_fn, SharedHttpConnector,
    };

    // Read the proxy env vars once, when the client is created.
    let proxy_config = ProxyConfig::from_env();
    // Holds the connector after the first call so later calls reuse it.
    let cached: OnceLock<SharedHttpConnector> = OnceLock::new();

    http_client_fn(move |settings, _components| {
        cached
            .get_or_init(|| {
                // Built only on the first invocation, with that call's settings.
                let connector = Connector::builder()
                    .proxy_config(proxy_config.clone())
                    .connector_settings(settings.clone())
                    .tls_provider(Provider::Rustls(CryptoMode::AwsLc))
                    .build();
                SharedHttpConnector::new(connector)
            })
            .clone()
    })
}

Why this works

  • ProxyConfig::from_env() reads HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, NO_PROXY (and lowercase variants). When none are set, no proxy is used, which matches the current behavior
  • OnceLock caches the connector to preserve connection pooling
  • .connector_settings(settings.clone()) forwards the SDK's connect/read timeout settings, matching original Builder::build_https() behavior
  • No new crate dependencies — all types are already available in existing dependencies (aws_smithy_http_client and aws_smithy_runtime_api)
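The OnceLock caching point can be checked in isolation with the standard library alone. The sketch below is not Daft or SDK code: `FakeConnector` and `cached_factory` are hypothetical names standing in for the real connector and for the closure passed to http_client_fn. It shows that a closure owning a `OnceLock` runs its initializer once and hands out the same shared value afterwards.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

// Hypothetical stand-in for an expensive, pool-owning connector.
#[derive(Debug)]
struct FakeConnector {
    id: usize,
}

// Returns a closure that builds the connector on its first call and
// returns clones of the same Arc afterwards, plus a counter of builds.
fn cached_factory() -> (impl Fn() -> Arc<FakeConnector>, Arc<AtomicUsize>) {
    let builds = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&builds);
    let cached: OnceLock<Arc<FakeConnector>> = OnceLock::new();

    let factory = move || {
        cached
            .get_or_init(|| {
                // Runs at most once, however often the closure is called.
                let id = counter.fetch_add(1, Ordering::SeqCst);
                Arc::new(FakeConnector { id })
            })
            .clone()
    };
    (factory, builds)
}
```

Calling the returned closure twice gives two handles to the same allocation and leaves the build counter at one. One consequence worth keeping in mind for the proposed fix: whatever arguments the first call sees are baked into the cached value, so in `default_client()` the connector would keep the settings from the first invocation.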

Context

  • Other DataFrame libraries (e.g., Polars) work behind proxies because they use reqwest (via object_store), which respects proxy env vars by default
  • The aws-smithy-http-client crate (v1.1.11) fully supports proxy configuration — it's just not exposed through the high-level Builder API that Daft uses
  • This is a single-function, ~15-line change with no new dependencies

Environment variables that would be respected after fix

Variable                     Purpose
HTTP_PROXY / http_proxy      Proxy for HTTP traffic
HTTPS_PROXY / https_proxy    Proxy for HTTPS traffic
ALL_PROXY / all_proxy        Proxy for all traffic
NO_PROXY / no_proxy          Comma-separated bypass rules (e.g., localhost,*.internal)
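When debugging a proxy setup, it can help to confirm which of these variables the process actually sees. The following is a small diagnostic sketch using only the standard library. The function names `visible_proxy_vars` and `visible_proxy_vars_from_env` are made up for this example, and the code only reports what is set. It does not reproduce how ProxyConfig::from_env() resolves precedence between the variables.

```rust
use std::env;

// Variable names from the table above, in upper- and lowercase.
const PROXY_VARS: [&str; 8] = [
    "HTTP_PROXY",
    "http_proxy",
    "HTTPS_PROXY",
    "https_proxy",
    "ALL_PROXY",
    "all_proxy",
    "NO_PROXY",
    "no_proxy",
];

// Returns (name, value) for each proxy variable that the lookup
// reports as set and non-empty. The lookup is injectable for testing.
fn visible_proxy_vars<F>(lookup: F) -> Vec<(&'static str, String)>
where
    F: Fn(&str) -> Option<String>,
{
    PROXY_VARS
        .iter()
        .filter_map(|&name| {
            lookup(name)
                .filter(|value| !value.is_empty())
                .map(|value| (name, value))
        })
        .collect()
}

// Convenience wrapper over the real process environment.
fn visible_proxy_vars_from_env() -> Vec<(&'static str, String)> {
    visible_proxy_vars(|name| env::var(name).ok())
}
```

Printing the result of `visible_proxy_vars_from_env()` at startup is a quick way to rule out an environment problem before suspecting the HTTP client.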

Labels: enhancement, help wanted
