Move event log configuration into context-based logging_config #6720

@universalmind303

Is your feature request related to a problem?

enable_event_log() / disable_event_log() (in daft/subscribers/event_log.py lines 385–418) are the only piece of user-facing Daft config that doesn't fit the existing context-config family.

Daft already has nested, typed config objects on DaftContext:

  • daft.get_context().daft_planning_config (property) + daft.set_planning_config(...) (kwargs setter) + daft.planning_config_ctx(...) (scoped context manager)
  • daft.get_context().daft_execution_config (property) + daft.set_execution_config(...) + daft.execution_config_ctx(...)

Event log configuration instead uses module-level globals (_EVENT_LOG_SUBSCRIBER, _EVENT_LOG_ATEXIT_REGISTERED) and an imperative enable/disable toggle. The impact:

  • No scoped (context-manager) analog for tests or temporary enablement
  • No single place to look for "what is this Daft session configured to do?"
  • Singleton + atexit behavior is hidden; switching log directories after enabling has subtle ordering requirements
  • Future observability config (metrics port, additional subscribers, sidecar knobs) will either compound this asymmetry or bolt on a third pattern

This is the same pattern that Ray Data uses for DataContext.get_current().execution_options.<field> = ... and that Flink uses for env.getCheckpointConfig().setX(...). Spark's flat, dotted-string SparkConf is the counter-example, and it isn't where newer systems (including Daft) have been heading.

Describe the solution you'd like

Add logging_config as a typed, nested config object on DaftContext, mirroring the existing planning/execution config family exactly.

# Attribute access (property on DaftContext)
daft.get_context().logging_config.event_log_dir = "~/.daft/events"
daft.get_context().logging_config.event_log_dir = None  # disable

# Kwargs setter (mirrors set_planning_config / set_execution_config)
daft.set_logging_config(event_log_dir="~/.daft/events")

# Scoped context manager (mirrors planning_config_ctx / execution_config_ctx)
with daft.logging_config_ctx(event_log_dir=tmp_path):
    df.collect()

Semantics:

  • Field assignment takes effect immediately: setting event_log_dir to a path synchronously attaches the EventLogSubscriber under its canonical alias; setting to None synchronously detaches and closes.
  • Env var (DAFT_EVENT_LOG_DIR) remains supported as the default when no value is set; explicit assignment overrides env.
  • atexit cleanup owned by the context/config layer, not a module global.
  • Re-configuration (assigning a new path over an existing one) closes the old subscriber and opens the new one in order.
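
The semantics above could be sketched roughly as follows, using only the standard library. Everything here is illustrative: `FakeEventLogSubscriber` is a hypothetical stand-in for the real `EventLogSubscriber`, and the `LoggingConfig` class and its `_detach` helper are assumptions about one possible shape, not an existing Daft API.

```python
import atexit
import os
from typing import Optional


class FakeEventLogSubscriber:
    """Hypothetical stand-in for EventLogSubscriber; only tracks open/closed state."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir
        self.closed = False

    def close(self) -> None:
        self.closed = True


class LoggingConfig:
    """Illustrative sketch of the proposed object, not Daft's implementation."""

    def __init__(self) -> None:
        self._subscriber: Optional[FakeEventLogSubscriber] = None
        # Cleanup is registered by the config object rather than a module global.
        atexit.register(self._detach)
        # The env var only supplies the default; later assignments override it.
        default_dir = os.environ.get("DAFT_EVENT_LOG_DIR")
        if default_dir:
            self.event_log_dir = default_dir

    @property
    def event_log_dir(self) -> Optional[str]:
        return self._subscriber.log_dir if self._subscriber else None

    @event_log_dir.setter
    def event_log_dir(self, value: Optional[str]) -> None:
        # Close the old subscriber before opening the new one, in that order.
        self._detach()
        if value is not None:
            self._subscriber = FakeEventLogSubscriber(value)

    def _detach(self) -> None:
        if self._subscriber is not None:
            self._subscriber.close()
            self._subscriber = None
```

Routing every change through the property setter keeps the close-then-open ordering in one place, so re-configuration and disabling share the same code path.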

Migration:

  • enable_event_log() / disable_event_log() become thin shims that delegate to the new API and emit DeprecationWarning.
  • Remove after one release cycle.

Probably also: consolidate subscriber machinery under logging_config

While we're here, we should probably also move the rest of the subscriber-related surface off the top-level DaftContext and onto logging_config. Today DaftContext carries:

  • Public subscriber management: attach_subscriber(alias, subscriber), detach_subscriber(alias)
  • Private dispatch hooks called from engine internals: _notify_query_start, _notify_query_heartbeat, _notify_query_end, _notify_optimization_start, _notify_optimization_end, _notify_exec_start, _notify_exec_end, _notify_exec_operator_start, _notify_exec_operator_end, _notify_exec_emit_stats, _notify_result_out

That's 13 methods on DaftContext whose concern is entirely "subscribers / event dispatch," mixed with the planning/execution-config concerns that DaftContext should actually own.

Proposed shape:

ctx = daft.get_context()

# Public API moves
ctx.logging_config.attach_subscriber(alias, subscriber)
ctx.logging_config.detach_subscriber(alias)

# Private dispatch hooks also move (renamed without the leading underscore since
# they'd become internal-to-logging-config, not internal-to-DaftContext)
ctx.logging_config.notify_query_start(query_id, metadata)
# ... etc

Call sites in daft.runners.* / query managers update to ctx.logging_config.notify_*(...). The top-level DaftContext shrinks to just planning/execution/logging config properties + _from_native / __init__, which matches what the name "context" actually means.
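
As a rough illustration of that shape, the sketch below keeps the subscriber registry and one dispatch hook on a single object. It is an assumption-laden toy: `RecordingSubscriber`, its `on_query_start` method, and the choice to raise on a duplicate or unknown alias are all hypothetical, and only one of the eleven hooks is shown.

```python
from typing import Any, Dict, List, Tuple


class RecordingSubscriber:
    """Hypothetical subscriber that records the events it receives."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def on_query_start(self, query_id: str, metadata: Any) -> None:
        self.events.append(("query_start", query_id))


class LoggingConfig:
    """Illustrative owner of subscriber management and event dispatch."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, RecordingSubscriber] = {}

    def attach_subscriber(self, alias: str, subscriber: RecordingSubscriber) -> None:
        # Assumption: attaching under an existing alias is an error.
        if alias in self._subscribers:
            raise ValueError(f"subscriber alias already attached: {alias!r}")
        self._subscribers[alias] = subscriber

    def detach_subscriber(self, alias: str) -> None:
        # Assumption: detaching an unknown alias is an error.
        if alias not in self._subscribers:
            raise ValueError(f"no subscriber attached under alias: {alias!r}")
        del self._subscribers[alias]

    def notify_query_start(self, query_id: str, metadata: Any) -> None:
        # Iterate over a snapshot so a subscriber may detach itself mid-dispatch.
        for subscriber in list(self._subscribers.values()):
            subscriber.on_query_start(query_id, metadata)
```

With this layout an engine call site would only need a reference to the logging config, not the whole context, to emit events.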

Framing this as "probably also" because it's a larger refactor than the event-log surface alone: the blast radius is every call site of _notify_* in the engine. If reviewers prefer to land the logging_config + event-log work first and do this consolidation as a follow-up issue, that's fine. Flagging it now so the logging_config design is shaped to accommodate it (e.g., the attach_subscriber / detach_subscriber methods are planned for from day one rather than retrofitted).

Describe alternatives you've considered

Additional Context

  • Existing context setters: daft/context.py lines 168–337 (set_planning_config, set_execution_config, and their _ctx context managers).
  • Current event log helpers: daft/subscribers/event_log.py lines 385–418.
  • Ecosystem precedent for the property-mutation shape: Ray Data's DataContext.get_current().execution_options.<field> = ....

Would you like to implement a fix?

Yes
