Add DataFrame.count_distinct() and GroupedDataFrame.count_distinct() aggregation methods #6657

@kerwin-zk

Description

Is your feature request related to a problem?

Daft supports counting distinct values at the Expression level (col("x").count_distinct()), but there is no DataFrame.count_distinct() or GroupedDataFrame.count_distinct() convenience method. This is inconsistent with other aggregation methods like sum(), mean(), min(), and max(), which all provide both global and grouped convenience methods. Users have to write df.agg(col("x").count_distinct()) instead of the more intuitive df.count_distinct("x").

Describe the solution you'd like

Add DataFrame.count_distinct() and GroupedDataFrame.count_distinct() convenience methods that mirror the existing aggregation API pattern.

```python
# Global count distinct
df.count_distinct("a")

# Multiple columns
df.count_distinct("a", "b")

# Grouped count distinct
df.groupby("key").count_distinct("a")
```
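To pin down the results I would expect from the proposed calls, here is a plain-Python reference model with no Daft dependency. It is only a sketch: the helper names `count_distinct` and `grouped_count_distinct` and the dict-of-lists input are hypothetical, and skipping `None` values is an assumption modeled on SQL `COUNT(DISTINCT ...)`, not confirmed Daft behavior.

```python
from collections import defaultdict


def count_distinct(table, *cols):
    """Global model: one row holding the distinct count of each requested column."""
    # Assumption: None values are ignored, as in SQL COUNT(DISTINCT ...).
    return {c: [len({v for v in table[c] if v is not None})] for c in cols}


def grouped_count_distinct(table, key, *cols):
    """Grouped model: one row per key value, with a distinct count per column."""
    groups = defaultdict(lambda: {c: set() for c in cols})
    for i, k in enumerate(table[key]):
        for c in cols:
            value = table[c][i]
            if value is not None:
                groups[k][c].add(value)
    # Sort keys only to make the output deterministic in this sketch.
    keys = sorted(groups)
    result = {key: keys}
    for c in cols:
        result[c] = [len(groups[k][c]) for k in keys]
    return result


data = {
    "key": ["x", "x", "y", "y"],
    "a": [1, 1, 2, 3],
    "b": ["p", "q", "q", "q"],
}

global_counts = count_distinct(data, "a", "b")
grouped_counts = grouped_count_distinct(data, "key", "a")
print(global_counts)
print(grouped_counts)
```

In this model the global call yields a single-row result and the grouped call yields one row per key, which is the same shape the existing df.agg(col("x").count_distinct()) form is meant to produce.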

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

Yes

Metadata

Labels

enhancement: New feature or request
