Is your feature request related to a problem?
Daft supports counting distinct values at the Expression level (col("x").count_distinct()), but there is no DataFrame.count_distinct() or GroupedDataFrame.count_distinct() convenience method. This is inconsistent with other aggregation methods like sum(), mean(), min(), and max(), which all provide both global and grouped convenience methods. Users have to write df.agg(col("x").count_distinct()) instead of the more intuitive df.count_distinct("x").
Describe the solution you'd like
Add DataFrame.count_distinct() and GroupedDataFrame.count_distinct() convenience methods that mirror the existing aggregation API pattern.
# Global count distinct
df.count_distinct("a")
# Multiple columns
df.count_distinct("a", "b")
# Grouped count distinct
df.groupby("key").count_distinct("a")
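To make the intended semantics concrete, here is a minimal pure-Python sketch of the delegation pattern: each convenience method is a thin wrapper that builds one count-distinct aggregation per column and passes it to agg(). MiniFrame, MiniGrouped, and count_distinct_expr are hypothetical stand-ins invented for this illustration. They are not Daft classes, and the return shape (plain dicts) is an assumption chosen only to keep the example self-contained. Null handling is deliberately left out.

```python
from collections import defaultdict


def count_distinct_expr(column):
    """Toy stand-in for col(column).count_distinct(): a (column, function) pair."""
    return column, lambda values: len(set(values))


class MiniFrame:
    """Hypothetical toy table (column name -> list of values), not Daft's DataFrame."""

    def __init__(self, data):
        self.data = data

    def agg(self, *exprs):
        # Apply each aggregation function to its column.
        return {column: fn(self.data[column]) for column, fn in exprs}

    def count_distinct(self, *columns):
        # Proposed convenience method: build the expressions and delegate to agg().
        return self.agg(*(count_distinct_expr(c) for c in columns))

    def groupby(self, key):
        return MiniGrouped(self, key)


class MiniGrouped:
    """Hypothetical toy grouped table, not Daft's GroupedDataFrame."""

    def __init__(self, frame, key):
        self.frame = frame
        self.key = key

    def agg(self, *exprs):
        keys = self.frame.data[self.key]
        result = {}
        for column, fn in exprs:
            # Bucket the column's values by group key, then aggregate each bucket.
            buckets = defaultdict(list)
            for k, v in zip(keys, self.frame.data[column]):
                buckets[k].append(v)
            result[column] = {k: fn(vs) for k, vs in buckets.items()}
        return result

    def count_distinct(self, *columns):
        # Same delegation as the global method, applied per group.
        return self.agg(*(count_distinct_expr(c) for c in columns))


df = MiniFrame({"key": ["x", "x", "y"], "a": [1, 1, 2], "b": [1, 2, 2]})
print(df.count_distinct("a", "b"))
print(df.groupby("key").count_distinct("a"))
```

Because the wrappers only forward to agg(), the convenience form and the explicit form return the same result by construction, which is the property the proposed methods would need to preserve.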
Describe alternatives you've considered
No response
Additional Context
No response
Would you like to implement a fix?
Yes