
feat: accept distinct kwarg on sum, avg, and count#1556

Draft
timsaucer wants to merge 1 commit into apache:main from timsaucer:feat/df54-followups-wave2


@timsaucer
Member

Which issue does this PR close?

Related to #1533

Rationale for this change

Upstream DataFusion now supports DISTINCT variants of these aggregate functions. This PR exposes that functionality in the Python wrappers.

What changes are included in this PR?

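  • Add a `distinct` keyword argument to `sum` and `avg`, so that `sum`, `avg`, and `count` all accept it (`count` already did).
  • Add unit tests.
  • Update the check-upstream skill so that upstream's `sum_distinct`, `avg_distinct`, and `count_distinct` are recognized as already covered.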
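For reference, here is a plain-Python model of what the `distinct` flag changes. It does not call DataFusion. It assumes standard SQL aggregate semantics: NULLs (modeled as `None`) are ignored, `distinct=True` collapses duplicate values before aggregating, `SUM`/`AVG` over no values yield NULL, and `COUNT(expr)` over no values yields 0. The `ref_*` helpers are hypothetical illustrations, not part of the datafusion-python API.

```python
from statistics import fmean
from typing import Iterable, Optional


def _non_null(values: Iterable[Optional[float]], distinct: bool) -> list:
    """Drop NULLs (None); if distinct, collapse duplicates before aggregating."""
    kept = [v for v in values if v is not None]
    if distinct:
        kept = list(dict.fromkeys(kept))  # order-preserving de-duplication
    return kept


# Hypothetical reference helpers mirroring SUM / AVG / COUNT(expr) [DISTINCT].
def ref_sum(values, distinct: bool = False):
    kept = _non_null(values, distinct)
    return sum(kept) if kept else None  # SUM over no values is NULL


def ref_avg(values, distinct: bool = False):
    kept = _non_null(values, distinct)
    return fmean(kept) if kept else None  # AVG over no values is NULL


def ref_count(values, distinct: bool = False) -> int:
    return len(_non_null(values, distinct))  # COUNT(expr) over no values is 0


data = [1, 1, 1, 4, None]
print(ref_sum(data), ref_sum(data, distinct=True))      # → 7 5
print(ref_avg(data), ref_avg(data, distinct=True))      # → 1.75 2.5
print(ref_count(data), ref_count(data, distinct=True))  # → 4 2
```

Note that only `avg` can change in either direction under `distinct=True`; `sum` and `count` of non-negative duplicates can only stay the same or shrink.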

Are there any user-facing changes?

Additive only: `distinct` defaults to `False`, so existing code is unaffected.

Upstream exposes `sum_distinct` / `avg_distinct` / `count_distinct` as
sibling functions that call the same underlying UDAF with
`distinct: bool = true`. The Rust binding side already routes
`distinct=Some(true)` through the aggregate builder for `sum`, `avg`,
and `count`, but only `count` exposed the kwarg on the Python wrapper.

Add `distinct: bool = False` to `sum()` and `avg()` mirroring the
existing `count()` signature, and update SKILL.md so the check-upstream
audit does not re-flag the three upstream `*_distinct` shortcuts as
gaps. The plan emitted by `sum(col, distinct=True)` matches what
upstream's `sum_distinct(col)` builds.
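A toy model of the equivalence claimed in the paragraph above, under stated assumptions: `AggCall` is a hypothetical stand-in for a logical aggregate node (it is not datafusion-python's `Expr`), and `agg_sum` / `agg_avg` / `agg_count` and the `*_distinct` shortcuts are illustrative names, not the real binding or upstream functions. It only shows the design point that a keyword flag defaulting to `False` and a dedicated shortcut can build the identical node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical logical aggregate node: function, argument, DISTINCT flag."""
    func: str
    arg: str
    distinct: bool = False


# Kwarg style (hypothetical names): the flag defaults to False, so calls
# written before the flag existed still build a non-distinct node.
def agg_sum(arg: str, distinct: bool = False) -> AggCall:
    return AggCall("sum", arg, distinct)


def agg_avg(arg: str, distinct: bool = False) -> AggCall:
    return AggCall("avg", arg, distinct)


def agg_count(arg: str, distinct: bool = False) -> AggCall:
    return AggCall("count", arg, distinct)


# Shortcut style (hypothetical names): sibling functions that build the
# same aggregate node directly, with the DISTINCT flag forced on.
def sum_distinct(arg: str) -> AggCall:
    return AggCall("sum", arg, distinct=True)


def avg_distinct(arg: str) -> AggCall:
    return AggCall("avg", arg, distinct=True)


def count_distinct(arg: str) -> AggCall:
    return AggCall("count", arg, distinct=True)


print(agg_sum("a", distinct=True))  # → AggCall(func='sum', arg='a', distinct=True)
print(agg_sum("a", distinct=True) == sum_distinct("a"))  # → True
```

In this model, once the kwarg exists on every wrapper, the shortcuts add no new capability, which is why the audit can treat them as covered rather than as gaps.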

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>