fix: preserve precision in log() with mixed float-width arguments#23310
Draft
raphaelroshan wants to merge 1 commit into
Draft
fix: preserve precision in log() with mixed float-width arguments#23310raphaelroshan wants to merge 1 commit into
raphaelroshan wants to merge 1 commit into
Conversation
Previously log(base, value) computed and returned its result in the value's float type. When the base was a wider float (e.g. log(Float64, Float32)), the base was narrowed to the value's type, losing precision. The result type is now the widest float among the arguments, and a Float16/Float32 value is widened to that type before computation so the base is no longer narrowed. Closes apache#22581
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
log(base, val)loses precision whenvalueisFloat32#22581.Rationale for this change
log(base, value)currently dispatches on the type of the value (the last argument):return_typeinspects only the value's type, andinvoke_with_argscomputes the result in that type. When the base is a wider float than the value — e.g.log(Float64, Float32)— the base is narrowed down to the value's float type before the computation, which loses precision. This is the behavior reported in #22581.The fix picks the widest float among all arguments as the result type, and widens a
Float16/Float32value up to that type before computing, so the base no longer has to be narrowed.What changes are included in this PR?
return_typenow returns the widest float across all arguments instead of only looking at the value's type (via a smallfloat_rankhelper).invoke_with_args, when the base is wider than aFloat16/Float32value, the value is cast up to the result type before computation so the base is not narrowed. Decimal values are left untouched (still computed inf64).test_log_f64_base_f32_value) and sqllogictest cases inmath.sltcoveringlog(Float64, Float32),log(Float64, Float16), and integer-base cases.Are these changes tested?
Yes.
cargo test -p datafusion-functions --lib math::— green (includes the newtest_log_f64_base_f32_value).math.sltsqllogictests — green (includes new mixed-width cases and updated expected types for previously-narrowing cases).Are there any user-facing changes?
Yes — this is a user-facing behavior change to
log()return types in mixed-float-width cases, and I'd like maintainers to confirm the desired semantics.Previously the return type of
log(base, value)followed the value's type. Now it follows the widest float among all arguments. Concretely:log(Float64, Float32)now returnsFloat64(wasFloat32)log(Float64, Float16)now returnsFloat64(wasFloat16)log(2, arrow_cast(2.0, 'Float32'))(integer base) now returnsFloat64(wasFloat32)The unary
log(value)and same-width cases (log(Float64, Float64),log(Float32, Float32), etc.) are unchanged. Because output precision/width changes in the mixed cases, some existingmath.sltexpected values were updated accordingly.Opening this as a draft proposal because widening the return type is a semantics decision: it fixes the precision loss in #22581, but it changes observed output types/precision for existing mixed-width queries. Does widening to the widest float match the intended
logsemantics, or would you prefer preserving the value's type (and accepting the precision loss, or handling it differently)? Happy to adjust. If this is considered a breaking change to the public output type, I can add theapi changelabel.