Skip to content

feat(extensions): support deprecation info in extensions#846

Open
nielspardon wants to merge 3 commits into
substrait-io:mainfrom
nielspardon:feat/extension-deprecation-info
Open

feat(extensions): support deprecation info in extensions#846
nielspardon wants to merge 3 commits into
substrait-io:mainfrom
nielspardon:feat/extension-deprecation-info

Conversation

@nielspardon

@nielspardon nielspardon commented Jun 4, 2026

Copy link
Copy Markdown
Member

BREAKING CHANGE: changes the public API of the ScalarFunctionVariant.resolve() function by adding an additional parameter

Summary

Implements support for deprecation information in simple extensions, introduced in Substrait v0.86.0 (substrait-io/substrait#1014).

An optional deprecated object can now be parsed and exposed on:

  • types
  • scalar / aggregate / window functions (function-level)
  • individual function implementations / overloads (impl-level)

The deprecated object carries a required since (core semantic version string), an optional reason, and optional metadata. As with the existing metadata support, consumers are not required to validate these fields — the information is surfaced so tooling can produce deprecation warnings.

Type variations are not modeled in substrait-java's extension schema (ExtensionSignatures only covers types, scalar_functions, aggregate_functions, window_functions), so type-variation deprecation is out of scope. I opened #847 to track adding type-variation support to the extension model (which would then also cover their deprecation info).

Details

  • New SimpleExtension.DeprecationStatus immutable value type (since, reason, metadata), following the existing Option / VariadicBehavior pattern.
  • Added Optional<DeprecationStatus> deprecated() to Type, the ScalarFunction/AggregateFunction/WindowFunction parent classes, and the Function base class (per impl).
  • Function-level deprecation is propagated into each resolved variant. When both a function-level and an impl-level deprecation are present, the impl-level value takes precedence. Aggregate functions registered as window functions retain their deprecation automatically.

Testing

  • New DeprecationExtensionTest (7 tests) covering type, function-level, and impl-level deprecation, deprecation metadata, an overload-specific deprecation where a sibling overload is not deprecated, and carry-over to aggregate-as-window functions.
  • New deprecation_extensions.yaml test resource.

Closes #811

🤖 Generated with Claude Code

Substrait v0.86.0 added the ability to mark simple-extension entries as
deprecated (substrait-io/substrait#1014). An optional `deprecated` object
(`since`, optional `reason`, optional `metadata`) can appear on types,
scalar/aggregate/window functions, and individual function implementations.

Parse and expose this information so tooling can surface deprecation
warnings, mirroring the existing `metadata` support. Function-level
deprecation propagates to each resolved variant; an impl-level deprecation
takes precedence over the parent function's when both are present.

Type variations are not modeled in substrait-java's extension schema, so
type-variation deprecation is out of scope.

Closes substrait-io#811

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@vbarua vbarua left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments but overall seems reasonable.


/**
* Deprecation information for an extension entry (type, function, or function implementation).
* See <a href="https://github.com/substrait-io/substrait/pull/1014">substrait#1014</a>.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth linking to the PR where this was introduced. The Javadoc should stand on it's own, and it already does.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed that PR link from the comments. It suspect the AI wanted to follow an existing practice since this file contains another PR reference:

* <p>For more details see <a
* href="https://github.com/substrait-io/substrait-java/pull/55#issuecomment-1154484254">comments
* in this issue</a>

String name,
String description,
Optional<Map<String, Object>> metadata,
Optional<DeprecationStatus> deprecated) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a public API, we should note the breaking change in the description.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure why this specific method is public while AggregateFunctionVariant.resolve() and WindowFunctionVariant are package private.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to change the visibility of that method to be in line with the others methods and mark the change of visibility as a breaking change. What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you point it out, it is weird that this is public. Hiding this makes sense to me.

.args(args())
.options(options())
.metadata(metadata)
.deprecated(deprecated().isPresent() ? deprecated() : deprecated)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand what we're doing here, resolve is ScalarFunction#resolve which gets called with the function-level deprecation status. When we create the ScalarFunctionVariant, we first check if it has an impl level deprecation status, and if not just use the function-level one.

This is definitely a little lossy in terms of representation, but works reasonable well. Is the reason to inject the deprecation status like this to make it easy to check if ScalarFunctionVariant is deprecated directly without having to access the ScalarFunction holding it? If so, could we do something like inject the function-level deprecation status and store both on the ScalarFunctionImp? That way we still check the deprecation status easily with a method like isDeprecated that looks at the function-level and impl-level deprecation, but when we serialize ScalarFunctionImpl we can ignore the function-level deprecation to avoid setting it for every impl.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a little lossy in terms of representation, but works reasonable well. Is the reason to inject the deprecation status like this to make it easy to check if ScalarFunctionVariant is deprecated directly without having to access the ScalarFunction holding it?

I think the AI wanted to follow the existing pattern where we already copy the function name, description and metadata over from the function (ScalarFunction) to the impl (ScalarFunctionVariant). We could store a pointer to the ScalarFunction inside the ScalarFunctionVariant in general for accessing the function level metadata from the impl more easily.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is already lossy since it overwrites the impl description with the function description.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also spent a bit more time trying to understand the existing code. What seems to happen is that we are deserializing the extensions YAMLs into SimpleExtension.ExtensionCollection which then contains most of the structure in the YAML and then the resolve() method copies some of the function level details to the impl level FunctionVariant classes for easier consumption like name, description, metadata, urn, etc.

It feels to me like the resolve step is a bit redundant and we could add pointers to the parent fields instead.

@vbarua vbarua Jun 10, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like the resolve step is a bit redundant and we could add pointers to the parent fields instead.

That seems like a better design IMO. I vaguely recalled that resolve was a mechanism to inject context from higher up in the YAML into child objects. Turns out I made some modifications around this like 3 years ago #149

If we just had pointers to the parents, we could avoid it entirely.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I have a refactoring locally already. can push that up. maybe also include #848 while we refactor this area.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is already lossy since it overwrites the impl description with the function description.

Given that it's already lossy, I would be happy to see this merged and then have the refactor as a follow-up. No need to block this change, and it will make the refactor PR smaller.

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>

@vbarua vbarua left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back, if we hide the public resolve method I would be happy to see this shipped as is.

String name,
String description,
Optional<Map<String, Object>> metadata,
Optional<DeprecationStatus> deprecated) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you point it out, it is weird that this is public. Hiding this makes sense to me.

.args(args())
.options(options())
.metadata(metadata)
.deprecated(deprecated().isPresent() ? deprecated() : deprecated)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is already lossy since it overwrites the impl description with the function description.

Given that it's already lossy, I would be happy to see this merged and then have the refactor as a follow-up. No need to block this change, and it will make the refactor PR smaller.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

support deprecation info in extensions

2 participants