[FLINK-39821][table] Make REGEXP_REPLACE return type nullable #28293

Merged
fhueske merged 1 commit into apache:master from raminqaf:FLINK-39821 on Jun 3, 2026

raminqaf commented Jun 2, 2026

Issue was mentioned post-merge in this PR: #28189 (comment)

What is the purpose of the change

REGEXP_REPLACE declared its output via nullableIfArgs, so the planner inferred a NOT NULL result whenever all three arguments were NOT NULL. However, the runtime returns null for a non-literal regex that fails to compile, because a column reference or a CONCAT(...) result can only be validated at runtime, not at planning time. That let a null value flow through a column the planner believed was non-null, which is unsound.
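To make the failure mode concrete, here is a minimal plain-Java sketch of the runtime behavior described above. The class RegexpReplaceRuntimeModel and its regexpReplace method are hypothetical: they model the described semantics (null when the pattern fails to compile) with java.util.regex and are not Flink's actual runtime code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/**
 * Hypothetical model of the runtime behavior described above.
 * This is not Flink's actual REGEXP_REPLACE implementation.
 */
public class RegexpReplaceRuntimeModel {

    /** Returns null if any input is null or if the regex fails to compile. */
    static String regexpReplace(String str, String regex, String replacement) {
        if (str == null || regex == null || replacement == null) {
            return null;
        }
        try {
            return Pattern.compile(regex).matcher(str).replaceAll(replacement);
        } catch (PatternSyntaxException e) {
            // An invalid pattern is only detected here, at runtime.
            return null;
        }
    }

    public static void main(String[] args) {
        // All three arguments are non-null, as values from NOT NULL columns would be.
        String fromColumn = "(";         // stands in for a regex read from a NOT NULL column
        String built = fromColumn + "a"; // stands in for a CONCAT(...) result: "(a"
        System.out.println(regexpReplace("abc", "b", "X"));  // aXc
        System.out.println(regexpReplace("abc", built, "X")); // null (unclosed group)
    }
}
```

Because the pattern only exists once the row is processed, no planning-time check can rule out the null result.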

This is a sub-task of the FLINK-39648 umbrella.

Brief change log

  • Change BuiltInFunctionDefinitions.REGEXP_REPLACE output type strategy from nullableIfArgs(explicit(STRING())) to explicit(DataTypes.STRING().nullable()), matching
    REGEXP_EXTRACT, which is always-nullable for the same reason. (REGEXP keeps nullableIfArgs because it returns false, not null, on an invalid pattern.)
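The difference between the two output type strategies can be modeled in a few lines. This is a hypothetical, simplified sketch: OutputNullabilityModel, nullableIfArgs, and explicitNullable are illustrative names that mimic the nullability behavior of Flink's nullableIfArgs(...) and explicit(STRING().nullable()) strategies; the sketch does not use Flink's TypeStrategy API.

```java
/**
 * Hypothetical, simplified model of two output-nullability rules. It mirrors the
 * nullability behavior of Flink's nullableIfArgs(...) and
 * explicit(STRING().nullable()) strategies but does not use Flink's API.
 */
public class OutputNullabilityModel {

    /** Old rule: the result is nullable only if at least one argument is nullable. */
    static boolean nullableIfArgs(boolean... argNullable) {
        for (boolean nullable : argNullable) {
            if (nullable) {
                return true;
            }
        }
        return false;
    }

    /** New rule: the result is always declared nullable, whatever the arguments are. */
    static boolean explicitNullable(boolean... argNullable) {
        return true;
    }

    public static void main(String[] args) {
        // REGEXP_REPLACE(str, regex, replacement) with all three arguments NOT NULL.
        boolean[] allNotNull = {false, false, false};
        System.out.println("old rule, result nullable: " + nullableIfArgs(allNotNull));   // false
        System.out.println("new rule, result nullable: " + explicitNullable(allNotNull)); // true
    }
}
```

With three NOT NULL arguments the old rule declares a NOT NULL result, which the runtime can violate; the new rule always declares a nullable result. The same reasoning explains why REGEXP can keep nullableIfArgs: on an invalid pattern it yields false rather than null, so NOT NULL inputs still produce a non-null result.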

Verifying this change

This change added a test and can be verified as follows:

  • New RegexpFunctionsITCase case with NOT NULL arguments and a non-literal invalid regex, asserting the output type stays nullable and the value is null.
  • Existing RegexpFunctionsITCase, ScalarFunctionsTest, SqlExpressionTest, MultiJoinTest (plan digests unchanged), and ExpressionSerializationTest continue to cover the function as regression tests.
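The property the new RegexpFunctionsITCase case checks can be sketched as a simple soundness invariant: a value must never be null when its declared type is NOT NULL. The class NullabilitySoundnessCheck below is hypothetical and uses plain java.util.regex instead of Flink's test harness; String.join stands in for a CONCAT(...) call that assembles the regex at runtime.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/**
 * Hypothetical sketch of the property the new test case asserts. It is not the
 * actual RegexpFunctionsITCase code and does not use Flink's test utilities.
 */
public class NullabilitySoundnessCheck {

    /** Simplified runtime model (assumes non-null inputs): null when the regex does not compile. */
    static String regexpReplace(String str, String regex, String replacement) {
        try {
            return Pattern.compile(regex).matcher(str).replaceAll(replacement);
        } catch (PatternSyntaxException e) {
            return null;
        }
    }

    /** A value is consistent with its declared type unless a NOT NULL type holds null. */
    static boolean isSound(boolean declaredNullable, Object value) {
        return declaredNullable || value != null;
    }

    public static void main(String[] args) {
        // NOT NULL inputs; the regex is built at runtime, so the planner cannot validate it.
        String regex = String.join("", "(", "abc");
        String result = regexpReplace("abc", regex, "x");

        System.out.println("value: " + result);                                        // null
        System.out.println("sound with old NOT NULL type: " + isSound(false, result)); // false
        System.out.println("sound with new nullable type: " + isSound(true, result));  // true
    }
}
```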

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Opus 4.8

flinkbot commented Jun 2, 2026

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

dylanhz left a comment

Thanks for the quick fix, LGTM!

fhueske left a comment

Thanks for the fix!

I'll merge this 🙌

fhueske merged commit cebcd70 into apache:master on Jun 3, 2026