functions: save FieldType as value instead of ptr in json function#10846
Conversation
Signed-off-by: yongman <yming0221@gmail.com>
|
@yongman I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR refactors TiDB field type storage in JSON casting functions from raw pointers to ChangesJSON FieldType Optional Refactoring and Regression Test
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
dbms/src/Functions/FunctionsJson.h (1)
439-439: 💤 Low valueRefactoring from pointer to value changes ownership semantics.
The change from
const tipb::FieldType*tostd::optional<tipb::FieldType>is semantically significant: the function now owns a copy of the FieldType rather than holding a reference to external data. This eliminates potential lifetime issues (dangling pointers), which likely addresses the consistency bug mentioned in issue#10845.The setter copies
tipb::FieldTypeon each call. Iftipb::FieldType(a protobuf message) is large, consider adding a move-enabled overload:void setOutputTiDBFieldType(tipb::FieldType tidb_tp_) { tidb_tp = std::move(tidb_tp_); }However, the current implementation is correct, and the copy overhead may be acceptable.
Also applies to: 467-467, 530-530
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Functions/FunctionsJson.h` at line 439, The setter currently copies a potentially large protobuf (setOutputTiDBFieldType) which can be expensive; add a move-enabled overload that takes tipb::FieldType by value (or an rvalue ref) and moves it into the std::optional member (tidb_tp) to avoid the extra copy, and apply the same change to the other setters flagged in this file (the other setOutputTiDBFieldType occurrences referenced in the comment).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@dbms/src/Functions/FunctionsJson.h`:
- Line 439: The setter currently copies a potentially large protobuf
(setOutputTiDBFieldType) which can be expensive; add a move-enabled overload
that takes tipb::FieldType by value (or an rvalue ref) and moves it into the
std::optional member (tidb_tp) to avoid the extra copy, and apply the same
change to the other setters flagged in this file (the other
setOutputTiDBFieldType occurrences referenced in the comment).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: aa141322-ab1a-4623-ab1a-810339dd1046
📒 Files selected for processing (1)
dbms/src/Functions/FunctionsJson.h
JaySon-Huang
left a comment
There was a problem hiding this comment.
Verified the fixed in the tiflash-cse columnar branch.
LGTM
|
/cc @windtalker @gengliqi |
|
I don't understand why "disaggregated columnar path builds temporary FilterConditions" |
@windtalker The related logic refer to tiflash/dbms/src/Storages/StorageDisaggregatedRemote.cpp Lines 126 to 160 in 6f17860 |
can you give some explainations about why disaggregated columnar path is so special? |
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
The disaggregated columnar path is special because the columnar reader and the classic For a TiDB table scan, /// pushed_down_filter_conditions is the filter conditions that are
/// pushed down to table scan by late materialization.
/// They will be executed on Storage layer.In the classic The disaggregated columnar reader is different. It receives the table scan protobuf with // Copy pushed down filters to filter_conditions to make filterConditions works properly.
// Proxy columnar reader use pushed down filters to reduce packs load from disk and has no
// guarantee to filter all useless data, so we rely on the filterConditions to filter data.So, for the columnar path, That is why Code references
|
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
/test pull-unit-test |
1 similar comment
|
/test pull-unit-test |
|
@windtalker PTAL again |
|
Besides the json function, there are some other functions that still hold the reference of |
|
@windtalker Thanks for the reminder. I checked the other functions under I updated both of them to own a The remaining |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JaySon-Huang, windtalker The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
…ingcap#10846) close pingcap#10845\n\nfunctions: save FieldType as value instead of ptr in json function Store TiDB FieldType metadata by value in JSON cast functions instead of keeping raw pointers to caller-owned FieldType objects. Use std::optional<tipb::FieldType> for optional FieldType metadata and update the missing-metadata checks accordingly in: - FunctionCastJsonAsString - FunctionCastIntAsJson - FunctionCastStringAsJson - FunctionCastTimeAsJson This avoids dangling FieldType references when JSON cast functions are created from temporary filter conditions, and keeps TEXT-to-JSON cast behavior stable for pushed-down JSON_EXTRACT filters.\n\nSigned-off-by: JaySon-Huang <tshent@qq.com>\n\nCo-authored-by: JaySon-Huang <tshent@qq.com>
What problem does this PR solve?
Issue Number: close #10845
Problem Summary:
When TiFlash nextgen evaluates
JSON_EXTRACTon aTEXTcolumn withIS NULL/IS NOT NULLfilters, the result can be inconsistent withJSONcolumns.For example,
JSON_EXTRACT(action_params, '$.popup_id') IS NULLmay return rows whose extracted value is actually non-null, whileIS NOT NULLreturns no rows.The root cause is that the disaggregated columnar path builds temporary
FilterConditions, and JSON cast functions keep raw pointers totipb::FieldType. After the temporary object is destroyed, those pointers can become dangling, soFunctionCastStringAsJsonmay read invalid FieldType metadata.What is changed and how it works?
Check List
Tests
Manual test:
Use the SQL in #10845 to create
event_log1withaction_params TEXTandevent_log2withaction_params JSON, then run with:Verify that both TEXT and JSON columns return consistent results:
Side effects
Documentation
Release note
Summary by CodeRabbit
Release Notes
Bug Fixes
#10845: Corrected inconsistencies in JSON extraction and null filtering operations when using TiFlash. The fix ensures consistent behavior when working with various data types and null values. Added comprehensive regression tests to validate the corrections across multiple scenarios.Refactor