proposal: PostgreSQL float semantics for NaN (discussion)#1617
proposal: PostgreSQL float semantics for NaN (discussion)#1617kevin-dp wants to merge 2 commits into
Conversation
…ering NaN (and invalid Dates, whose timestamp is NaN) had no consistent order: NaN === NaN is false in JS, so NaN compared unequal to everything and could not be sorted or stored in tree-based indexes deterministically. Following PostgreSQL, NaN is now treated as equal to itself and greater than every other non-null value: - ascComparator places NaN/invalid Dates last (greatest non-null), giving a valid total order so they can be sorted and indexed. - The WHERE evaluator's eq/gt/gte/lt/lte/in operators implement the same semantics: eq(x, NaN) matches NaN rows, range comparisons treat NaN as greatest, and IN matches a NaN member. null/undefined are unchanged (three-valued logic). Because the index ordering and the evaluator now agree on NaN, index-served and full-scan queries return identical results without any re-filtering. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
More templates
@tanstack/angular-db
@tanstack/browser-db-sqlite-persistence
@tanstack/capacitor-db-sqlite-persistence
@tanstack/cloudflare-durable-objects-db-sqlite-persistence
@tanstack/db
@tanstack/db-ivm
@tanstack/db-sqlite-persistence-core
@tanstack/electric-db-collection
@tanstack/electron-db-sqlite-persistence
@tanstack/expo-db-sqlite-persistence
@tanstack/node-db-sqlite-persistence
@tanstack/offline-transactions
@tanstack/powersync-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/react-native-db-sqlite-persistence
@tanstack/rxdb-db-collection
@tanstack/solid-db
@tanstack/svelte-db
@tanstack/tauri-db-sqlite-persistence
@tanstack/trailbase-db-collection
@tanstack/vue-db
commit: |
) Fold the NaN/invalid-Date semantics from the TanStack#1617 proposal into this PR so the index-optimization correctness work is built on the final, coherent contract instead of the interim JS-semantics handling. Previously this branch gave NaN/invalid Dates a stable sort position *for sorting only* while the WHERE evaluator still rejected them (NaN != NaN), relying on re-filtering to drop index-returned NaN rows. That left a JS/SQL hybrid where joins/groupBy/distinct (which match NaN = NaN via the hash index) disagreed with WHERE/ordering. Following PostgreSQL, NaN (and invalid Dates, whose timestamp is NaN) is now equal to itself and greater than every other non-null value: - comparison.ts: ascComparator orders NaN/invalid Dates as the greatest non-null value (was: alongside nulls); isUnorderable is exported. - evaluators.ts: eq/gt/gte/lt/lte/in implement the same via valuesEqual. - index-optimization.ts: because the index and evaluator now agree on NaN/invalid Dates, they are treated as exact (no re-filter) and invalid Dates are no longer range-divergent. This resolves the reviewer's invalid-Date eq/IN issue (indexed == full-scan) and simplifies the NaN-specific defensiveness down to the remaining nullish cases. null/undefined are unchanged: still three-valued logic (UNKNOWN). Tests: new nan-semantics.test.ts (numeric NaN + invalid Date, asserting indexed == full-scan), PG-semantics comparison.test.ts and evaluator NaN block; updated the NaN tests in collection-indexes.test.ts and deterministic-ordering.test.ts that encoded the old JS behavior. Docs and a minor changeset added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Superseded by #1582. The PostgreSQL
Closing in favor of #1582. Thanks for the discussion that shaped the direction. |
…1582) * test: add failing tests for index-optimized queries mixing indexed and non-indexed conditions These tests assert the expected results of currentStateAsChanges for AND/OR where clauses that combine conditions on indexed fields with conditions that cannot be served by an index. They currently fail. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: add failing tests for range query boundary handling Adds expected-behaviour tests for range conditions: - compound ranges sharing a boundary value must apply the strictest bound regardless of argument order, including for date values - one-sided compound ranges must return the matching rows - strict comparisons (gt) on date fields must exclude the boundary row Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: add failing test for compound range query with undefined bound A compound range condition where one bound is undefined (e.g. gt(score, undefined) AND lt(score, 90)) must match nothing, since a comparison against undefined is never true. The index-optimized path must agree with a full scan. This test currently fails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: add failing tests for nullish values in indexed eq/in/range queries A comparison against null/undefined is never true, but BTree indexes store and return rows with nullish indexed values (they sort as the smallest key). These tests assert that the index-optimized snapshot matches a full predicate scan for: - eq against undefined - IN with an undefined member - a range comparison over a field that has rows with undefined values - an upper-bounded compound range over such a field They currently fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: add failing tests for locale string range and NaN index queries Two more cases where an index-optimized snapshot must match a full predicate scan: - a string range predicate (e.g. name > 'z') must return a row whose value satisfies the JS relational comparison ('ö' > 'z'), even though a locale-collated index orders that value differently - eq and IN against NaN must not match a NaN-valued row, since NaN is never equal to itself They currently fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: add failing tests for range predicates over non-orderable index domains Three more cases where an index-optimized range query must match a full predicate scan: - an array-valued field (the evaluator compares with standard relational operators, which differ from the index's recursive array ordering) - a field indexed with a custom comparator - a numeric field that also contains a NaN value They currently fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: add failing tests for ordering values that have no natural order NaN and invalid Dates have no natural order. They should still get a consistent, well-defined position (alongside nulls) so that: - the comparator produces a stable total order; - ordering a collection by such a field is deterministic; - a range query on a field that contains such a value can still be served by the index rather than falling back to a full scan. These tests currently fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: enforce all where conditions when index optimization is partial OR expressions now require every disjunct to be index-optimizable; otherwise the query falls back to a full scan, since rows matched only by a non-optimizable disjunct cannot be recovered from index lookups. AND expressions keep partial index optimization but the optimizer now reports whether the matching keys are exact. When they are a superset (some conjuncts could not use an index, or a compound range was combined with other conditions), currentStateAsChanges re-checks each candidate row against the full where expression. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix: apply strictest bound in compound range queries and fix related range edge cases Compound range conditions sharing a boundary value (gte(x,5) AND gt(x,5)) now keep the strict bound regardless of argument order. Bound values are compared with the same comparator the indexes use so dates and locale strings behave correctly. Two further issues surfaced by the regression tests: - One-sided compound ranges passed an explicit undefined bound to rangeQuery, which treats present-but-undefined as the undefined sentinel and returned an empty result. Bounds are now only passed when they exist. - BTreeIndex's exclusive lower bound check compared the normalized indexed value against the raw query value, so gt on date fields included the boundary row. It now compares against the normalized key. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: add failing test for exclusive lower bound without a from bound rangeQuery with only an upper bound but fromInclusive: false must not drop the minimum key, as there is no lower bound to exclude against. This regression was introduced when the exclusive lower-bound check started comparing against the normalized fromKey (which defaults to minKey when no from bound is given). This test currently fails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: re-filter compound range queries that use a null/undefined bound A comparison against null/undefined is never true, but in an index those values sort as the smallest key, so an index range query cannot represent such a bound. Compound range optimization now tracks selected bounds with explicit hasFromBound/hasToBound flags (separate from the bound values) and marks the result inexact when any bound value is null/undefined, so the caller re-filters against the full expression. The inexactness now also propagates through the AND combiner, which previously ignored the compound range's exactness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: only exclude exclusive lower bound when a from bound is provided BTreeIndex.rangeQuery dropped the minimum key when called with fromInclusive: false but no from bound, because fromKey defaults to the minimum key and the exclusion check did not verify a lower bound was actually given. The exclusion is now guarded by hasFrom. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: re-filter index results that can include nullish-keyed rows BTree indexes store and return rows with a null/undefined indexed value (they sort as the smallest key), but a comparison against null/undefined is never true. The simple-comparison, IN, and compound-range optimizers now report such results as inexact so the caller re-checks candidates against the full expression: - eq/gt/gte: inexact when the query value is nullish (gt/gte with a non-null bound stay exact, since the bound excludes the bottom-sorted nullish rows) - lt/lte: conservatively inexact, as the open lower bound includes nullish-keyed rows - IN: inexact when any listed value is nullish - compound range: exact only when a non-null lower bound is present to exclude the nullish rows Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * fix: avoid locale string range index lookups and re-filter NaN results Two more index-optimization correctness issues: - A BTree index orders strings with localeCompare under the default 'locale' collation, but the WHERE evaluator compares strings with JS relational operators (code-point order). For range predicates these orders disagree (e.g. 'oe-umlaut' > 'z' is true in JS but sorts before 'z' under locale), so an index range lookup can omit matching rows - which re-filtering cannot recover. Locale-backed string range predicates are now left for a full scan (eq/IN use exact equality and are unaffected). - eq/IN against NaN returned isExact: true, but NaN is never equal to itself while the index still returns NaN-keyed rows (SameValueZero map equality). NaN is now treated like a nullish value for exactness, so such results are re-filtered against the full expression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: only use indexes for range predicates when ordering is trustworthy Range optimization assumed the index orders values the same way the WHERE evaluator's relational operators do. That holds for numbers, booleans, bigints, lexical strings and valid Dates, but not for: - non-primitive operands (arrays, plain objects, Temporal, invalid Dates), which the evaluator compares via string coercion / identity while the index compares recursively; - indexes created with a custom comparator, whose order is opaque; - fields containing a NaN or invalid Date, which compare equal to every value and break the strict-weak-ordering range traversal relies on. In all three cases an index range lookup can omit genuine matches, which re-filtering cannot recover, so they now fall back to a full scan. The index exposes a supportsRangeOptimization capability (false for custom comparators or when an unorderable value is stored), and the optimizer additionally checks the operand domain. eq/IN are unaffected (exact equality, not ordering). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * fix: give NaN and invalid Dates a stable sort position The comparator returned 0 for NaN against any value (and NaN from invalid-Date subtraction), so NaN had no consistent order. That made ordering by a field containing NaN non-deterministic and, worse, corrupted the strict-weak-ordering that B-tree range traversal relies on, so a stored NaN could make a range query drop genuinely matching rows. ascComparator now places NaN and invalid Dates alongside nulls, giving a well-defined total order. With a valid order the index traversal is correct again, so range queries on a field containing such values no longer deopt to a full scan: NaN simply sorts to the nulls end, where the existing exactness logic excludes it from lower-bounded ranges and re-filters it out of open-bottom (lt/lte) ranges. The index capability therefore only needs to deopt for custom comparators, so the stored-value check is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat: adopt PostgreSQL float semantics for NaN (supersedes #1617) Fold the NaN/invalid-Date semantics from the #1617 proposal into this PR so the index-optimization correctness work is built on the final, coherent contract instead of the interim JS-semantics handling. Previously this branch gave NaN/invalid Dates a stable sort position *for sorting only* while the WHERE evaluator still rejected them (NaN != NaN), relying on re-filtering to drop index-returned NaN rows. That left a JS/SQL hybrid where joins/groupBy/distinct (which match NaN = NaN via the hash index) disagreed with WHERE/ordering. Following PostgreSQL, NaN (and invalid Dates, whose timestamp is NaN) is now equal to itself and greater than every other non-null value: - comparison.ts: ascComparator orders NaN/invalid Dates as the greatest non-null value (was: alongside nulls); isUnorderable is exported. - evaluators.ts: eq/gt/gte/lt/lte/in implement the same via valuesEqual. - index-optimization.ts: because the index and evaluator now agree on NaN/invalid Dates, they are treated as exact (no re-filter) and invalid Dates are no longer range-divergent. This resolves the reviewer's invalid-Date eq/IN issue (indexed == full-scan) and simplifies the NaN-specific defensiveness down to the remaining nullish cases. null/undefined are unchanged: still three-valued logic (UNKNOWN). Tests: new nan-semantics.test.ts (numeric NaN + invalid Date, asserting indexed == full-scan), PG-semantics comparison.test.ts and evaluator NaN block; updated the NaN tests in collection-indexes.test.ts and deterministic-ordering.test.ts that encoded the old JS behavior. Docs and a minor changeset added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: mark NaN-semantics changeset as patch (no minor before 1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: fold nan-semantics tests into existing well-suited test files Move the unique NaN/invalid-Date coverage (lt on NaN, invalid-Date eq/IN/range index parity) into collection-indexes.test.ts alongside the existing NaN index tests, and drop the standalone nan-semantics.test.ts whose other cases were already covered by collection-indexes, deterministic-ordering and evaluators tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Summary
This is a proposal / discussion PR: it makes the query engine follow PostgreSQL float semantics for
NaNinstead of raw JavaScript semantics. I'd like maintainer input before considering it for merge.The problem
NaN === NaNisfalsein JavaScript, and the WHERE evaluator inherited that:eq(row.value, NaN)matches nothing, even for rows whose value isNaN.NaNhas no consistent order (it's neither<,>, nor=anything), so it can't be sorted or stored in a tree-based index deterministically. A single storedNaNcan corrupt range traversal and make ordering depend on insertion order.Critically, this is also internally inconsistent today: joins,
groupBy, anddistinctalready treatNaN = NaN(they match/group keys via a hash/Mapindex, and JSMaptreatsNaNas equal toNaN), whileWHERE eq(x, NaN)returns nothing. The same value behaves differently depending on where it appears.The change
Following PostgreSQL — which deliberately overrides IEEE-754 precisely so floats can be sorted and indexed —
NaN(and invalidDatevalues, whose timestamp isNaN) is now treated as:Concretely:
ascComparatorplacesNaN/invalid Dates last (greatest non-null), giving a valid total order.eq/gt/gte/lt/lte/inimplement the same:eq(x, NaN)matchesNaNrows, range ops treatNaNas greatest,INmatches aNaNmember.null/undefinedare unchanged — still three-valued logic (a comparison withnullisUNKNOWN), still ordered byNULLS FIRST/NULLS LAST.Because the index ordering and the evaluator now agree on
NaN, index-served and full-scan queries return identical results with no re-filtering — thenan-semantics.test.tssuite asserts this equivalence foreq/gt/lt/in.End-to-end consistency (verified)
Every place
NaNcan be compared or grouped now agrees on Postgres semantics. I verified each:eq/gt/gte/lt/lte/in)NaNas a value greater than all numbers (this PR)currentStateAsChangesand live-queryorderBy)NaNsorts lastNaNkey)NaNrows join — already true onmain, via the hash/MapindexgroupByNaNrows form a single group — already true onmaindistinctNaNdedups to one value — already true onmainJoins/
groupBy/distinctalready usedNaN = NaN(SameValueZero in the incremental layer). Onmain, WHERE/ordering disagreed with them; this PR makes WHERE/ordering match, so it removes the existing inconsistency rather than introducing a gap.Why this matters / context
This came out of a review thread on #1582 (index-optimization correctness). That PR keeps JavaScript
NaNsemantics (NaN != NaN) and handles the index/ordering problem by givingNaNa stable position for sorting only while the evaluator still rejects it — correct, but a JS/SQL hybrid that leaves the join/group-vs-filter mismatch in place. This PR is the alternative: pick one coherent, SQL-faithful model end-to-end. If the team prefers this direction, it would supersede theNaN-specific bits of #1582.The one open question
This is fundamentally a semantics decision: do we want Postgres float semantics (
NaN = NaN,NaNgreatest) as the documented contract? It's the most SQL-faithful and internally consistent option — and it's what the join/group/distinct machinery already does — but it is a user-facing change for anyone relying on JSNaNsemantics, so it deserves an explicit, documented call.null/undefinedare intentionally left as three-valued logic (absence /UNKNOWN), distinct fromNaN(a concrete greatest value), matching Postgres.Testing & docs
tests/comparison.test.ts(comparator:NaNsorts last, invalid Dates, null still first).tests/nan-semantics.test.ts(collection-leveleq/gt/lt/in/orderBy, asserting indexed == full-scan).tests/query/compiler/evaluators.test.tswith aNaNblock.@tanstack/dbsuite green (2414 tests). No existing test assertedNaNbehavior (the property tests explicitly excluded it because the old comparator broke on it).🤖 Generated with Claude Code