Skip to content

proposal: PostgreSQL float semantics for NaN (discussion)#1617

Closed
kevin-dp wants to merge 2 commits into
TanStack:mainfrom
kevin-dp:proposal/nan-postgres-semantics
Closed

proposal: PostgreSQL float semantics for NaN (discussion)#1617
kevin-dp wants to merge 2 commits into
TanStack:mainfrom
kevin-dp:proposal/nan-postgres-semantics

Conversation

@kevin-dp

@kevin-dp kevin-dp commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

This is a proposal / discussion PR: it makes the query engine follow PostgreSQL float semantics for NaN instead of raw JavaScript semantics. I'd like maintainer input before considering it for merge.

The problem

NaN === NaN is false in JavaScript, and the WHERE evaluator inherited that:

  • eq(row.value, NaN) matches nothing, even for rows whose value is NaN.
  • NaN has no consistent order (it's neither <, >, nor = anything), so it can't be sorted or stored in a tree-based index deterministically. A single stored NaN can corrupt range traversal and make ordering depend on insertion order.

Critically, this is also internally inconsistent today: joins, groupBy, and distinct already treat NaN = NaN (they match/group keys via a hash/Map index, and JS Map treats NaN as equal to NaN), while WHERE eq(x, NaN) returns nothing. The same value behaves differently depending on where it appears.

The change

Following PostgreSQL — which deliberately overrides IEEE-754 precisely so floats can be sorted and indexedNaN (and invalid Date values, whose timestamp is NaN) is now treated as:

  • equal to itself, and
  • greater than every other non-null value.

Concretely:

  • ascComparator places NaN/invalid Dates last (greatest non-null), giving a valid total order.
  • The evaluator's eq / gt / gte / lt / lte / in implement the same: eq(x, NaN) matches NaN rows, range ops treat NaN as greatest, IN matches a NaN member.
  • null / undefined are unchanged — still three-valued logic (a comparison with null is UNKNOWN), still ordered by NULLS FIRST / NULLS LAST.

Because the index ordering and the evaluator now agree on NaN, index-served and full-scan queries return identical results with no re-filtering — the nan-semantics.test.ts suite asserts this equivalence for eq/gt/lt/in.

End-to-end consistency (verified)

Every place NaN can be compared or grouped now agrees on Postgres semantics. I verified each:

Site Behavior with this PR
WHERE (eq/gt/gte/lt/lte/in) matches NaN as a value greater than all numbers (this PR)
Ordering (currentStateAsChanges and live-query orderBy) NaN sorts last
Joins (equi-join on a NaN key) NaN rows join — already true on main, via the hash/Map index
groupBy NaN rows form a single group — already true on main
distinct NaN dedups to one value — already true on main

Joins/groupBy/distinct already used NaN = NaN (SameValueZero in the incremental layer). On main, WHERE/ordering disagreed with them; this PR makes WHERE/ordering match, so it removes the existing inconsistency rather than introducing a gap.

Why this matters / context

This came out of a review thread on #1582 (index-optimization correctness). That PR keeps JavaScript NaN semantics (NaN != NaN) and handles the index/ordering problem by giving NaN a stable position for sorting only while the evaluator still rejects it — correct, but a JS/SQL hybrid that leaves the join/group-vs-filter mismatch in place. This PR is the alternative: pick one coherent, SQL-faithful model end-to-end. If the team prefers this direction, it would supersede the NaN-specific bits of #1582.

The one open question

This is fundamentally a semantics decision: do we want Postgres float semantics (NaN = NaN, NaN greatest) as the documented contract? It's the most SQL-faithful and internally consistent option — and it's what the join/group/distinct machinery already does — but it is a user-facing change for anyone relying on JS NaN semantics, so it deserves an explicit, documented call. null/undefined are intentionally left as three-valued logic (absence / UNKNOWN), distinct from NaN (a concrete greatest value), matching Postgres.

Testing & docs

  • New tests/comparison.test.ts (comparator: NaN sorts last, invalid Dates, null still first).
  • New tests/nan-semantics.test.ts (collection-level eq/gt/lt/in/orderBy, asserting indexed == full-scan).
  • Extended tests/query/compiler/evaluators.test.ts with a NaN block.
  • Documented under "Comparison semantics" in the live-queries guide; changeset included (minor).
  • Full @tanstack/db suite green (2414 tests). No existing test asserted NaN behavior (the property tests explicitly excluded it because the old comparator broke on it).

🤖 Generated with Claude Code

…ering

NaN (and invalid Dates, whose timestamp is NaN) had no consistent order:
NaN === NaN is false in JS, so NaN compared unequal to everything and
could not be sorted or stored in tree-based indexes deterministically.

Following PostgreSQL, NaN is now treated as equal to itself and greater
than every other non-null value:

- ascComparator places NaN/invalid Dates last (greatest non-null),
  giving a valid total order so they can be sorted and indexed.
- The WHERE evaluator's eq/gt/gte/lt/lte/in operators implement the same
  semantics: eq(x, NaN) matches NaN rows, range comparisons treat NaN as
  greatest, and IN matches a NaN member.

null/undefined are unchanged (three-valued logic). Because the index
ordering and the evaluator now agree on NaN, index-served and full-scan
queries return identical results without any re-filtering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevin-dp kevin-dp marked this pull request as draft June 25, 2026 10:10
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee25a124-9cb5-4138-9293-5a461df964cf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@pkg-pr-new

pkg-pr-new Bot commented Jun 25, 2026

Copy link
Copy Markdown
More templates

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@1617

@tanstack/browser-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/browser-db-sqlite-persistence@1617

@tanstack/capacitor-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/capacitor-db-sqlite-persistence@1617

@tanstack/cloudflare-durable-objects-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/cloudflare-durable-objects-db-sqlite-persistence@1617

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@1617

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@1617

@tanstack/db-sqlite-persistence-core

npm i https://pkg.pr.new/@tanstack/db-sqlite-persistence-core@1617

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@1617

@tanstack/electron-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/electron-db-sqlite-persistence@1617

@tanstack/expo-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/expo-db-sqlite-persistence@1617

@tanstack/node-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/node-db-sqlite-persistence@1617

@tanstack/offline-transactions

npm i https://pkg.pr.new/@tanstack/offline-transactions@1617

@tanstack/powersync-db-collection

npm i https://pkg.pr.new/@tanstack/powersync-db-collection@1617

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@1617

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@1617

@tanstack/react-native-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/react-native-db-sqlite-persistence@1617

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@1617

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@1617

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@1617

@tanstack/tauri-db-sqlite-persistence

npm i https://pkg.pr.new/@tanstack/tauri-db-sqlite-persistence@1617

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@1617

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@1617

commit: b024d51

kevin-dp added a commit to kevin-dp/db that referenced this pull request Jun 26, 2026
)

Fold the NaN/invalid-Date semantics from the TanStack#1617 proposal into this PR
so the index-optimization correctness work is built on the final, coherent
contract instead of the interim JS-semantics handling.

Previously this branch gave NaN/invalid Dates a stable sort position *for
sorting only* while the WHERE evaluator still rejected them (NaN != NaN),
relying on re-filtering to drop index-returned NaN rows. That left a
JS/SQL hybrid where joins/groupBy/distinct (which match NaN = NaN via the
hash index) disagreed with WHERE/ordering.

Following PostgreSQL, NaN (and invalid Dates, whose timestamp is NaN) is
now equal to itself and greater than every other non-null value:

- comparison.ts: ascComparator orders NaN/invalid Dates as the greatest
  non-null value (was: alongside nulls); isUnorderable is exported.
- evaluators.ts: eq/gt/gte/lt/lte/in implement the same via valuesEqual.
- index-optimization.ts: because the index and evaluator now agree on
  NaN/invalid Dates, they are treated as exact (no re-filter) and invalid
  Dates are no longer range-divergent. This resolves the reviewer's
  invalid-Date eq/IN issue (indexed == full-scan) and simplifies the
  NaN-specific defensiveness down to the remaining nullish cases.

null/undefined are unchanged: still three-valued logic (UNKNOWN).

Tests: new nan-semantics.test.ts (numeric NaN + invalid Date, asserting
indexed == full-scan), PG-semantics comparison.test.ts and evaluator NaN
block; updated the NaN tests in collection-indexes.test.ts and
deterministic-ordering.test.ts that encoded the old JS behavior. Docs and
a minor changeset added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevin-dp

Copy link
Copy Markdown
Contributor Author

Superseded by #1582.

The PostgreSQL NaN semantics proposed here have been folded directly into #1582, so the index-optimization correctness work is built on the final, coherent contract rather than the interim JS-semantics handling. The integrated change carries the full set:

  • comparison.tsascComparator orders NaN/invalid Dates as the greatest non-null value; isUnorderable exported.
  • evaluators.tseq/gt/gte/lt/lte/in follow PostgreSQL float semantics.
  • index-optimization.ts — the integration this proposal didn't cover: because the index and evaluator now agree on NaN/invalid Dates, the optimizer treats them as exact (no re-filter) and stops treating invalid Dates as range-divergent. This also resolves the invalid-Date eq/IN finding from the fix: enforce all where conditions when index optimization is partial #1582 review thread.
  • Tests (nan-semantics.test.ts, PG-semantics comparison.test.ts, evaluator NaN block), the live-queries doc note, and a minor changeset all came across; the NaN tests that encoded the old JS behavior were updated.

Closing in favor of #1582. Thanks for the discussion that shaped the direction.

@kevin-dp kevin-dp closed this Jun 26, 2026
kevin-dp added a commit that referenced this pull request Jun 26, 2026
…1582)

* test: add failing tests for index-optimized queries mixing indexed and non-indexed conditions

These tests assert the expected results of currentStateAsChanges for
AND/OR where clauses that combine conditions on indexed fields with
conditions that cannot be served by an index. They currently fail.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: add failing tests for range query boundary handling

Adds expected-behaviour tests for range conditions:
- compound ranges sharing a boundary value must apply the strictest
  bound regardless of argument order, including for date values
- one-sided compound ranges must return the matching rows
- strict comparisons (gt) on date fields must exclude the boundary row

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: add failing test for compound range query with undefined bound

A compound range condition where one bound is undefined (e.g.
gt(score, undefined) AND lt(score, 90)) must match nothing, since a
comparison against undefined is never true. The index-optimized path
must agree with a full scan. This test currently fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: add failing tests for nullish values in indexed eq/in/range queries

A comparison against null/undefined is never true, but BTree indexes
store and return rows with nullish indexed values (they sort as the
smallest key). These tests assert that the index-optimized snapshot
matches a full predicate scan for:
- eq against undefined
- IN with an undefined member
- a range comparison over a field that has rows with undefined values
- an upper-bounded compound range over such a field

They currently fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: add failing tests for locale string range and NaN index queries

Two more cases where an index-optimized snapshot must match a full
predicate scan:
- a string range predicate (e.g. name > 'z') must return a row whose
  value satisfies the JS relational comparison ('ö' > 'z'), even though
  a locale-collated index orders that value differently
- eq and IN against NaN must not match a NaN-valued row, since NaN is
  never equal to itself

They currently fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: add failing tests for range predicates over non-orderable index domains

Three more cases where an index-optimized range query must match a
full predicate scan:
- an array-valued field (the evaluator compares with standard relational
  operators, which differ from the index's recursive array ordering)
- a field indexed with a custom comparator
- a numeric field that also contains a NaN value

They currently fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: add failing tests for ordering values that have no natural order

NaN and invalid Dates have no natural order. They should still get a
consistent, well-defined position (alongside nulls) so that:
- the comparator produces a stable total order;
- ordering a collection by such a field is deterministic;
- a range query on a field that contains such a value can still be
  served by the index rather than falling back to a full scan.

These tests currently fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: enforce all where conditions when index optimization is partial

OR expressions now require every disjunct to be index-optimizable;
otherwise the query falls back to a full scan, since rows matched only
by a non-optimizable disjunct cannot be recovered from index lookups.

AND expressions keep partial index optimization but the optimizer now
reports whether the matching keys are exact. When they are a superset
(some conjuncts could not use an index, or a compound range was
combined with other conditions), currentStateAsChanges re-checks each
candidate row against the full where expression.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: apply strictest bound in compound range queries and fix related range edge cases

Compound range conditions sharing a boundary value (gte(x,5) AND
gt(x,5)) now keep the strict bound regardless of argument order. Bound
values are compared with the same comparator the indexes use so dates
and locale strings behave correctly.

Two further issues surfaced by the regression tests:

- One-sided compound ranges passed an explicit undefined bound to
  rangeQuery, which treats present-but-undefined as the undefined
  sentinel and returned an empty result. Bounds are now only passed
  when they exist.

- BTreeIndex's exclusive lower bound check compared the normalized
  indexed value against the raw query value, so gt on date fields
  included the boundary row. It now compares against the normalized
  key.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: add failing test for exclusive lower bound without a from bound

rangeQuery with only an upper bound but fromInclusive: false must not
drop the minimum key, as there is no lower bound to exclude against.
This regression was introduced when the exclusive lower-bound check
started comparing against the normalized fromKey (which defaults to
minKey when no from bound is given). This test currently fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: re-filter compound range queries that use a null/undefined bound

A comparison against null/undefined is never true, but in an index
those values sort as the smallest key, so an index range query cannot
represent such a bound. Compound range optimization now tracks
selected bounds with explicit hasFromBound/hasToBound flags (separate
from the bound values) and marks the result inexact when any bound
value is null/undefined, so the caller re-filters against the full
expression. The inexactness now also propagates through the AND
combiner, which previously ignored the compound range's exactness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: only exclude exclusive lower bound when a from bound is provided

BTreeIndex.rangeQuery dropped the minimum key when called with
fromInclusive: false but no from bound, because fromKey defaults to
the minimum key and the exclusion check did not verify a lower bound
was actually given. The exclusion is now guarded by hasFrom.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: re-filter index results that can include nullish-keyed rows

BTree indexes store and return rows with a null/undefined indexed
value (they sort as the smallest key), but a comparison against
null/undefined is never true. The simple-comparison, IN, and
compound-range optimizers now report such results as inexact so the
caller re-checks candidates against the full expression:

- eq/gt/gte: inexact when the query value is nullish (gt/gte with a
  non-null bound stay exact, since the bound excludes the bottom-sorted
  nullish rows)
- lt/lte: conservatively inexact, as the open lower bound includes
  nullish-keyed rows
- IN: inexact when any listed value is nullish
- compound range: exact only when a non-null lower bound is present to
  exclude the nullish rows

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: apply automated fixes

* fix: avoid locale string range index lookups and re-filter NaN results

Two more index-optimization correctness issues:

- A BTree index orders strings with localeCompare under the default
  'locale' collation, but the WHERE evaluator compares strings with JS
  relational operators (code-point order). For range predicates these
  orders disagree (e.g. 'oe-umlaut' > 'z' is true in JS but sorts before
  'z' under locale), so an index range lookup can omit matching rows -
  which re-filtering cannot recover. Locale-backed string range
  predicates are now left for a full scan (eq/IN use exact equality and
  are unaffected).

- eq/IN against NaN returned isExact: true, but NaN is never equal to
  itself while the index still returns NaN-keyed rows (SameValueZero map
  equality). NaN is now treated like a nullish value for exactness, so
  such results are re-filtered against the full expression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: only use indexes for range predicates when ordering is trustworthy

Range optimization assumed the index orders values the same way the
WHERE evaluator's relational operators do. That holds for numbers,
booleans, bigints, lexical strings and valid Dates, but not for:

- non-primitive operands (arrays, plain objects, Temporal, invalid
  Dates), which the evaluator compares via string coercion / identity
  while the index compares recursively;
- indexes created with a custom comparator, whose order is opaque;
- fields containing a NaN or invalid Date, which compare equal to every
  value and break the strict-weak-ordering range traversal relies on.

In all three cases an index range lookup can omit genuine matches, which
re-filtering cannot recover, so they now fall back to a full scan. The
index exposes a supportsRangeOptimization capability (false for custom
comparators or when an unorderable value is stored), and the optimizer
additionally checks the operand domain. eq/IN are unaffected (exact
equality, not ordering).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: apply automated fixes

* fix: give NaN and invalid Dates a stable sort position

The comparator returned 0 for NaN against any value (and NaN from
invalid-Date subtraction), so NaN had no consistent order. That made
ordering by a field containing NaN non-deterministic and, worse,
corrupted the strict-weak-ordering that B-tree range traversal relies
on, so a stored NaN could make a range query drop genuinely matching
rows.

ascComparator now places NaN and invalid Dates alongside nulls, giving
a well-defined total order. With a valid order the index traversal is
correct again, so range queries on a field containing such values no
longer deopt to a full scan: NaN simply sorts to the nulls end, where
the existing exactness logic excludes it from lower-bounded ranges and
re-filters it out of open-bottom (lt/lte) ranges. The index capability
therefore only needs to deopt for custom comparators, so the
stored-value check is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: adopt PostgreSQL float semantics for NaN (supersedes #1617)

Fold the NaN/invalid-Date semantics from the #1617 proposal into this PR
so the index-optimization correctness work is built on the final, coherent
contract instead of the interim JS-semantics handling.

Previously this branch gave NaN/invalid Dates a stable sort position *for
sorting only* while the WHERE evaluator still rejected them (NaN != NaN),
relying on re-filtering to drop index-returned NaN rows. That left a
JS/SQL hybrid where joins/groupBy/distinct (which match NaN = NaN via the
hash index) disagreed with WHERE/ordering.

Following PostgreSQL, NaN (and invalid Dates, whose timestamp is NaN) is
now equal to itself and greater than every other non-null value:

- comparison.ts: ascComparator orders NaN/invalid Dates as the greatest
  non-null value (was: alongside nulls); isUnorderable is exported.
- evaluators.ts: eq/gt/gte/lt/lte/in implement the same via valuesEqual.
- index-optimization.ts: because the index and evaluator now agree on
  NaN/invalid Dates, they are treated as exact (no re-filter) and invalid
  Dates are no longer range-divergent. This resolves the reviewer's
  invalid-Date eq/IN issue (indexed == full-scan) and simplifies the
  NaN-specific defensiveness down to the remaining nullish cases.

null/undefined are unchanged: still three-valued logic (UNKNOWN).

Tests: new nan-semantics.test.ts (numeric NaN + invalid Date, asserting
indexed == full-scan), PG-semantics comparison.test.ts and evaluator NaN
block; updated the NaN tests in collection-indexes.test.ts and
deterministic-ordering.test.ts that encoded the old JS behavior. Docs and
a minor changeset added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: mark NaN-semantics changeset as patch (no minor before 1.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: fold nan-semantics tests into existing well-suited test files

Move the unique NaN/invalid-Date coverage (lt on NaN, invalid-Date
eq/IN/range index parity) into collection-indexes.test.ts alongside the
existing NaN index tests, and drop the standalone nan-semantics.test.ts
whose other cases were already covered by collection-indexes,
deterministic-ordering and evaluators tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant