Speed up uniqueItems validation with structural hashing by oscmarb · Pull Request #1482 · python-jsonschema/jsonschema

oscmarb · 2026-05-11T08:33:59Z

Summary

Rewrite uniq to deduplicate via a set of structural keys compatible with equality checks, instead of using sorted(...) + adjacent comparison.
The previous strategy degraded to O(n²) brute force when ordering could not be performed, meaning most real uniqueItems validations hit the slow path.
Unhashable elements (and NaN) still fall back to brute-force equality comparison, preserving correctness for edge cases.
Adds direct unit tests for uniq covering the bool/int distinction, structural sequence/mapping equality, NaN, and unhashable elements.
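
As a rough illustration of the approach described in the summary, here is a minimal sketch of set-based deduplication with a brute-force fallback. It is not the code from this PR: `structural_key`, `sketch_uniq`, `Unkeyable` and `_same` are hypothetical names, and the fallback only handles the bool/int distinction at the top level.

```python
import math
from collections.abc import Mapping, Sequence


class Unkeyable(Exception):
    """Signals that a value has no trustworthy structural key (hypothetical)."""


def structural_key(value):
    """Build a hashable key; equal keys are meant to imply equal JSON values."""
    if isinstance(value, bool):
        # Tag bools so True/1 and False/0 do not collide.
        return ("bool", value)
    if isinstance(value, str):
        # str is also a Sequence, so handle it before the Sequence branch.
        return ("str", value)
    if isinstance(value, float) and math.isnan(value):
        # NaN never equals itself by value, so a key would be misleading.
        raise Unkeyable
    if isinstance(value, Mapping):
        return ("map", frozenset((k, structural_key(v)) for k, v in value.items()))
    if isinstance(value, Sequence):
        return ("seq", tuple(structural_key(v) for v in value))
    try:
        hash(value)
    except TypeError:
        raise Unkeyable from None
    return ("atom", value)


def _same(a, b):
    # Simplified equality for the slow path: identity first (a shared NaN
    # counts as a duplicate), then a bool-aware ==. Top level only.
    if a is b:
        return True
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    return a == b


def sketch_uniq(container):
    items = list(container)
    seen = set()
    for item in items:
        try:
            key = structural_key(item)
        except Unkeyable:
            # Slow path: compare every pair of items.
            return not any(
                _same(a, b) for i, a in enumerate(items) for b in items[i + 1:]
            )
        if key in seen:
            return False
        seen.add(key)
    return True
```

With this sketch, `sketch_uniq([1, True])` is unique while `sketch_uniq([1, 1.0])` and `sketch_uniq([{"a": [1]}, {"a": [1]}])` are not. Falling back to a full pairwise pass as soon as one element cannot be keyed keeps the sketch short; a real implementation may well restrict the slow path further.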

Performance

In my case, I was validating a 100 MB JSON file with a custom schema where almost all entries were checked for uniqueness via O(n²) brute force, because the values were dicts and could not be sorted. With the linear-time path, validation of that file is now more than 17x faster (530 s down to 30 s).

JSON size	Before	After
20 MB	57 s	9 s
100 MB	530 s	30 s
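
To make the slow path concrete, here is a simplified sketch of a sort-then-compare strategy like the one described above. It is an illustration only, not the previous library code; `sort_based_uniq` is a hypothetical name, and it returns which path was taken purely for demonstration.

```python
import itertools
import math


def sort_based_uniq(items):
    """Return (is_unique, path_taken) for a sort-then-compare strategy."""
    items = list(items)
    try:
        ordered = sorted(items)
    except TypeError:
        # dicts define no ordering, so every pair has to be compared.
        unique = all(a != b for a, b in itertools.combinations(items, 2))
        return unique, "brute-force"
    # Sorted input: duplicates end up next to each other.
    unique = all(a != b for a, b in zip(ordered, ordered[1:]))
    return unique, "sorted"


# Number of pairwise comparisons the brute-force path needs for n items.
pairs_for_1000_items = math.comb(1000, 2)
```

An array of objects such as `[{"id": 1}, {"id": 2}]` takes the brute-force path here, and 1,000 such items already need 499,500 pairwise comparisons, which is why dict-heavy documents were hit hardest.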

Replace uniq's sort-then-compare strategy (which fell back to O(n^2) brute force) with an O(n) pass that builds an `equal`-compatible hashable key per element and dedupes via a set. Unhashable elements still fall back to brute force comparison.

for more information, see https://pre-commit.ci

Julian · 2026-05-12T02:53:44Z

@@ -1,7 +1,14 @@
 from collections.abc import Mapping, MutableMapping, Sequence
+from operator import ne


I don't see a reason to use ne here over just foo != bar.

And can you also revert the import change for re, it just adds noise to the diff.

I was trying to avoid a noqa, since the linter was failing on a comparison of an object with itself. I've changed it anyway and reverted the re import too.
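
For readers following this thread, the two spellings under discussion behave the same way. The sketch below assumes the comparison in question was a self-comparison used to spot values such as NaN; the PR's actual code is not shown in this thread, and both function names are hypothetical.

```python
from operator import ne


def differs_from_itself_operator(value):
    # Plain operator form; some linters flag comparing a name with itself.
    return value != value


def differs_from_itself_function(value):
    # Functional form; same result, since ne(a, b) is equivalent to a != b.
    return ne(value, value)
```

Both return True for `float("nan")` and False for ordinary values such as `1.5`, so the choice between them is about readability and linter noise rather than behavior.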

Julian · 2026-05-12T02:57:03Z

+        self.assertTrue(uniq([Unhashable(1), Unhashable(2)]))
+
+    def test_nan_is_not_uniquely_hashable(self):
+        self.assertFalse(uniq([nan, nan]))


This seems slightly misleading, the test is using the same identical nan instance. Probably worth comparing 2 different nans as well.

Renamed to test_nan_falls_back and added the distinct-instances case. Did the same for the sequence/mapping variants.

Had to change float("nan") to -nan. On Python 3.11 (both CPython and PyPy) float("nan") returns the math.nan singleton, so nan is float("nan") is True and equal short-circuits on identity. -nan was the only way I found to reliably get a distinct NaN instance across all supported versions.

Hah fun, yeah that seems ok.
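
The identity short-circuit discussed in this thread can be seen with plain Python containers. This is a small sketch, not code from the PR; `describe` is a hypothetical helper, and the library's `equal` function is not used here.

```python
import math


def describe(a, b):
    """Report how two values relate by identity, by ==, and inside lists."""
    return {
        "same_object": a is b,
        "equal": a == b,
        # List comparison checks identity before ==, so a shared NaN matches.
        "lists_equal": [a] == [b],
    }


nan = float("nan")
other_nan = -nan  # negation yields another NaN value

same_instance = describe(nan, nan)
distinct_instances = describe(nan, other_nan)
```

For the same instance, `==` is False but the list comparison is True, which is the behavior the original single-instance test exercised. For `nan` and `-nan`, all three are False, so the pair counts as two distinct values.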

Julian · 2026-05-12T02:58:55Z

Thanks, seems reasonable overall, left a few minor comments.

Julian · 2026-05-12T11:57:51Z

Thanks! Nice work.

oscmarb and others added 2 commits May 11, 2026 10:27

[pre-commit.ci] auto fixes from pre-commit.com hooks

68b0565

pre-commit-ci Bot temporarily deployed to PyPI May 11, 2026 08:44 Inactive

Julian reviewed May 12, 2026

oscmarb added 2 commits May 12, 2026 12:56

Cover distinct NaN cases in uniq and tidy _utils imports.

2271cec

Avoid float("nan") singleton in uniq tests.

1911a74

oscmarb temporarily deployed to PyPI May 12, 2026 11:34 — with GitHub Actions Inactive

Julian merged commit 9f6fc68 into python-jsonschema:main May 12, 2026
87 checks passed

Julian merged 4 commits into python-jsonschema:main from oscmarb:faster-uniqueitems
