gh-148653: Reject hashing incompletely initialized tuples#152321

Closed
vedikatai wants to merge 2 commits into python:main from vedikatai:audit/fix-148653-marshal-tuple-hash

Conversation

@vedikatai


Summary

Fixes python/cpython#148653 (formerly GHSA-m7gv-g5p9-9qqq; PSRT cleared for public): marshal.loads() crashes with SIGSEGV on a 16-byte self-referential TYPE_TUPLE|FLAG_REF payload whose nested set element is a TYPE_REF to the still-incomplete outer tuple.
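To make the payload layout concrete, here is a minimal sketch that splits the 16 bytes into fields without ever calling marshal.loads() on them. It assumes the usual marshal conventions (FLAG_REF is the high bit of the type byte, sizes and reference indices are 4-byte little-endian integers); the helper name split_type is illustrative and not part of any CPython API.

```python
import struct

# The 16-byte payload from the reproducer. It is only inspected here,
# never passed to marshal.loads().
payload = b'\xa8\x02\x00\x00\x00N<\x01\x00\x00\x00r\x00\x00\x00\x00'

FLAG_REF = 0x80  # assumed: high bit of the type byte marks a ref-table entry


def split_type(byte):
    """Return (type_char, has_flag_ref) for one marshal type byte."""
    return chr(byte & 0x7F), bool(byte & FLAG_REF)


outer_type, outer_ref = split_type(payload[0])          # tuple, FLAG_REF set
(outer_size,) = struct.unpack_from('<i', payload, 1)    # number of tuple slots
first_item, _ = split_type(payload[5])                  # first slot: None
set_type, set_ref = split_type(payload[6])              # second slot: a set
(set_size,) = struct.unpack_from('<i', payload, 7)      # number of set elements
ref_type, _ = split_type(payload[11])                   # set element: a back-reference
(ref_index,) = struct.unpack_from('<i', payload, 12)    # index into the ref table

print(outer_type, outer_ref, outer_size)
print(first_item, set_type, set_size, ref_type, ref_index)
```

Reference index 0 is the outer tuple itself, which is registered but still has an empty second slot when the set tries to hash it.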

User impact

Deterministic SIGSEGV (exit code -11) when loading untrusted or malformed marshal data. marshal is documented as insecure against malicious input, but the crash is still a robustness bug on versions labeled 3.9–3.16.

Bisect

Long-standing (issue reports crashes on 3.9–3.14+). Not introduced in 3.16.0a0 specifically; full rebuild bisect omitted (would not pin a 3.16-only commit). Root cause: Python/marshal.c registers the tuple via R_REF before slots are filled; nested set construction hashes the partial tuple → tuple_hash → PyObject_Hash(NULL).

Prior attempt: closed PR #148652 (two-phase r_ref_reserve/r_ref_insert). Locally that approach fixed the crash but failed recursive test_marshal cases. This PR uses a narrower guard in tuple_hash instead.

Diff

Objects/tupleobject.c — if item[i] == NULL in tuple_hash, raise ValueError (~8 lines C, well under 50-line budget).

Test results (3.16.0a0)

Reproducer:

import marshal
marshal.loads(b'\xa8\x02\x00\x00\x00N<\x01\x00\x00\x00r\x00\x00\x00\x00')
# pre-fix: SIGSEGV; post-fix: ValueError: cannot hash incompletely initialized tuple

CPython tests: ./python.exe -m test test_marshal test_tuple -q → SUCCESS (recursive marshal structures preserved).
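For contrast with the malformed payload, a well-formed recursive structure is expected to keep round-tripping. The sketch below assumes the default marshal format version (3 or later, which supports references) and uses a list that contains itself; the variable names are illustrative.

```python
import marshal

# A list that contains itself: a legitimate recursive structure.
cycle = [1, 2]
cycle.append(cycle)

# Round-trip through marshal using the default format version.
loaded = marshal.loads(marshal.dumps(cycle))

# The loaded list should point back at itself rather than at a copy.
is_self_referential = loaded[2] is loaded
print(loaded[:2], is_self_referential)
```

Lists are unhashable, so no container ever hashes them mid-construction, and a guard in tuple_hash does not come into play here.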

Audit provenance

/tmp/cpython-regression-audit.md.

… path

set_new() used make_new_set(), which GC-tracked the empty set before
set_init() populated it from the iterable. That left the same half-built
window the vectorcall path already closed, so concurrent GC / get_objects()
could observe an inconsistent set and crash on edge cases.

Allocate untracked in set_new() and call _PyObject_GC_TRACK() only after
set_init() succeeds (skipping if already tracked for re-init).

marshal.loads can expose a partially filled FLAG_REF tuple via
TYPE_REF into a set, which called PyObject_Hash(NULL) and SIGSEGV'd.
Guard tuple_hash NULL slots with ValueError instead.
@bedevere-app

bedevere-app Bot commented Jun 26, 2026


Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot


The following commit authors need to sign the Contributor License Agreement:

CLA not signed

@vedikatai

Author

Closing: opened against upstream by mistake. The constraint was that all public actions happen on vedikatai only; reopening as draft PRs on vedikatai/cpython.


Development

Successfully merging this pull request may close these issues.

Crash: marshal.loads SIGSEGV on self-referencing TYPE_TUPLE with FLAG_REF

2 participants