Modernize the parser onto php-parser 5 + reflection-docblock 6 (PHP 8.2) #1

Open
bordoni wants to merge 20 commits into master from modernize/parser

bordoni commented May 31, 2026

Summary

Rewrites the PHPDoc parser off the abandoned phpdocumentor/reflection ~3.0 (php-parser v1) stack — which can't parse modern PHP and has left developer.wordpress.org's code reference frozen since Aug 2023 — onto a modern, maintained one:

  • nikic/php-parser 5 (driven directly, so WordPress hook call-sites are still captured)
  • phpstan/phpdoc-parser 2 + phpdocumentor/reflection-docblock 6 (docblocks)
  • PHP 8.2+, PHPUnit 9

phpdocumentor/reflection is removed entirely.

Approach

Driven by a golden-master oracle: the old parser's output was snapshotted on PHP 7.4, then the rewrite reproduces it byte-for-byte, so the array parse_files() hands the importer is unchanged. The WordPress import integration suite then proves the resulting posts/meta/taxonomies are identical end-to-end.
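
As a rough illustration of the golden-master idea, the sketch below freezes one output as a JSON string and then compares later output against it byte-for-byte. It is not the PR's test code: `parse_fixture_stub()`, `encode_snapshot()`, `matches_golden()`, and the fixture path are hypothetical names standing in for the real `parse_files()` call and snapshot files.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the real parse_files() call; returns the kind of
 * nested array an importer might receive for one fixture.
 */
function parse_fixture_stub(string $fixture): array
{
    return [
        'file'      => $fixture,
        'functions' => [['name' => 'my_function', 'line' => 3]],
    ];
}

/** Encode deterministically so the comparison can be byte-for-byte. */
function encode_snapshot(array $data): string
{
    $flags = JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR;

    return json_encode($data, $flags) . "\n";
}

/** True only when the current output matches the frozen snapshot exactly. */
function matches_golden(string $snapshot, array $current): bool
{
    return $snapshot === encode_snapshot($current);
}

// Freeze the "old parser" output once (assumed fixture path).
$golden = encode_snapshot(parse_fixture_stub('tests/source/example.php'));

// A faithful rewrite reproduces the snapshot; any drift is detected.
$same    = parse_fixture_stub('tests/source/example.php');
$drifted = $same;
$drifted['functions'][0]['line'] = 4;

var_dump(matches_golden($golden, $same));    // bool(true)
var_dump(matches_golden($golden, $drifted)); // bool(false)
```

Comparing encoded strings rather than arrays means key order and scalar types are also part of the pinned contract, which is the point of a byte-for-byte oracle.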

Staged: deps/env bump → pretty printer → File_Reflector (structure) → docblock adapter → hooks + uses → cleanup. Each stage was verified against the oracle before moving on.

Testing

All green on PHP 8.2:

  • Golden master: 16/16 — parser output byte-identical to the old parser
  • WP export + import: 22/22 (125 assertions) — importer output unchanged
  • Unit: 12/12 (54 assertions) — pretty printer, name resolution, docblock adapter, File_Reflector

Real-world: wp parser export over the full WordPress wp-includes (1,043 files) runs clean — 3,409 functions, 690 classes, 6,247 methods, 2,427 hooks, zero fatals. (The old parser fataled on modern syntax such as nullable types and enums.)

Notes

  • Interfaces/traits are not exported — this matches the legacy contract (runner.php only reads getClasses()); can be added later if DevHub wants it.
  • scribu/lib-posts-to-posts emits dynamic-property deprecations on PHP 8.2 (unmaintained dependency; out of scope here).
  • CI now runs the PHPUnit suite on the PHP 8.2 / 8.3 / 8.4 matrix.

AI disclosure

These changes were generated primarily by an AI coding assistant (Claude Code), working stage by stage under human direction. Following the spirit of WordPress#247's disclosure: the code is AI-authored and reviewers should scrutinise the diff accordingly.

What distinguishes it from a raw AI rewrite is the verification bar it was held to — every stage had to reproduce the old parser's output byte-for-byte against a golden-master oracle, and the WordPress import suite, unit suite, and a full wp-includes export all pass (see Testing). That catches behavioural drift the way a human review of 4,000 lines of generated code realistically cannot, but it does not replace human judgement on architecture and edge cases.


Relates to WordPress#247 — several legacy-parity fixes here (FQN forms, file constants/includes, file-docblock attribution, and namespaced name resolution) were surfaced while reviewing that PR's approach and the maintainers' feedback on it.

bordoni added 11 commits May 30, 2026 18:33
- Add bin/generate-golden.php to snapshot parser output as JSON
- Add WordPress-free PHPUnit golden test + standalone bootstrap/config
- Share corpus discovery and normalization via tests/golden/golden.php
- Cover all tests/source/*.php and tests/**/*.inc fixtures (16 entries)

Pins parse_files() output key-for-key so the parser can be rewritten
onto the modern stack without silently changing the importer's contract.

- Add 16 JSON snapshots of the old parser's parse_files() output
- Add bin/generate-golden-docker.sh to reproduce them on php:7.4-cli
- Document the validated PHP 7.4 generation recipe

The old stack emits warnings on PHP 8.x, so the baseline is generated
on 7.4 and frozen as the regression oracle for the rewrite.

- Update yoast/phpunit-polyfills ^1.0 -> ^1.1 (1.0.3 => 1.1.5)
- Required by the modern WordPress PHPUnit suite (needs Polyfills >= 1.1.0)

Fixes the 'Version mismatch detected for the PHPUnit Polyfills' fatal so
the export + import suite runs green. Parser deps left untouched. Addresses WordPress#244.

- How to run the export+import suite via wp-env (npm scripts or npx)
- Record the baseline: 22 tests / 125 assertions on PHP 7.4
- Note the GitHub HTTPS->SSH git-rewrite gotcha and the per-process fix

- Drop phpdocumentor/reflection ~3.0; add nikic/php-parser ^5,
  phpstan/phpdoc-parser ^2, reflection-docblock ^6, type-resolver ^2
- Bump php to >=8.2, phpunit to ^9, polyfills to ^2; pin Composer's
  resolution platform to php 8.2 for reproducible locks
- .wp-env.json to PHP 8.2; CI matrix to 8.2/8.3/8.4; phpunit.xml.dist
  to the v9 schema (coverage/include)
- Guard parser-dependent tests to skip (not fatal) while File_Reflector
  is rewritten in Stage 4

The old parser is intentionally non-functional until Stage 4; both
suites report skips. The PHP 7.4 golden snapshots are unchanged.

- Pretty_Printer now extends PhpParser\PrettyPrinter\Standard; prettyPrintArg
  mirrors prettyPrintExpr state handling (resetState + handleMagicTokens),
  replacing the removed v1 noIndentToken stripping
- Add WordPress-free unit suite (tests/unit + phpunit-unit.xml.dist) covering
  the pretty printer and NameResolver(replaceNodes:false) foundation
- Unit tests assert the exact hook-name/arg strings frozen in the golden
  snapshots, proving correctness before File_Reflector is wired up

Stage 3 of the parser modernization. 5 tests / 9 assertions green on PHP 8.2.

- Replace the phpDocumentor FileReflector subclass with a NodeVisitorAbstract
  that walks the AST and collects functions, classes, methods, properties and
  arguments into thin reflector wrappers (lib/class-reflectors.php)
- Re-expose only the reflector API runner.php consumes, keeping the exported
  array shape identical; collect on leaveNode so nested functions order
  inner-first like the legacy parser; resolve extends/implements to \FQN
- DocBlocks return null and hooks/$uses are deferred (Stages 5-6)
- Add File_Reflector unit tests; structure matches the golden oracle 16/16
  (full golden match pending docblocks + hooks)

Stage 4 of the parser modernization. Unit suite: 8 tests / 33 assertions.

- Add lib/class-docblock-adapter.php wrapping reflection-docblock 6 to emit the
  legacy {description, long_description, tags[]} shape; distinct tag adapters so
  export_docblock's method_exists() probes select the right keys
- type_to_legacy_strings() maps type-resolver 2 types to string[] (\WP_Post,
  int[], unions); long_description reproduced via Parsedown block parsing
- Reconstruct @see (InvalidTag) and @link to match the loose legacy parsing
- Wire getDocBlock() through every reflector; detect the file-level docblock
  (first-docblock heuristic incl. the open-tag-adjacency quirk)
- Add docblock adapter unit tests

Stage 5. Golden: 7/16 full, 16/16 structure+docblocks (hooks/uses pending).
Unit suite: 11 tests / 50 assertions.

- Rewrite the four call/hook reflectors onto php-parser 5 (Function_Call,
  Method_Call, Static_Method_Call, Hook): restore the WP-globals class map,
  self/parent/$this resolution, the full hook-type switch, name cleanup, arg shift
- File_Reflector records hooks (do_action/apply_filters + variants) and per-element
  $uses via node attributes (not dynamic props), with the last_doc carry-over for
  undocumented hooks and per-method called-in-class assignment
- Add a Class_Name_Resolver pass that fully-qualifies class-position names so a
  nested Class::m() caller prints as \Class::m() while function names stay
  unqualified, matching the legacy php-parser 1 output
- Docblock adapter: reconstruct @param/@var InvalidTags (e.g. $this) with type
  resolution via the docblock context
- PHPUnit 9: assertInternalType -> assertIsArray in export-testcase

Stage 6 - parser rewrite complete. Golden 16/16, WP 22/22, unit 11 tests / 50 assertions, all green.

- Remove the obsolete 'use phpDocumentor\Reflection\*' imports from runner.php
  (those reflectors are gone); update stale PHPDoc type hints to the new wrappers
- Remove the now-permanent parser_is_functional() skip guards from the golden
  and WP suites; the parser is always loadable, so the tests always run

Stage 7 cleanup. Golden 16/16, WP 22/22, unit 11 tests / 50 assertions still green.

Found by an end-to-end 'wp parser export' of the full WordPress wp-includes
(1043 files): 'new class extends Foo {}' reached the string-cast fallback in
Method_Call_Reflector and fataled. Return '' for a nameless class instead.

Add a regression unit test. Full wp-includes now exports cleanly: 3409
functions, 690 classes, 6247 methods, 2427 hooks, zero fatals.
bordoni self-assigned this May 31, 2026
bordoni added 5 commits May 30, 2026 22:46
The plugin bundles scribu/lib-posts-to-posts + scb-framework via Composer, but
.wp-env.json also installed the standalone posts-to-posts plugin. Both load the
scribu framework, so 'wp-env start' fataled during plugin activation (Cannot
redeclare scb_init() / class P2P_Storage not found), failing CI setup before any
test ran. Drop the standalone plugin; the bundled copy provides P2P_Storage
(verified: phpdoc-parser activates clean, WP 22/22 + golden 16/16 + unit 12/12).

CI runs 'wp-env start' (which activates the plugin) before composer installs
vendor/, so P2P_Storage isn't loaded at activation -> 'class P2P_Storage not
found' fatal that failed CI setup. Wrap the activation callbacks in a
class_exists() check; Relationships already creates the P2P tables on demand,
so this is safe. Verified: plugin activates clean with vendor absent.
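
A minimal sketch of that guard, assuming the activation callbacks are the static `P2P_Storage::init()` and `P2P_Storage::install()` methods; the wrapper name `phpdoc_parser_activate_sketch()` is hypothetical and the PR's actual callback may be structured differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical activation callback: only touches P2P_Storage when the
 * Composer-provided class has actually been loaded.
 *
 * Returns true when the P2P setup ran, false when it was skipped.
 */
function phpdoc_parser_activate_sketch(): bool
{
    if (!class_exists('P2P_Storage')) {
        // vendor/ is not installed yet: skip instead of fataling.
        return false;
    }

    // Assumed callbacks; only reached when the class exists.
    P2P_Storage::init();
    P2P_Storage::install();

    return true;
}

// Inside WordPress this would be registered for activation; the
// function_exists() check keeps the sketch runnable outside WordPress.
if (function_exists('register_activation_hook')) {
    register_activation_hook(__FILE__, 'phpdoc_parser_activate_sketch');
}
```

Called with `vendor/` absent, the function returns `false` and activation proceeds, which is the behaviour the commit describes as safe because the tables are created on demand later.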
- Fully-qualify class-name argument typehints (\WP_Post), leaving
  built-in Identifier types (int, string, array) bare. Argument types
  previously dropped the leading backslash the legacy parser emitted.
- Strip reflection-docblock's FQSEN normalization backslash from @see
  references so they read as written, matching the legacy output.

Both are the leading-backslash discrepancy dd32 flagged on upstream
PR WordPress#247 (present in our parser too); verified against the legacy oracle.
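
The typehint rule can be pictured with a deliberately simplified, string-based sketch. The PR itself works on php-parser nodes (class names versus built-in Identifier types), not strings; `legacy_argument_type()` and its partial built-in list are assumptions made for illustration, and namespace resolution is ignored here.

```php
<?php
declare(strict_types=1);

/**
 * Simplified illustration: built-in types stay bare, anything else is
 * treated as a class name and given exactly one leading backslash.
 * The built-in list is a partial, assumed set.
 */
function legacy_argument_type(string $type): string
{
    $builtins = [
        'int', 'float', 'string', 'bool', 'array',
        'callable', 'iterable', 'object', 'mixed',
    ];

    if ($type === '' || in_array(strtolower($type), $builtins, true)) {
        return $type; // untyped or built-in: leave as written
    }

    return '\\' . ltrim($type, '\\'); // class name: one leading backslash
}

var_dump(legacy_argument_type('WP_Post'));  // string(8) "\WP_Post"
var_dump(legacy_argument_type('\WP_Post')); // string(8) "\WP_Post"
var_dump(legacy_argument_type('int'));      // string(3) "int"
```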
- Add type-tags.inc fixture + legacy golden: the WordPress @type hash
  @param stays inline in content and is not extracted, per dd32 and
  johnbillion (the wporg-developer theme depends on this).
- Extend docblocks.inc (additive) with @see references, a typed-hash
  method, and a markdown-heavy description; regenerate the legacy oracle.
- Unit tests lock the modern @param syntaxes the old parser mangled
  (?type, parenthesized unions) and modern code typehints (?WP_Post,
  union, return types) that php-parser v1 could not parse at all.

- Extract file-level constants (define() calls anywhere + the const
  keyword) and include/require statements with their legacy type labels
  ("Include", "Require Once", ...), via new Include_Reflector and
  Constant_Reflector wrappers. File_Reflector previously returned [] for
  both, silently dropping these from the export contract.
- Fix the file-docblock heuristic: a docblock attached to the open tag
  floats to the file only when the first statement does not claim it.
  Hooks, define(), and include/require claim it (as the legacy parser
  does); only plain calls and assignments leave it for the file. The old
  check missed bare hooks/define()/require, mis-attributing their
  docblock to the file on real wp-load.php-style files.
- Lock both with a golden fixture (constants-includes.inc, minted from
  the legacy parser on PHP 7.4) and unit tests.

Found while mining upstream PR WordPress#247 for export-contract gaps.
bordoni force-pushed the modernize/parser branch from a681455 to 9b93734 May 31, 2026 04:45
bordoni added 4 commits May 31, 2026 01:13
Mining upstream PR WordPress#247: probed two contract corners our corpus never
exercised; both already match the legacy parser byte-for-byte.

- hooks-extra: all 6 hook variants (action/filter x ref_array/deprecated),
  argument shapes, and hook-name normalization (concat -> interpolation).
- class-features: abstract/final, extends + multiple implements, method
  aliases, multi-property declarations, and trait/interface exclusion.
…parser

Mining upstream PR WordPress#247 surfaced four name-resolution gaps in namespaced
code (which in wp-includes means the bundled SimplePie/Requests/PHPMailer
libraries dd32 diffed):

- Method namespace now reports its enclosing namespace (My\Plugin), not ''
  — '' is correct only for the global namespace.
- Exported namespace aliases on functions and methods are fully-qualified
  ("\Other\Thing"), matching the legacy output (dd32's leading-backslash note).
- Function-use names resolve like the legacy parser: unqualified global-
  fallback calls stay bare (count), while fully-qualified, qualified, and
  use-function-imported calls become a leading-backslash FQN (\do_action,
  \Other\helper, \My\Plugin\Sub\thing). The previous code read the wrong
  resolver attribute and left them all bare.
- A fully-qualified \do_action is a plain function call, not a hook.

Locked with a golden fixture (namespaced-uses.inc, minted from the legacy
parser on PHP 7.4) and a unit test.

Mining upstream PR WordPress#247: calls inside closures (anonymous functions used
as hook callbacks, assigned to variables, or nested in functions) attribute
to the enclosing named scope (file or function), not the closure — already
matching the legacy parser byte-for-byte.

- Add @var and method docblocks to Include_Reflector and Constant_Reflector.
- Document the method namespace rule (global reported as '') and the
  fully-qualified alias export on the function/method accessors.