Block delimiter: add new Scanner class. #44158

Merged
merged 9 commits into from
Jul 10, 2025

Conversation

jeherve
Member

@jeherve jeherve commented Jul 1, 2025

Proposed changes:

This adds a new class to the package. See matching PRs:

  • 185532-ghe-Automattic/wpcom
  • 185783-ghe-Automattic/wpcom

The Block_Delimiter class introduced next_delimiter() and scan_delimiters(), which made it possible to parse the block structure in a document in a memory-efficient way. Unfortunately, fundamental choices for the interface, namely returning a new class instance on every block delimiter and relying on a generator function, limited the CPU performance frontier of that class as a replacement for parse_blocks().

This new class introduces Block_Scanner, more directly modeled after the HTML API and informed by refactors incorporating Block_Delimiter. This class mutates itself and requires a new instance before scanning. The tradeoff is that it runs much faster while maintaining the same near-zero memory overhead.

A new class is introduced due to the scale of change in the interface and in order to provide seamless refactoring of code already relying on scan_delimiters().
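
The design difference described above can be sketched with a toy scanner. This is not the package's Block_Scanner: `Toy_Block_Scanner`, its regex, and its fields are hypothetical and exist only to show a cursor that overwrites its own state on each `next_delimiter()` call instead of allocating a new object per delimiter. The pattern is a simplification and would fail on JSON attributes that contain `>`.

```php
<?php
// Hypothetical toy, NOT the Jetpack Block_Scanner implementation.
final class Toy_Block_Scanner {
	private string $text;
	private int $offset     = 0;
	private string $type    = '';
	private bool $is_closer = false;

	public function __construct( string $text ) {
		$this->text = $text;
	}

	/** Moves the cursor to the next delimiter, overwriting the previous state. */
	public function next_delimiter(): bool {
		// Simplified pattern: optional "/", "wp:", a block name, then anything up to "-->".
		$pattern = '~<!--\s+(/)?wp:([a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s[^>]*?-->~';
		if ( 1 !== preg_match( $pattern, $this->text, $m, PREG_OFFSET_CAPTURE, $this->offset ) ) {
			return false;
		}

		$name            = $m[2][0];
		$this->type      = false === strpos( $name, '/' ) ? "core/{$name}" : $name;
		$this->is_closer = '/' === $m[1][0];
		$this->offset    = $m[0][1] + strlen( $m[0][0] );
		return true;
	}

	/** True when the current delimiter opens a block of the given type. */
	public function opens_block( string $type ): bool {
		$type = false === strpos( $type, '/' ) ? "core/{$type}" : $type;
		return ! $this->is_closer && $type === $this->type;
	}
}

// Made-up sample document with one paragraph and one image block.
$html = '<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->'
	. '<!-- wp:image {"id":1} --><figure></figure><!-- /wp:image -->';

$scanner     = new Toy_Block_Scanner( $html );
$image_count = 0;
while ( $scanner->next_delimiter() ) {
	if ( $scanner->opens_block( 'image' ) ) {
		++$image_count;
	}
}
echo $image_count, "\n";
```

Because the object holds a single cursor, a second pass over the same document needs a fresh instance, which matches the constraint stated in the description.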

Other information:

  • Have you written new tests for your changes, if applicable?
  • Have you checked the E2E test CI results, and verified that your changes do not break them?
  • Have you tested your changes on WordPress.com, if applicable (if so, you'll see a generated comment below with a script to run)?

Jetpack product discussion

Does this pull request change what data or activity we track or use?

  • No

Testing instructions:

  • Is CI happy?

dmsnell and others added 4 commits July 1, 2025 11:44
The `Block_Delimiter` class introduced `next_delimiter()` and
`scan_delimiters()`, which made it possible to parse the block structure
in a document in a memory-efficient way. Unfortunately, fundamental
choices for the interface, namely returning a new class instance on
every block delimiter and relying on a generator function, limited the
CPU performance frontier of that class as a replacement for
`parse_blocks()`.

This new class introduces `Block_Scanner`, more directly modeled after
the HTML API and informed by refactors incorporating `Block_Delimiter`.
This class mutates itself and requires a new instance before scanning.
The tradeoff is that it runs much faster while maintaining the same
near-zero memory overhead.

A new class is introduced due to the scale of change in the interface
and in order to provide seamless refactoring of code already relying
on `scan_delimiters()`.
@jeherve jeherve requested a review from Copilot July 1, 2025 10:22
@jeherve jeherve self-assigned this Jul 1, 2025
@jeherve jeherve added [Type] Enhancement Changes to an existing feature — removing, adding, or changing parts of it [Status] In Progress [Pri] Normal labels Jul 1, 2025
Contributor

github-actions bot commented Jul 1, 2025

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (Jetpack), and enable the add/block-scanner-delimiter branch.
  • To test on Simple, run the following command on your sandbox:
bin/jetpack-downloader test jetpack add/block-scanner-delimiter

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

Contributor

github-actions bot commented Jul 1, 2025

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add a "[Type]" label (Bug, Enhancement, Janitorial, Task).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a new high-performance, mutable Block_Scanner class as a replacement for the legacy Block_Delimiter, updates tests to cover it, and adds stubs, documentation, and changelog entries to support it.

  • Add WP_HTML_Span stub and include it in test bootstrap
  • Implement Block_Scanner and extensive PHPUnit tests
  • Update README and changelog to describe and document the new scanner

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/stubs/class-wp-html-span.php | Add stub for WP_HTML_Span |
| tests/php/bootstrap.php | Load the new stub in test bootstrap |
| tests/php/Block_Scanner_Test.php | New PHPUnit tests covering all scanner behaviors |
| src/class-block-scanner.php | Implementation of Block_Scanner |
| changelog/add-block-scanner-delimiter | Changelog entry for the added scanner |
| README.md | Document new Block_Scanner and legacy Block_Delimiter |
Comments suppressed due to low confidence (1)

projects/packages/block-delimiter/src/class-block-scanner.php:235

  • The docblock for next_delimiter() describes support for a $freeform_blocks parameter, but the implementation currently ignores it. Please update the documentation to note that freeform scanning is not yet implemented or implement the parameter behavior to match the docs.
	public function next_delimiter( string $freeform_blocks = 'skip' ): bool { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable


jp-launch-control bot commented Jul 1, 2025

Code Coverage Summary

1 file is newly checked for coverage.

| File | Coverage |
| --- | --- |
| projects/packages/block-delimiter/src/class-block-scanner.php | 161/229 (70.31%) 💚 |


@jeherve jeherve added [Status] Needs Review This PR is ready for review. and removed [Status] In Progress labels Jul 1, 2025
@jeherve jeherve requested review from kraftbj and dmsnell July 1, 2025 10:55
@kraftbj kraftbj added [Status] Ready to Merge Go ahead, you can push that green button! and removed [Status] Needs Review This PR is ready for review. labels Jul 10, 2025
@jeherve jeherve merged commit bc48734 into trunk Jul 10, 2025
70 checks passed
@jeherve jeherve deleted the add/block-scanner-delimiter branch July 10, 2025 16:16
@github-actions github-actions bot removed the [Status] Ready to Merge Go ahead, you can push that green button! label Jul 10, 2025
Member

@dmsnell dmsnell left a comment

Thanks for your patience, as I’m late to get to this.

break;
}
}
}
Member

To encourage more standardized use, we could update this to pass the block type into opens_block():

while ( $scanner->next_delimiter() ) {
	if ( ! $scanner->opens_block( 'image' ) ) {
		continue;
	}

	…
}

Member Author

I updated the docs in #44365

[1] => core/image
[2] => core/list
[3] => core/paragraph
)
Member

nitpick: this example is “Counting block types” but never counts.
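
For reference, a counting version of such an example could tally each opener into an array keyed by block type. The sketch below is an assumption about the intent, not the README's code: it swaps the scanner for a simplified `preg_match_all()` call so that it runs without the package, and the sample document is made up.

```php
<?php
// Made-up sample document: two paragraphs and one image.
$html = '<!-- wp:paragraph --><p>One</p><!-- /wp:paragraph -->'
	. '<!-- wp:image {"id":1} --><figure></figure><!-- /wp:image -->'
	. '<!-- wp:paragraph --><p>Two</p><!-- /wp:paragraph -->';

// Simplified stand-in for the scanner loop: capture the name in every opener.
// Closers are skipped because they have "/" between the whitespace and "wp:".
preg_match_all( '~<!--\s+wp:([a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s~', $html, $matches );

$counts = array();
foreach ( $matches[1] as $name ) {
	// Names without a namespace belong to "core/".
	$type            = false === strpos( $name, '/' ) ? "core/{$name}" : $name;
	$counts[ $type ] = ( $counts[ $type ] ?? 0 ) + 1;
}

ksort( $counts );
print_r( $counts );
```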

Member Author

I updated the docs in #44365

}

$json_span = substr( $this->source_text, $this->json_at, $this->json_length );
$parsed = json_decode( $json_span, true, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );
Member

two things here:

  • the true introduces needless confusion here and should be null
  • technically the JSON_INVALID_UTF8_SUBSTITUTE introduces a change in behavior between this parser and the spec/default parser in Core. it will return a transformed JSON object whereas in the default parser it will return no attributes — null or empty array.

Not sure these points matter to Jetpack or not, but they stand out so I wanted to note them.
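
The behavioral difference in the second point can be reproduced with plain PHP. This is a sketch using a made-up attribute string, not code from the package: it decodes the same invalid-UTF-8 input once without and once with `JSON_INVALID_UTF8_SUBSTITUTE`, passing `null` as the second argument so that `JSON_OBJECT_AS_ARRAY` alone decides the return type.

```php
<?php
// "\xE9" is a lone Latin-1 byte, which is not valid UTF-8.
$raw = "{\"caption\":\"caf\xE9\"}";

// Strict decode: fails on the invalid byte and returns null.
$strict       = json_decode( $raw, null, 512, JSON_OBJECT_AS_ARRAY );
$strict_error = json_last_error();

// Lenient decode: the invalid byte is substituted, so an array comes back.
$lenient = json_decode( $raw, null, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );

var_dump( null === $strict, JSON_ERROR_UTF8 === $strict_error, is_array( $lenient ) );
```

With the substitution flag, callers receive attributes that differ from the source bytes, whereas a strict decode leaves them with no attributes at all.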

Member Author

I've opened #44365 to revert that change.

/**
* Include WordPress HTML API stubs.
*/
require_once __DIR__ . '/../stubs/class-wp-html-span.php';
Member

Are Core WordPress files not already loaded here? This has been in Core since 6.2

Member Author

They're not. The package's tests are completely independent from WordPress.
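
Since the tests run without WordPress, a stub only has to mirror the parts of the Core class that the code under test touches. A minimal sketch of such a stub follows; the property names and constructor shape are an assumption based on recent versions of Core's WP_HTML_Span, not a copy of the file added in this PR.

```php
<?php
// Sketch of a test stub; only defined when WordPress has not already loaded the class.
if ( ! class_exists( 'WP_HTML_Span' ) ) {
	class WP_HTML_Span {
		/** @var int Byte offset where the span starts. */
		public $start;

		/** @var int Byte length of the span. */
		public $length;

		public function __construct( int $start, int $length ) {
			$this->start  = $start;
			$this->length = $length;
		}
	}
}

// A span describes a region of a string without copying it.
$text = 'hello world';
$span = new WP_HTML_Span( 6, 5 );
echo substr( $text, $span->start, $span->length ), "\n";
```

The `class_exists()` guard keeps the stub harmless if the same bootstrap is ever run with WordPress loaded.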

Member

dmsnell commented Jul 17, 2025

@copilot you should apologize for trying to sneak in a bug

Member Author

jeherve commented Jul 18, 2025

@dmsnell Thanks for the review. I've opened #44365 to bring in your suggestions.

jeherve added a commit that referenced this pull request Jul 18, 2025
…mentation (#44365)

* Scanner: revert change from #44158

See #44158 (comment)

* Update docs

#44158 (comment)

* Have the counting example actually count

See #44158 (comment)

* Add changelog

---------

Co-authored-by: Dennis Snell <[email protected]>
matticbot pushed a commit to Automattic/block-delimiter that referenced this pull request Jul 18, 2025
…mentation (#44365)

* Scanner: revert change from #44158

See Automattic/jetpack#44158 (comment)

* Update docs

Automattic/jetpack#44158 (comment)

* Have the counting example actually count

See Automattic/jetpack#44158 (comment)

* Add changelog

---------

Co-authored-by: Dennis Snell <[email protected]>

Committed via a GitHub action: https://github.com/Automattic/jetpack/actions/runs/16374699829

Upstream-Ref: Automattic/jetpack@d44540d
matticbot pushed a commit to Automattic/jetpack-storybook that referenced this pull request Jul 18, 2025
…mentation (#44365)

* Scanner: revert change from #44158

See Automattic/jetpack#44158 (comment)

* Update docs

Automattic/jetpack#44158 (comment)

* Have the counting example actually count

See Automattic/jetpack#44158 (comment)

* Add changelog

---------

Co-authored-by: Dennis Snell <[email protected]>

Committed via a GitHub action: https://github.com/Automattic/jetpack/actions/runs/16374699829

Upstream-Ref: Automattic/jetpack@d44540d
matticbot pushed a commit to Automattic/jetpack-production that referenced this pull request Jul 18, 2025
…mentation (#44365)

* Scanner: revert change from #44158

See Automattic/jetpack#44158 (comment)

* Update docs

Automattic/jetpack#44158 (comment)

* Have the counting example actually count

See Automattic/jetpack#44158 (comment)

* Add changelog

---------

Co-authored-by: Dennis Snell <[email protected]>

Committed via a GitHub action: https://github.com/Automattic/jetpack/actions/runs/16374699829

Upstream-Ref: Automattic/jetpack@d44540d
Labels
Docs [Package] Block Delimiter [Pri] Normal [Tests] Includes Tests [Type] Enhancement Changes to an existing feature — removing, adding, or changing parts of it
3 participants