Selective PDF Page Conversion in Docling Based on Textual Content Matching #1801

Omar-M-Abdelbary · 2025-06-17T16:33:11Z

I'm currently working on a workflow where I only need to convert specific pages from a PDF document using Docling, based on whether they contain certain predefined headers.
To achieve this, I need a way to access the textual content of each page individually as a variable before conversion, so I can check if any of my target keywords are present. If a page contains one of the desired headers, it should be processed and converted; otherwise, it should be skipped.
I've explored DocumentConverter.convert() and noticed there's a page_range parameter that can limit the conversion to specific pages. However, my challenge is determining in advance which pages to include, based on the actual text content of each page, before performing the final conversion.

Could you please guide me on the recommended way to:

1. Extract or access the raw or structured text content from individual pages before full conversion.
2. Perform keyword matching on those pages.
3. Pass only the relevant page numbers to convert() based on that filtering.
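
To make the question concrete, here is a minimal sketch of the filtering step I have in mind (steps 2 and 3). It assumes the per-page text has already been pulled out by some lightweight pre-pass and stored in a dict keyed by 1-based page number; how best to get that text is exactly what I'm asking about. The names matching_pages, to_ranges, page_texts and the sample strings are hypothetical and not part of Docling. I'm also assuming page_range takes a single inclusive (start, end) pair, so non-adjacent matches would need one convert() call per run of consecutive pages.

```python
from typing import Dict, Iterable, List, Tuple


def matching_pages(page_texts: Dict[int, str], keywords: Iterable[str]) -> List[int]:
    """Return sorted page numbers whose text contains any keyword (case-insensitive)."""
    needles = [k.casefold() for k in keywords]
    return sorted(
        page_no
        for page_no, text in page_texts.items()
        if any(needle in text.casefold() for needle in needles)
    )


def to_ranges(pages: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse page numbers into inclusive (start, end) runs of consecutive pages."""
    ranges: List[Tuple[int, int]] = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page directly follows the current run, so extend that run.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # Gap (or first page): start a new run.
            ranges.append((page, page))
    return ranges


# Hypothetical per-page text, standing in for the output of a pre-pass extractor.
page_texts = {
    1: "Cover page",
    2: "Financial Summary\nRevenue grew year over year.",
    3: "Financial summary (cont.)",
    4: "Appendix",
    5: "Risk Factors",
}
keywords = ["Financial Summary", "Risk Factors"]

selected = matching_pages(page_texts, keywords)
ranges = to_ranges(selected)
print(selected)
print(ranges)

# Assumed usage with Docling (not executed here; "report.pdf" is a placeholder):
# from docling.document_converter import DocumentConverter
# converter = DocumentConverter()
# for start, end in ranges:
#     result = converter.convert("report.pdf", page_range=(start, end))
```

If there is a way to do the text pre-pass inside Docling itself, or to pass a non-contiguous set of pages in one call, that would replace the dict and the range-splitting above.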


Replies: 0 comments

Select a reply

Uh oh!

Selective PDF Page Conversion in Docling Based on Textual Content Matching #1801

Uh oh!

Omar-M-Abdelbary Jun 17, 2025

Replies: 0 comments

Omar-M-Abdelbary
Jun 17, 2025