Selective PDF Page Conversion in Docling Based on Textual Content Matching #1801
Unanswered
Omar-M-Abdelbary asked this question in Q&A
Replies: 0 comments
I'm currently working on a workflow where I only need to convert specific pages from a PDF document using Docling, based on whether they contain certain predefined headers.
To achieve this, I need a way to access the textual content of each page individually as a variable before conversion, so I can check if any of my target keywords are present. If a page contains one of the desired headers, it should be processed and converted; otherwise, it should be skipped.
I've explored DocumentConverter.convert() and noticed there's a page_range parameter that can limit the conversion to specific pages. However, my challenge is determining which pages to include in advance — based on the actual text content of each page — before performing the final conversion.
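To make the idea concrete, here is a minimal sketch of the two-pass approach I have in mind, not a confirmed Docling recipe. It assumes a lightweight text extractor (pypdf here, purely as an example) for the pre-scan, and it assumes `page_range` takes a 1-based inclusive `(start, end)` tuple, so non-adjacent matching pages are grouped into contiguous runs and converted one run at a time. The helper names `matching_pages`, `to_ranges`, and `convert_matching` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def matching_pages(page_texts: Sequence[str], headers: Iterable[str]) -> List[int]:
    """Return 1-based page numbers whose text contains any target header.

    Matching is a case-insensitive substring check.
    """
    needles = [h.casefold() for h in headers]
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if any(needle in text.casefold() for needle in needles)
    ]


def to_ranges(pages: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse page numbers into inclusive (start, end) runs of adjacent pages."""
    ranges: List[Tuple[int, int]] = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page is adjacent to the current run: extend it.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # Gap found: start a new run.
            ranges.append((page, page))
    return ranges


def convert_matching(pdf_path: str, headers: Iterable[str]) -> list:
    """Pre-scan the PDF for headers, then convert only the matching page runs."""
    # Assumption: pypdf is used only for the cheap text pre-scan.
    from pypdf import PdfReader
    from docling.document_converter import DocumentConverter

    reader = PdfReader(pdf_path)
    texts = [page.extract_text() or "" for page in reader.pages]

    converter = DocumentConverter()
    # Assumption: page_range is a 1-based inclusive (start, end) tuple.
    return [
        converter.convert(pdf_path, page_range=page_run)
        for page_run in to_ranges(matching_pages(texts, headers))
    ]
```

For example, `to_ranges(matching_pages(texts, ["Financial Summary"]))` gives the list of ranges to pass to `convert()`. The pre-scan only works for PDFs with an embedded text layer; scanned pages would need OCR before any header check.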
Could you please guide me on the recommended way to: