[Feature Request]: Extract JSON-LD Schema Data from Crawled Webpages #968
aravindkarnam started this conversation in Feature requests
What needs to be done?
Add support for extracting JSON-LD structured data (Schema.org markup embedded in <script type="application/ld+json"> tags) from crawled webpages, and include it in the crawl results.
What problem does this solve?
Currently, we only extract basic metadata and Open Graph tags. We miss the structured data that many pages publish following Schema.org standards, which limits how rich the extracted information can be.
Target users/beneficiaries
Data analysts, SEO specialists, content researchers, and developers building applications that need comprehensive structured data from websites.
Current alternatives/workarounds
Users must currently implement custom post-processing to extract JSON-LD data after crawling. This requires additional code, increases processing time, and may lead to inconsistent implementations.
Proposed approach
Parse HTML content for <script type="application/ld+json"> tags, extract and validate the JSON content, and add a new structured_data or json_ld field to the results object containing the parsed schema information.
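A minimal sketch of the proposed approach, using only Python's standard library (`html.parser` and `json`). The names `extract_json_ld` and `_JsonLdCollector`, and the `json_ld` key in the example result, are hypothetical and chosen for illustration; they are not an existing API. "Validation" here only means each block must parse as JSON and yield objects; it does not check the data against Schema.org types.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.raw_blocks: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "script":
            return
        # Attribute names are lowercased by HTMLParser; normalize the value too.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data: str) -> None:
        # Script contents arrive as raw text (no entity decoding),
        # possibly split across several calls, so accumulate them.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "script" and self._in_json_ld:
            self.raw_blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html: str) -> list[dict[str, Any]]:
    """Return every JSON-LD object found in the page, skipping malformed blocks."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()

    items: list[dict[str, Any]] = []
    for raw in collector.raw_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip invalid JSON instead of failing the whole crawl
        # One script tag may hold a single object or a top-level array of objects.
        candidates = data if isinstance(data, list) else [data]
        items.extend(c for c in candidates if isinstance(c, dict))
    return items


if __name__ == "__main__":
    page = '<script type="application/ld+json">{"@type": "Article", "headline": "Hello"}</script>'
    # Hypothetical result shape: the parsed objects stored under a new field.
    result = {"url": "https://example.com/", "json_ld": extract_json_ld(page)}
    print(result["json_ld"])  # [{'@type': 'Article', 'headline': 'Hello'}]
```

In this sketch a top-level `@graph` container is kept as a single object; flattening it into separate items, or deciding between a `structured_data` and a `json_ld` field name, would be further design choices.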