[Feature Request]: Extract JSON-LD Schema Data from Crawled Webpages #968

aravindkarnam (Collaborator) · 2025-04-10T07:21:49Z

What needs to be done?

Extract the JSON-LD structured data (Schema.org markup) embedded in crawled webpages and include it in the crawl results.

What problem does this solve?

Currently, we only extract basic metadata and Open Graph tags. We miss the structured data that many pages embed using the Schema.org vocabulary, which limits the richness of the extracted information.

Target users/beneficiaries

Data analysts, SEO specialists, content researchers, and any developers building applications that need comprehensive structured data from websites.

Current alternatives/workarounds

Users must currently implement custom post-processing to extract JSON-LD data after crawling. This requires additional code, increases processing time, and may lead to inconsistent implementations.

Proposed approach

Parse the HTML content for <script type="application/ld+json"> tags, extract and validate their JSON content, and add a new structured_data (or json_ld) field to the results object that holds the parsed schema information.
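A minimal sketch of the proposed approach, using only Python's standard library (html.parser and json) rather than whatever HTML parser the crawler uses internally. JsonLdExtractor and extract_json_ld are hypothetical names, not existing crawler APIs. "Validate" is interpreted narrowly here: a block is kept only if it parses as JSON, and no Schema.org vocabulary checks are performed. Top-level JSON arrays are flattened into a single list.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class JsonLdExtractor(HTMLParser):
    """Hypothetical helper: collects the raw text of every
    <script type="application/ld+json"> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "script":
            return
        # Attribute values may be None; compare the MIME type case-insensitively
        # and ignore any parameters such as "; charset=utf-8".
        script_type = (dict(attrs).get("type") or "").split(";")[0].strip().lower()
        if script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data: str) -> None:
        # Script bodies are delivered raw, possibly in several chunks.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html: str) -> list[Any]:
    """Return every JSON-LD object found in `html` (hypothetical function).

    Blocks that are not valid JSON are skipped; a top-level array
    contributes each of its elements separately.
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()

    items: list[Any] = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: drop it instead of failing the crawl
        if isinstance(data, list):
            items.extend(data)
        else:
            items.append(data)
    return items


if __name__ == "__main__":
    page = (
        '<head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}'
        "</script></head>"
    )
    print([item["@type"] for item in extract_json_ld(page)])  # ['Article']
```

In the crawler, the returned list could populate the proposed structured_data (or json_ld) field on the results object. Skipping malformed blocks, rather than raising, keeps one broken script tag from failing an otherwise successful crawl.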
