Skip to content

rdfdiff #1987

Open
Open
rdfdiff#1987
@william-vw

Description

@william-vw

Version

4.9.0

What happened?

rdfdiff is not functioning as expected. will issue a PR with suggested fixes & tests.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

Yes

Activity

william-vw

william-vw commented on Aug 15, 2023

@william-vw
Author

see PR #1989

rvesse

rvesse commented on Aug 16, 2023

@rvesse
Member

Bug reports should contain actual steps to reproduce the issue - "not functioning as expected" - is not a useful bug report. Please provide steps, including sample data, that show scenario(s) where two input graphs don't yield the expected diff

afs

afs commented on Aug 16, 2023

@afs
Member

Latest changed to 4.9.0 ... so it is always valid in the future :-)

william-vw

william-vw commented on Aug 16, 2023

@william-vw
Author

Bug reports should contain actual steps to reproduce the issue - "not functioning as expected" - is not a useful bug report. Please provide steps, including sample data, that show scenario(s) where two input graphs don't yield the expected diff

You are right - apologies. I should've waited with this issue until I had enough time to elaborate.

The issues I found:
(lines refer to original code)

  • Any statements with blank nodes are seen as concrete (lines 120, 135). (Node_Blank is a subclass of Node_Concrete, and thus returns true for #isConcrete, which I think is the issue here.)

  • A subgraph from the second set should be added to the candidates list, not the subgraph from first set that is being compared with (line 180).

  • For a statement's first non-encountered bnode, it creates a new subgraph and adds the statement and its forward closure to it (lines 228-235).

However, this does not support cases where multiple nodes end at the same bnode. If you run on testing/cmd/rdfdiff_1.ttl and testing/cmd/rdfdiff_2.ttl (after fixing the above), you see that the "England" triple is ignored as it was not part of the closure of the bnode shared with the "America" triple.

There's a few solutions for this - my original PR implemented one potential solution, but now I'm unsure whether it's the way to go - perhaps adding a different kind of closure would be better.

afs

afs commented on Aug 17, 2023

@afs
Member

Blank nodes are concrete. "concrete" covers the RDF terms that can go in an RDF graph and be printed.

The test "triple.concrete" isn't appropriate (from reading the code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @afs@rvesse@william-vw

        Issue actions

          rdfdiff · Issue #1987 · apache/jena