Dump extraction module fails due to empty ontology.xml  #753

Open
@archiexdex

Description

Where did the problem occur?

The ontology.xml file downloaded from the DBpedia mappings API is empty.

Problem description

When we tried to dump the ontology from the DBpedia API (https://mappings.dbpedia.org/api.php), we got a parsing error. The error first appeared in 2023/07. We also found that it coincides with the failing GitHub Actions run (link).

The problem is that the ontology.xml downloaded from the DBpedia API is empty, which is why we get the parsing error.

Here is part of our error message.

INFO: Loading ontology pages
Exception in thread "main" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.dbpedia.extraction.mappings.CompositeParseExtractor$$anonfun$2.apply(CompositeParseExtractor.scala:73)
>   at org.dbpedia.extraction.mappings.CompositeParseExtractor$$anonfun$2.apply(CompositeParseExtractor.scala:73)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:284)
>   at org.dbpedia.extraction.mappings.CompositeParseExtractor$.load(CompositeParseExtractor.scala:73)
>   at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:124)
>   at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:40)
>   at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:40)
>   at scala.collection.TraversableViewLike$Mapped$$anonfun$foreach$2.apply(TraversableViewLike.scala:169)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:743)
>   at scala.collection.immutable.RedBlackTree$TreeIterator.foreach(RedBlackTree.scala:468)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:310)
>   at scala.collection.TraversableViewLike$Mapped$class.foreach(TraversableViewLike.scala:168)
>   at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:113)
>   at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:30)
>   at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
>   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
>   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
>   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:399)
>   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:326)
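
For anyone triaging this, a quick check of the downloaded file before the extraction starts can tell "empty file" apart from "file with unexpected content". The message in the trace above concerns a DOCTYPE declaration, so the file may contain something other than the expected XML (for example an HTML page); that is only a guess. The sketch below is a minimal, hypothetical helper using the JDK's built-in XML parser. The class name `OntologyFileCheck`, the method `problemWith`, and the default path `ontology.xml` are assumptions for illustration and are not part of the extraction framework.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the DBpedia extraction framework.
class OntologyFileCheck {

    /** Returns null when the file is non-empty, well-formed XML; otherwise a short reason. */
    static String problemWith(Path file) throws IOException {
        if (!Files.exists(file)) {
            return "file does not exist";
        }
        if (Files.size(file) == 0) {
            return "file is empty (0 bytes)";
        }
        try (InputStream in = Files.newInputStream(file)) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve every external entity (such as a DTD) to empty content: no network access.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(in);
            return null;
        } catch (SAXParseException e) {
            return "not well-formed XML at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException e) {
            return "XML parser failure: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass the real location as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "ontology.xml");
        String problem = problemWith(file);
        System.out.println(problem == null ? "OK: " + file : "PROBLEM: " + file + ": " + problem);
    }
}
```

Running it against the downloaded file reports either a 0-byte file or the first position where parsing fails, which should help narrow down what the API actually returned.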

Expected behaviour

What did you expect?

The script should download a non-empty ontology.xml from DBpedia.

Request/Reproduction

Give the link or request, so the problem can be reproduced. Ideally, this would be a unix curl command.

Follow the documentation. The problem can be reproduced by running mvn clean install.
