429 Too many requests  #733

Open
@uleodolter

Hi,

I have configured https://github.com/dbpedia/marvin-config to extract the German Wikipedia. A first run worked for the 20220401 dump.

Today I ran it again to extract the 20220601 dump, but the extraction framework only partly worked: after some time, https://de.wikipedia.org/w/api.php returned nothing but HTTP 429.

Exception; de; Main Extraction at 00:00.957s for 62 datasets; Main Extraction failed for instance http://de.dbpedia.org/resource/Liste_von_Autoren/J: Server returned HTTP response code: 429 for URL: https://de.wikipedia.org/w/api.php
java.io.IOException: Server returned HTTP response code: 429 for URL: https://de.wikipedia.org/w/api.php
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1902)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
    at org.dbpedia.extraction.util.MediaWikiConnector$$anonfun$retrievePage$1.apply$mcVI$sp(MediaWikiConnector.scala:97)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
    ...

I used the following settings in extractionConfiguration/extraction.de.properties:

mwc-apiUrl=https://{{LANG}}.wikipedia.org/w/api.php
mwc-maxRetries=5
mwc-connectMs=4000
mwc-readMs=30000
mwc-sleepFactor=2000

It seems the extraction-framework does not handle this HTTP error properly. It would be great if the Retry-After HTTP header were used to handle such errors. Any suggestions on which properties to adjust for this problem?
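
To make the request concrete, here is a minimal Python sketch of how a client might honor Retry-After on a 429 response. It is not the extraction framework's Scala code. The function name `retry_delay_seconds` and its parameters are hypothetical, and the fallback (wait `sleep_factor_ms * attempt`) is only an assumed reading of what `mwc-sleepFactor` might do, not something confirmed from the framework source. The header itself may carry either a number of seconds or an HTTP date, so the sketch handles both forms.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(retry_after: Optional[str], attempt: int,
                        sleep_factor_ms: int = 2000,
                        now: Optional[datetime] = None) -> float:
    """Return how many seconds to wait before retry number `attempt` (1-based)."""
    # Fallback when there is no usable Retry-After header: linear backoff.
    # ASSUMPTION: this policy is illustrative, not the framework's actual one.
    fallback = sleep_factor_ms * attempt / 1000.0
    if retry_after is None:
        return fallback

    value = retry_after.strip()

    # Form 1: delay in seconds, a non-negative integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)

    # Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the backoff policy.
        return fallback
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A retry loop would call this with the `Retry-After` value of the 429 response (or `None` if the header is absent) and sleep for the returned number of seconds before the next attempt.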
