
I am trying to run the following query on DBpedia approximately 1500 times in a loop with different parameters, but it fails with URLError: urlopen error [WinError 10060]. The error is thrown at seemingly random points: sometimes after processing 15 records, sometimes after 10, and sometimes after only 1 or 2.

  """
        
        SELECT DISTINCT ?item ?name ?page WHERE {{
            # VALUES ?groups {{dbo:Person dbo:Location}}
        {{
            # [Case 1] no disambiguation at all (eg. Twitter)
            ?item rdfs:label "{mention_1}"@en .
        }}
        UNION
        {{
            # [Case 1] lands in a redirect page (eg. "Google, Inc." -> "Google")
            ?temp rdfs:label "{mention_1}"@en .
            ?temp dbo:wikiPageRedirects ?item .
        }}
        UNION
        {{
            # [Case 2] a dedicated disambiguation page (eg. Michael Jordan)
            <http://dbpedia.org/resource/{mention_2}_(disambiguation)> dbo:wikiPageDisambiguates ?item.
        }}
        UNION
        {{
            # [Case 3] disambiguation list within entity page (eg. New York)
            <http://dbpedia.org/resource/{mention_2}> dbo:wikiPageDisambiguates ?item .
        }}
        # Filter by entity class
        ?item rdf:type {group} .
        # Grab wikipedia link
        ?item foaf:isPrimaryTopicOf ?page .
        # Get name
        ?item rdfs:label ?name .
        FILTER (langMatches(lang(?name),"en"))
        # ?item rdf:type ?group .
        # ?group rdfs:label ?group_name
        # FILTER (STR(?group_name) IN ("Building", "Airport"))
    }}
    """

I have tried setting the timeout to 1000 and to 5000, but it did not fix the problem; I still received the same error. I also tried adding a retry mechanism around the query, but none of this worked:

from SPARQLWrapper import SPARQLWrapper

sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # public DBpedia endpoint

def generate_candidates(mention, group):
    query = build_query(mention, group)
    sparql.setQuery(query)
    sparql.setTimeout(1000)
    # retry up to 3 times; implicitly returns None if no attempt succeeds
    for i in range(3):
        try:
            results = sparql.query().convert()
            return results
        except TimeoutError:
            pass
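One thing to note about this retry loop: SPARQLWrapper's setTimeout takes seconds, so 1000 is already over 16 minutes, and the traceback reports a urllib.error.URLError, which is not a subclass of TimeoutError, so the except clause above does not catch it and the failure propagates on the first bad attempt. A minimal sketch of a retry that also catches URLError and backs off between attempts (the function name and the retries/wait parameters here are illustrative, not from the original code):

import time
from urllib.error import URLError

def generate_candidates_with_retry(mention, group, retries=3, wait=10):
    # Sketch: retry on both timeouts and connection-level errors,
    # and back off between attempts instead of retrying immediately.
    sparql.setQuery(build_query(mention, group))
    sparql.setTimeout(60)  # seconds
    for attempt in range(retries):
        try:
            return sparql.query().convert()
        except (TimeoutError, URLError):
            time.sleep(wait * (attempt + 1))  # simple linear backoff
    return None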

  • You should not set a timeout but add some delay between the queries. You're firing 1500 queries at a public service in a short time; that is flooding the service and can lead to your requests being temporarily blocked. – UninformedUser Dec 12 '22 at 08:56
  • @UninformedUser I have tried that. I added time.sleep(5) after each query, but I still received the same URLError. – Anusha Ali Dec 12 '22 at 10:06
  • `time.sleep(5)` is what, 5 milliseconds? Did you try a greater value? Anyway, if you have a massive workload, then loading the DBpedia dataset into your own triple store is the only correct way. Also, what if the DBpedia endpoint is down for a longer period of time? – UninformedUser Dec 13 '22 at 06:47
  • @UninformedUser Increasing the sleep time did solve the problem. Thanks for this. I will look into a triple store too. – Anusha Ali Dec 13 '22 at 13:01
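As the comments suggest, the fix that worked here was throttling: pausing between requests so the public endpoint does not see the ~1500 queries as flooding. A minimal sketch of such a loop, assuming the generate_candidates function from the question (the delay value and the mention_group_pairs list are placeholders):

import time

DELAY_SECONDS = 5  # increase this if the endpoint still refuses connections

candidates = {}
for mention, group in mention_group_pairs:  # placeholder for the ~1500 (mention, group) inputs
    candidates[mention] = generate_candidates(mention, group)
    time.sleep(DELAY_SECONDS)  # throttle requests to the public DBpedia endpoint

For a workload of this size, the other suggestion from the comments is also worth considering: loading the DBpedia dump into a local triple store (for example a local Virtuoso instance) removes the dependency on the public endpoint's rate limits and availability altogether.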

0 Answers