
I am running the following SPARQL query against DBpedia to build a tree of a company hierarchy:

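```python
from SPARQLWrapper import SPARQLWrapper, JSON

def get_result(parent_company):
    # The client is created here, so it no longer needs to be passed in (it was overwritten anyway)
    sparql = SPARQLWrapper('https://dbpedia.org/sparql')
    # dbo: is predefined on DBpedia; a full resource IRI avoids escaping names like "Apple_Inc."
    sparql.setQuery(f'''
        SELECT ?name
        WHERE {{ ?name dbo:parentCompany <http://dbpedia.org/resource/{parent_company}> }}
    ''')
    sparql.setReturnFormat(JSON)
    gdata = sparql.query().convert()
    ...
```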

I run this for each company in the organizational structure to see whether it has any child companies. For larger companies (e.g., `parent_company = Microsoft`), this adds up to about 30 queries. I timed each query: most take under 1 second, but roughly every fifth query takes about 1 min 7 s. DBpedia's website says the endpoint should handle up to 100 requests per second per IP address. Any idea what is causing this?

nholland
  • It could still be some kind of blocking, either by the HTTP server or by the backend. My suggestion: why not use a batch-style query? In SPARQL you can pass multiple values as inline data via the `VALUES` keyword, for example `SELECT ?childCompany ?parentCompany WHERE { VALUES ?parentCompany { dbr:Microsoft dbr:Apple_Inc\. } ?childCompany dbo:parentCompany ?parentCompany }` – UninformedUser Apr 27 '22 at 04:25
  • Or you could also try to get the whole hierarchy for a single company via a single SPARQL query – UninformedUser Apr 27 '22 at 04:26
  • @UninformedUser - do you have a suggestion for how I could do that? I am very new to SPARQL and it has a very steep learning curve, so I've been piecing together what I can for my current project. – nholland Apr 28 '22 at 17:52
  • It depends on how you need the data, but in the end the rest is client-side work anyway: `SELECT ?child ?parent WHERE { ?parent dbo:parentCompany* dbr:Microsoft . ?child dbo:parentCompany ?parent }`. This gives you all child-parent pairs such that Microsoft is at the end of the chain. Would this be sufficient for you? – UninformedUser May 01 '22 at 06:40
  • Oh wow - yes. That is very helpful. I will take a look at that and see if I can get the results into the dictionary format that I need for my front end. Thanks! – nholland May 02 '22 at 21:23
  • @UninformedUser - is there a way to return only the first parent listed with this type of query? Unfortunately, DBpedia still lists companies that used to own a company, so I am getting results that are no longer true. – nholland May 29 '22 at 23:57
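The `VALUES` batching suggestion in the comments can be sketched in Python as a small helper that builds one query for several parents at once. This is a minimal sketch, not code from the thread: the name `build_children_query` is hypothetical, it assumes each input is a DBpedia resource name exactly as it appears in the resource IRI (e.g. `Apple_Inc.`), and it writes full IRIs instead of `dbr:` prefixed names so that a trailing `.` needs no escaping.

```python
# Namespace of DBpedia resources (the expansion of the dbr: prefix)
DBR = "http://dbpedia.org/resource/"


def build_children_query(parent_names):
    """Build one SPARQL query that fetches the children of several parents.

    Assumption: parent_names are DBpedia resource local names such as
    "Microsoft" or "Apple_Inc.", already in the form used in the IRI.
    Full IRIs are written out, so names ending in "." need no escaping.
    """
    # Each parent becomes one inline value for ?parentCompany
    values = " ".join(f"<{DBR}{name}>" for name in parent_names)
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?childCompany ?parentCompany WHERE {\n"
        f"  VALUES ?parentCompany {{ {values} }}\n"
        "  ?childCompany dbo:parentCompany ?parentCompany .\n"
        "}"
    )
```

To use it, pass the returned string to `sparql.setQuery(...)` in place of the per-company query. Each row of `results["results"]["bindings"]` then carries both `childCompany` and `parentCompany`, so one request can cover a whole level of the tree rather than one request per company.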
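For the single-query property-path approach, the remaining client-side work is turning the flat `?child ?parent` rows into a nested structure. The sketch below is an illustration under stated assumptions rather than anything from the thread: `build_hierarchy_query`, `bindings_to_pairs`, and `pairs_to_tree` are hypothetical helpers, and the `{name: {child: {...}}}` dictionary shape is only a guess at what a front end might want. `bindings_to_pairs` reads the standard SPARQL 1.1 JSON results layout, which is what SPARQLWrapper's `convert()` returns when the return format is `JSON`.

```python
from collections import defaultdict

# Namespace of DBpedia resources (the expansion of the dbr: prefix)
DBR = "http://dbpedia.org/resource/"


def build_hierarchy_query(root_name):
    """All (child, parent) pairs whose chain of parents ends at root_name."""
    # parentCompany* also matches zero steps, so the root itself is a ?parent
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?child ?parent WHERE {\n"
        f"  ?parent dbo:parentCompany* <{DBR}{root_name}> .\n"
        "  ?child dbo:parentCompany ?parent .\n"
        "}"
    )


def bindings_to_pairs(results):
    """Extract (child, parent) IRI strings from SPARQL JSON results."""
    return [
        (row["child"]["value"], row["parent"]["value"])
        for row in results["results"]["bindings"]
    ]


def pairs_to_tree(pairs, root):
    """Nest (child, parent) pairs into {root: {child: {grandchild: {}}}}.

    Hypothetical output format; adapt it to what the front end expects.
    """
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)

    def build(node, seen):
        # 'seen' holds the current path, guarding against cycles in the data
        return {
            c: build(c, seen | {c})
            for c in sorted(children[node])
            if c not in seen
        }

    return {root: build(root, {root})}
```

Note that the keys are full IRIs, so the `root` passed to `pairs_to_tree` should be the full IRI as well (e.g. `DBR + "Microsoft"`). A company that DBpedia lists under several parents, including former owners, appears under each of them; this sketch does not decide which parent is current.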

1 Answer


The DBpedia SPARQL endpoint is a fully public service, so you may well be running into delays caused by other queries from other users. You might consider spinning up your own instance of DBpedia Snapshot, if you require predictably rapid responses.

This article may help you understand how DBpedia serves ad hoc queries on a worldwide basis.

TallTed
  • I've tested this at many times of day and it consistently throttles every 4th or 5th query by 1.06 seconds. The consistency makes me think it is not due to other users, unless DBpedia is consistently overloaded. This is for a school project (and one I'd like to add to my overall portfolio), so I won't be able to spin up my own Snapshot. – nholland Apr 28 '22 at 17:55
  • @nholland -- In that case, I might suggest you raise your question on the [DBpedia Community Forum](https://forum.dbpedia.org/c/support/query-dbpedia-sparql/16), and later, [post your project there](https://forum.dbpedia.org/c/dbpedia-in-use/44) – TallTed Apr 29 '22 at 14:07