I am trying to get the respective DBpedia entry for each company in a list, but I can't figure out how to do approximate matches. For example, "Audi" is called "Audi AG" in DBpedia and "Novartis" is called "Novartis International AG" (foaf:name). How do I search for entries with rdf:type = dbo:Company whose name is closest to whatever I provide?

I'm using SPARQL as the query language. (But I'm open to change if there is an advantage.)

select ?company
where {
  ?company foaf:name "Novartis"@en.
  ?company a dbo:Company.
}
LIMIT 100

I get no hits, but http://dbpedia.org/page/Novartis should be found. Matching the beginning of the name might be good enough for this.

Mike Dynamite
  • SPARQL is [a language](https://www.w3.org/TR/sparql11-query/), with [many functions](https://en.wikibooks.org/wiki/SPARQL/Expressions_and_Functions#Functions_on_Strings), and DBpedia is hosted on an engine with [many extensions](http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksSPARQL11BuiltInF)... Not to mention built-in tools that can help you build more queries based on one you [build within it](http://dbpedia.org/fct/). – TallTed Apr 16 '19 at 01:50
  • Possible duplicate of [Query for best match to a string with SPARQL?](https://stackoverflow.com/questions/38671325/query-for-best-match-to-a-string-with-sparql) – TallTed Apr 16 '19 at 02:00
  • Why do you think `http://dbpedia.org/resource/Novartis` should be found with your query? The `foaf:name` of this resource is `"Novartis International AG"@en`; only its `rdfs:label` is `"Novartis"@en`. Anything beyond exact matching of existing literals in the RDF triples requires a `FILTER` with one of the string functions (`regex`, `contains`, `strstarts`) or an extension function that is not part of the SPARQL 1.1 standard and depends on the triple store. – UninformedUser Apr 16 '19 at 03:24
  • A more complete check for an exact match on DBpedia also considers redirects, which act as surface forms or synonyms of resources: `?company ^dbo:wikiPageRedirects?/(rdfs:label|foaf:name) "Novartis"@en.` – UninformedUser Apr 16 '19 at 06:15
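
For reference, the redirect-aware pattern from the last comment can be written out as a full query. This is only a sketch: the PREFIX declarations are spelled out on the assumption that the query may run somewhere without DBpedia's predefined prefixes, and DISTINCT and LIMIT are additions for tidier output, not part of the comment.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?company
WHERE {
  # From ?company, optionally step back to a page that redirects to it,
  # then require that resource to carry the exact label or name.
  ?company ^dbo:wikiPageRedirects?/(rdfs:label|foaf:name) "Novartis"@en .
  ?company a dbo:Company .
}
LIMIT 100
```

This is still an exact match on the literal, so it helps with "Novartis" (which is the rdfs:label) but not with partial names.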

1 Answer

For DBpedia, the best option might be to use the bif:contains full-text search pseudo property:

SELECT ?company {
  ?company a dbo:Company.
  ?company foaf:name ?name.
  ?name bif:contains "Novartis"@en.
}

This feature is specific to the Virtuoso database that powers the DBpedia SPARQL endpoint.

If you want to stick to standard SPARQL, to match at the beginning of the name only:

SELECT ?company {
  ?company a dbo:Company.
  ?company foaf:name ?name.
  FILTER strStarts(?name, "Novartis")
}

Unlike the full-text feature, this version cannot make use of a text index, so it is slower: the engine has to test the filter against every company name.

If you want a more flexible match:

SELECT ?company {
  ?company a dbo:Company.
  ?company foaf:name ?name.
  FILTER contains(lCase(?name), lCase("Novartis"))
}

This will find a case-insensitive match anywhere in the name.
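
None of the queries above ranks the hits, so if several companies match you still have to pick the "closest" one. A rough sketch in standard SPARQL, assuming that a shorter matching name is a closer match (a heuristic, not a real similarity measure), is to sort by name length. The PREFIX lines, the extra ?name column, and the LIMIT value are illustrative choices.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?company ?name
WHERE {
  ?company a dbo:Company .
  ?company foaf:name ?name .
  # str() drops the language tag so the comparison works on plain strings.
  FILTER contains(lCase(str(?name)), lCase("Novartis"))
}
# Heuristic: fewer extra characters around the search term means a closer match.
ORDER BY STRLEN(str(?name))
LIMIT 10
```

Like the other FILTER-based versions, this scans all company names, so it may be slow on the public endpoint.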

cygri