I don't know how to compare labels without taking accents into account.

The following query doesn't return the place because "Ibáñez" is accented one way in Spanish DBpedia and a different way in my data source.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT DISTINCT ?iri

WHERE {

  ?iri rdfs:label ?label .
  ?label  bif:contains  "'Blasco Ibañez'" .

  ?iri ?location ?city .
  FILTER (?location IN (<http://dbpedia.org/ontology/location>, <http://dbpedia.org/ontology/wikiPageWikiLink>)) .
  ?city bif:contains "valencia" 

} LIMIT 100

Is there a way to ignore the accents?

TallTed
  • [Try this query](https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+DISTINCT+%3Firi%0D%0AWHERE+{%0D%0A++%3Firi+rdfs%3Alabel+%3Flabel+.%0D%0A++%3Flabel++bif%3Acontains++"'Blasco+Ibanez'"+.%0D%0A%0D%0A++%3Firi+%3Flocation+%3Fcity+.%0D%0A++FILTER+(%3Flocation+%3D++||++)+.%0D%0A++%3Fcity+bif%3Acontains+"valencia"+}+limit+100). See also [this article](http://docs.openlinksw.com/virtuoso/virtuosotipsandtrickscontrolunicode3/). – Stanislav Kralin Jul 06 '17 at 10:15
  • Your query and my query look the same to me. I can't find the difference between them. Anyway, the particular DBpedia data page that I want exists only in the Spanish version, http://es.dbpedia.org/sparql – Jesús Ibáñez Jul 06 '17 at 11:21
  • On dbpedia.org, your query returns nothing; my query returns 3 results. On es.dbpedia.org, [this query](http://es.dbpedia.org/sparql?default-graph-uri=&query=SELECT+DISTINCT+%3Firi%0D%0A%0D%0AWHERE+{%0D%0A%0D%0A++%3Firi+rdfs%3Alabel+%3Flabel+.%0D%0A++%3Flabel++bif%3Acontains++"'Blasco+Ibáñez'"+.%0D%0A%0D%0A++%3Firi+%3Flocation+%3Fcity+.%0D%0A++FILTER+(%3Flocation+%3D++||++)+.%0D%0A++%3Fcity+bif%3Acontains+"valencia"+%0D%0A}+limit+100) returns 4 results. – Stanislav Kralin Jul 06 '17 at 11:30
  • The only difference between the queries is the accents in "Blasco Ibanez". – Stanislav Kralin Jul 06 '17 at 11:33
  • [This page](https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&qtxt=SELECT+DISTINCT+%3Firi%0D%0AWHERE+{%0D%0A++%3Firi+rdfs%3Alabel+%3Flabel+.%0D%0A++%3Flabel++bif%3Acontains++"'Blasco+Ibanez'"+.%0D%0A%0D%0A++%3Firi+%3Flocation+%3Fcity+.%0D%0A++FILTER+(%3Flocation+%3D++||++)+.%0D%0A++%3Fcity+bif%3Acontains+"valencia"+}+limit+100) may make it easier to compare @StanislavKralin's query to your original. – TallTed Jul 07 '17 at 13:33

1 Answer

The issue is the current configuration of the Spanish DBpedia endpoint. (You may find the query I used to check their configuration interesting.)

Their virtuoso.ini must be adjusted to include --

[I18N]
XAnyNormalization=3

-- as described in the documentation of the INI file, and as further discussed in the article about "normalization of UNICODE3 accented chars in free-text index and queries", as cited in comments by @StanislavKralin.

(Note -- as of this writing, there's a typo in the doc; the section about "WideFileNames = 1/2/3/0" should say it's about "XAnyNormalization = 1/2/3/0")
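
Until the endpoint is reconfigured, one possible client-side workaround is to strip accents from labels yourself before comparing them. The following is only a minimal Python sketch of that idea, not Virtuoso's own normalization code; the function names `strip_accents` and `same_ignoring_accents` are hypothetical helpers introduced here for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks (accents, tildes) removed."""
    # NFD splits each accented character into a base character
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have Unicode category "Mn"; drop them.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into its canonical form.
    return unicodedata.normalize("NFC", stripped)


def same_ignoring_accents(a: str, b: str) -> bool:
    """Compare two labels, ignoring accents and letter case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(strip_accents("Blasco Ibáñez"))                            # Blasco Ibanez
print(same_ignoring_accents("Blasco Ibañez", "Blasco Ibáñez"))   # True
```

This only helps when comparing labels that you have already retrieved; it does not change what `bif:contains` matches in the server's free-text index.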

TallTed