I wrote the following SPARQL query against Wikidata. It returns records for the states of Germany, but each result appears four times in a row (you can test it at https://query.wikidata.org/). I suspected a problem with the geo coordinates or the languages, but I couldn't resolve it. What is wrong with this query, and how can I fix it so that the results are not repeated?

PREFIX  p:    <http://www.wikidata.org/prop/>
PREFIX  schema: <http://schema.org/>
PREFIX  psv:  <http://www.wikidata.org/prop/statement/value/>
PREFIX  wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX  wikibase: <http://wikiba.se/ontology#>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  wd:   <http://www.wikidata.org/entity/>

SELECT DISTINCT  ?subject ?featureCode ?countryCode ?name ?latitude ?longitude ?description ?iso31662
WHERE
  { ?subject  wdt:P31     wd:Q1221156 ;
              rdfs:label  ?name ;
              wdt:P17     ?countryClass .
    ?countryClass
              wdt:P297    ?countryCode .
    ?subject wdt:P31/(wdt:P279)* ?adminArea .
    ?adminArea  wdt:P2452  "A.ADM1" ;
              wdt:P2452  ?featureCode .
    ?subject  wdt:P300   ?iso31662
    OPTIONAL
      { ?subject  schema:description  ?description
        FILTER ( lang(?description) = "en" )
        ?subject  p:P625                ?coordinate .
        ?coordinate  psv:P625           ?coordinateNode .
        ?coordinateNode
                  wikibase:geoLatitude  ?latitude ;
                  wikibase:geoLongitude  ?longitude
      }
    FILTER ( lang(?name) = "en" )
    FILTER EXISTS { ?subject  wdt:P300  ?iso31662 }
  }
ORDER BY lcase(?name)
OFFSET  0
LIMIT   200
logi-kal
chuckk
  • It happens once you add the geo related information via the `OPTIONAL` clause. To me, it looks like a bug in the Blazegraph triple store. – UninformedUser Feb 27 '18 at 18:15
  • @AKSW, this happens even without `OPTIONAL`. The reason is that coordinates are duplicated in some sense: try `DESCRIBE wdv:180798f520e60c501432b23634473082` for example. – Stanislav Kralin Feb 27 '18 at 19:45
  • I just meant the content of the `OPTIONAL`. OK, so it looks like even the manually curated data is rubbish. That's bad... The question now: is this caused by the triple store, or did somebody in the Wikidata project decide to add both datatypes? I don't see any benefit in having literals with both datatypes. – UninformedUser Feb 28 '18 at 04:16
  • Possibly related: https://phabricator.wikimedia.org/T179228. – Stanislav Kralin Feb 28 '18 at 07:06

1 Answer

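In short, `"9.0411111111111"^^xsd:double` and `"9.0411111111111"^^xsd:decimal` are distinct RDF terms, even though they might be equal in some sense. If both `?latitude` and `?longitude` come back in both datatypes, `DISTINCT` sees 2 × 2 = 4 different rows per subject, which would explain the fourfold repetition.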
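The difference between "same term" and "same value" can be checked in isolation. The sketch below is not part of the original query; the variable names `?a`, `?b`, `?identicalTerms` and `?equalValues` are made up for illustration. It binds the two literals and compares them once with `sameTerm` (term identity, which is what matters when `DISTINCT` decides whether two rows are duplicates) and once with `=` (numeric value comparison).

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Compare the two literals by term identity and by numeric value.
SELECT ?a ?b
       (sameTerm(?a, ?b) AS ?identicalTerms)
       (?a = ?b AS ?equalValues)
WHERE {
  BIND("9.0411111111111"^^xsd:double  AS ?a)
  BIND("9.0411111111111"^^xsd:decimal AS ?b)
}
```

Because the datatypes differ, `sameTerm` cannot treat the two literals as identical, regardless of what the value comparison reports.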

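To check this, replace the `SELECT` clause of your query with the following, which shows the datatype of each coordinate value: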

SELECT DISTINCT ?subject ?featureCode ?countryCode ?name ?description ?iso31662
    (datatype(?latitude) AS ?lat)
    (datatype(?longitude) AS ?long)  

and then with this, which casts both values to a single datatype:

SELECT DISTINCT ?subject ?featureCode ?countryCode ?name ?description ?iso31662
    (xsd:decimal(?latitude) AS ?lat)
    (xsd:decimal(?longitude) AS ?long)  
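Put together, the second `SELECT` clause and the question's `WHERE` clause might look like the sketch below. The body of the query is copied unchanged from the question; the only additions are the `xsd:` prefix declaration (the Wikidata Query Service predefines it, so declaring it is just for self-containment) and the two casts. Whether the cast values collapse into one row per subject depends on both datatype variants converting to the same decimal, which is assumed here rather than guaranteed.

```sparql
PREFIX  p:        <http://www.wikidata.org/prop/>
PREFIX  schema:   <http://schema.org/>
PREFIX  psv:      <http://www.wikidata.org/prop/statement/value/>
PREFIX  wdt:      <http://www.wikidata.org/prop/direct/>
PREFIX  wikibase: <http://wikiba.se/ontology#>
PREFIX  rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  wd:       <http://www.wikidata.org/entity/>
PREFIX  xsd:      <http://www.w3.org/2001/XMLSchema#>

# Cast both coordinate parts to xsd:decimal so that DISTINCT
# no longer sees the double and decimal variants as different rows.
SELECT DISTINCT ?subject ?featureCode ?countryCode ?name ?description ?iso31662
    (xsd:decimal(?latitude) AS ?lat)
    (xsd:decimal(?longitude) AS ?long)
WHERE
  { ?subject  wdt:P31     wd:Q1221156 ;
              rdfs:label  ?name ;
              wdt:P17     ?countryClass .
    ?countryClass
              wdt:P297    ?countryCode .
    ?subject wdt:P31/(wdt:P279)* ?adminArea .
    ?adminArea  wdt:P2452  "A.ADM1" ;
              wdt:P2452  ?featureCode .
    ?subject  wdt:P300   ?iso31662
    OPTIONAL
      { ?subject  schema:description  ?description
        FILTER ( lang(?description) = "en" )
        ?subject  p:P625                ?coordinate .
        ?coordinate  psv:P625           ?coordinateNode .
        ?coordinateNode
                  wikibase:geoLatitude  ?latitude ;
                  wikibase:geoLongitude  ?longitude
      }
    FILTER ( lang(?name) = "en" )
    FILTER EXISTS { ?subject  wdt:P300  ?iso31662 }
  }
ORDER BY lcase(?name)
OFFSET  0
LIMIT   200
```

Note that the coordinates are now projected as `?lat` and `?long` instead of `?latitude` and `?longitude`, so any client code reading the result needs to use the new names.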
Stanislav Kralin
  • @chuckk, do you understand the answer? I hope you are familiar with the [Wikidata data model](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model), in particular, with [full values](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globe_coordinate). – Stanislav Kralin Mar 01 '18 at 14:57
  • I understand basics but I wasn't familiar with these links, thanks! – chuckk Mar 02 '18 at 08:14
  • @chuckk, OK. In your case, `?coordinate` is a *statement* (bindings have the `wds:` prefix) and `?coordinateNode` is a *full value* (bindings have the `wdv:` prefix). *Simple values* are linked to subjects via *truthy* properties which have the `wdt:` prefix. For globe coordinates, simple values are WKT-literals. – Stanislav Kralin Mar 02 '18 at 08:37
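The simple-value route mentioned in the last comment could be sketched as follows. This is an illustration, not something proposed in the thread: it reads the WKT literal through the truthy property `wdt:P625` and splits it with `geof:latitude` and `geof:longitude`, which are assumed here to be available as Wikidata Query Service extension functions (they are not part of standard SPARQL). The variable names `?coord`, `?lat` and `?long` are made up for the example.

```sparql
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

# Read the coordinate as a simple value (one WKT literal per truthy statement)
# instead of walking statement -> full value -> latitude/longitude.
SELECT DISTINCT ?subject ?coord
    (geof:latitude(?coord)  AS ?lat)
    (geof:longitude(?coord) AS ?long)
WHERE {
  ?subject wdt:P31  wd:Q1221156 ;
           wdt:P625 ?coord .
}
```

An item with more than one truthy coordinate statement would still produce more than one row, so this avoids the datatype duplication only, not every possible source of repeated subjects.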