I am once again having trouble with SPARQL. First off, a bit of background information: I have been using the Wikidata Query Service to retrieve data from Wikidata. As the Wikidata SPARQL endpoint is limited and timeouts occur for large tasks, I figured I would:

  1. Split the queries into several smaller ones
  2. Download them as .csv
  3. Convert them to .nt N-Triples
  4. Import them into Cliopatria (which uses SWI-Prolog)
  5. Use the built-in YASGUI SPARQL Editor to query the data locally
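
For illustration, steps 2 and 3 might be sketched in Python roughly as follows. This is a minimal sketch, not the actual conversion script: it assumes that the CSV column q holds the full entity URI, that every value is written as a plain string literal, and that the predicates live in the ps: namespace used in the local query further down. The function names csv_to_ntriples and escape_literal and the column-to-property mapping are hypothetical.

```python
import csv
import io

# Assumed predicate namespace, matching the PREFIX ps used in the local query.
PS = "http://www.wikidata.org/prop/statement/"


def escape_literal(value):
    """Escape characters that may not appear raw inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, columns):
    """Turn each non-empty cell into one line: <subject> <predicate> "value" ."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row["q"]  # assumption: column q holds the full entity URI
        for column, prop in columns.items():
            value = (row.get(column) or "").strip()
            if value:  # empty cells (unmatched OPTIONALs) produce no triple
                triples.append(
                    '<%s> <%s%s> "%s" .' % (subject, PS, prop, escape_literal(value))
                )
    return triples


# Hypothetical one-row export and column-to-property mapping.
sample = (
    "q,GTAA_ID,pseudonym,date_of_death\n"
    "http://www.wikidata.org/entity/Q3295087,102376,,2000-11-05\n"
)
mapping = {"GTAA_ID": "P1741", "pseudonym": "P742", "date_of_death": "P570"}
for line in csv_to_ntriples(sample, mapping):
    print(line)
```

Skipping empty cells matters here: a value that was missing in the OPTIONAL part of the Wikidata query should simply be absent from the local data, not stored as an empty string.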

As of now, the query works in the Wikidata Query Service. Locally, however, I cannot get the OPTIONAL clauses to work as expected.

My Wikidata code is as follows (this is a smaller selection of all data I want to retrieve):

SELECT ?q ?GTAA_ID ?pseudonym ?date_of_death 
       (group_concat(DISTINCT ?occupationLabel;separator=", ") as ?Occupations )
WHERE{
      ?q wdt:P1741 ?GTAA_ID.
      OPTIONAL {?q wdt:P742 ?pseudonym.}
      OPTIONAL {?q wdt:P570 ?date_of_death.}
      OPTIONAL {?q wdt:P106 ?occupation.}
      SERVICE wikibase:label { bd:serviceParam wikibase:language "nl". 
                            ?occupation rdfs:label ?occupationLabel.}
      }
GROUP BY ?q ?GTAA_ID ?pseudonym ?date_of_death

and this correctly retrieves:

q        | GTAA_ID  | pseudonym | date_of_death | Occupations
Q3295087 | 102376   |           | 2000-11-05    | acteur
Q2800419 | 89301    |           |               | politicus, staatsman
and so on

The point here being that it allows me to select all results that have a Wikidata ID and a GTAA ID and corresponding pseudonym, date_of_death and occupations (if available). Furthermore, if a person has multiple occupations, it separates them by a ',' and places them in the same row.

However, as stated above I downloaded the files to be able to query them locally. To do this I converted the .csv files to .nt with the following format:

<?s> <?p> "?o" .

where the object is always a string literal. Note that the predicates in the following examples match the ones I used when converting to .nt (hence the PREFIX ps). I then loaded the files into Cliopatria and ran the following query in the YASGUI editor:

PREFIX ps: <http://www.wikidata.org/prop/statement/>

SELECT ?q ?GTAA_ID ?date_of_death ?pseudonym 
       (group_concat(DISTINCT ?occupation;separator=", ") as ?occupations )

WHERE{
  ?q ps:P1741 ?GTAA_ID.
  OPTIONAL{?q ps:P742 ?pseudonym.}
  OPTIONAL{?q ps:P106 ?occupation.}
  OPTIONAL{?q ps:P570 ?date_of_death.}
     } 
GROUP BY ?q ?GTAA_ID ?date_of_death ?pseudonym

In this query, ?pseudonym, ?occupation and ?date_of_death are treated as optional, but the occupations are not concatenated into a single row (Query 1).

If I change the GROUP BY clause to

GROUP BY ?q ?GTAA_ID

it does not display ?pseudonym and ?date_of_death at all, but it does concatenate ?occupation (Query 2).

If I change the GROUP BY clause to

GROUP BY ?q ?GTAA_ID ?date_of_death

it concatenates ?occupation only for a ?q that has a ?date_of_death. Those without a ?date_of_death are not concatenated into one row. Furthermore, it does not display any ?pseudonym at all (Query 3).

I suspect it has to do with the GROUP BY clause in combination with the group_concat aggregate. However, I do not understand why it works in the Wikidata Query Service but not on my localhost. The locally used .nt file can be accessed here

Many thanks in advance!

logi-kal
John
  • ` Друг Кузьмы Пруткова ,` etc. – Stanislav Kralin Mar 26 '18 at 18:31
  • Unfortunately, I seem to have made a mistake when converting to .nt. When querying Wikidata, it displays the corresponding property as a string. The – John Mar 26 '18 at 18:40
  • Then upload the correct file (validate the file before uploading). Anyway, why do you think the problem is related to YASGUI? YASGUI just displays results. Do SPARQL results in CSV/XML/JSON satisfy you? – Stanislav Kralin Mar 26 '18 at 18:54
  • What does your SPARQL query give when you use YASGUI to run the query directly on the Wikidata endpoint? – Ivo Velitchkov Mar 27 '18 at 05:05
  • That's not a YASGUI issue, just his data... – UninformedUser Mar 27 '18 at 06:09
  • I'm wondering why you didn't use SPARQL CONSTRUCT to directly get RDF triples back? This would avoid the conversion from CSV to RDF data. Just my two cents ... – UninformedUser Mar 27 '18 at 06:11
  • I have updated the .nt file. I am not necessarily saying the issue is related to YASGUI; I just don't understand why my query works in the Wikidata Query Service but not in YASGUI, when my local (sub)set corresponds to the Wikidata data. I also spent some time trying to get the relevant data with CONSTRUCT and DESCRIBE, but my SPARQL expertise is not great and I was not able to retrieve only the relevant data. I am also not able to run the query directly on the Wikidata endpoint in YASGUI. – John Mar 27 '18 at 09:52
  • Furthermore, one reason I am using .csv is that Wikidata allows me to download the file as JSON, TSV or CSV. The program I am using, Cliopatria, states that it accepts RDF/XML and Turtle; however, it does not seem to accept JSON. Also, when Cliopatria displays results, it only allows me to download them as .csv. This is not an issue for me though, as I will be using the results in Excel. – John Mar 27 '18 at 09:52
  • I have successfully loaded your dump into Jena Fuseki. The first query works fine; perhaps Cliopatria doesn't understand `group_concat` (possibly only with `separator`). In your second query a non-grouping variable is used in the projection; try something like `select (sample(?date_of_death) as ?death)` (or use `max`, if Cliopatria doesn't understand `sample`). – Stanislav Kralin Mar 27 '18 at 12:47
  • @John What I wanted to say is that you can also download RDF, e.g. as N-Triples: you only have to use SPARQL `CONSTRUCT` instead of `SELECT`. Clearly, Cliopatria doesn't support JSON; that's not an RDF format, so how should running SPARQL on it work? – UninformedUser Mar 28 '18 at 08:11
  • In addition to @AKSW's advice: https://stackoverflow.com/a/49350592/7879193. Anyway, the whole data about entities with GTAA_ID is very large, see comments to your initial question. – Stanislav Kralin Mar 28 '18 at 10:53