I am once again having trouble with SPARQL. First off, a bit of background information: I have been using the Wikidata Query Service to retrieve data from Wikidata. As the Wikidata SPARQL endpoint is limited and timeouts occur for large tasks, I figured I would:
- Split the queries into several smaller ones
- Download them as .csv
- Convert them to .nt N-Triples
- Import them into ClioPatria (which uses SWI-Prolog)
- Use the built-in YASGUI SPARQL Editor to query the data locally
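The CSV-to-N-Triples step in the list above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the exact script used: the function name `csv_to_ntriples`, the column-to-property mapping, and the assumption that the `q` column holds bare Q-ids are all hypothetical. Each non-empty cell becomes one triple with a plain string literal as object, and each statement is terminated with ` .` as N-Triples requires.

```python
import csv
import io

# Assumed base IRIs; adjust them to whatever the CSV export actually contains.
ENTITY = "http://www.wikidata.org/entity/"
PROP = "http://www.wikidata.org/prop/statement/"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, subject_column: str, column_to_property: dict) -> list:
    """Turn each non-empty cell into one statement: <subject> <property> "value" ."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row[subject_column]
        for column, prop in column_to_property.items():
            value = (row.get(column) or "").strip()
            if not value:
                continue  # an empty cell (unbound OPTIONAL) produces no triple
            triples.append(f'<{ENTITY}{subject}> <{PROP}{prop}> "{escape_literal(value)}" .')
    return triples


# Two rows taken from the example result; the second has no date_of_death.
sample = "q,GTAA_ID,date_of_death\nQ3295087,102376,2000-11-05\nQ2800419,89301,\n"
mapping = {"GTAA_ID": "P1741", "date_of_death": "P570"}  # hypothetical mapping
lines = csv_to_ntriples(sample, "q", mapping)
print("\n".join(lines))
```

Skipping empty cells matters here: writing an empty string literal instead would make the OPTIONAL patterns match an empty value locally rather than leaving the variable unbound.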
As of now, the query works in the Wikidata Query Service. Locally, however, I cannot get the OPTIONAL clauses to work together with the grouping.
My Wikidata code is as follows (this is a smaller selection of all data I want to retrieve):
SELECT ?q ?GTAA_ID ?pseudonym ?date_of_death
(group_concat(DISTINCT ?occupationLabel;separator=", ") as ?Occupations )
WHERE{
?q wdt:P1741 ?GTAA_ID.
OPTIONAL {?q wdt:P742 ?pseudonym.}
OPTIONAL {?q wdt:P570 ?date_of_death.}
OPTIONAL {?q wdt:P106 ?occupation.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "nl".
?occupation rdfs:label ?occupationLabel.}
}
GROUP BY ?q ?GTAA_ID ?pseudonym ?date_of_death
and this correctly retrieves:
q | GTAA_ID | pseudonym | date_of_death | occupation
Q3295087 | 102376 | | 2000-11-05 | acteur
Q2800419 | 89301 | | | politicus, staatsman
and so on
The point here being that it allows me to select all results that have a Wikidata ID and a GTAA ID and corresponding pseudonym, date_of_death and occupations (if available). Furthermore, if a person has multiple occupations, it separates them by a ',' and places them in the same row.
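To make the expected grouping explicit, here is a small Python sketch of what GROUP BY plus group_concat(DISTINCT ...) does to the rows in the result above. It is an illustration only, not how any SPARQL engine is implemented: the helper name `group_concat` and the use of `None` for an unbound OPTIONAL value are assumptions made for this example. Rows that share the same group key, including the same unbound values, end up in one group.

```python
from collections import defaultdict

# Rows before aggregation: one row per (person, occupation) pair.
# None stands for an unbound OPTIONAL value; values come from the example table.
rows = [
    {"q": "Q3295087", "GTAA_ID": "102376", "pseudonym": None,
     "date_of_death": "2000-11-05", "occupation": "acteur"},
    {"q": "Q2800419", "GTAA_ID": "89301", "pseudonym": None,
     "date_of_death": None, "occupation": "politicus"},
    {"q": "Q2800419", "GTAA_ID": "89301", "pseudonym": None,
     "date_of_death": None, "occupation": "staatsman"},
]


def group_concat(rows, keys, target, separator=", "):
    """Group rows by the given keys and join the distinct target values per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in keys)  # unbound (None) values are part of the key
        values = groups[key]               # creates the group even if target is unbound
        value = row[target]
        if value is not None and value not in values:
            values.append(value)
    return {key: separator.join(values) for key, values in groups.items()}


result = group_concat(rows, ["q", "GTAA_ID", "pseudonym", "date_of_death"], "occupation")
for key, occupations in result.items():
    print(key, occupations)
```

With all four variables in the group key, the two Q2800419 rows still collapse into one row, because their unbound pseudonym and date_of_death compare as equal.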
However, as stated above I downloaded the files to be able to query them locally. To do this I converted the .csv files to .nt with the following format:
<?s> <?p> "?o"
where the object is a string. Note that in the following examples, ?p matches the way I converted the data to .nt (hence the PREFIX ps). I loaded the files into ClioPatria and used the following code in the YASGUI editor:
PREFIX ps: <http://www.wikidata.org/prop/statement/>
SELECT ?q ?GTAA_ID ?date_of_death ?pseudonym
(group_concat(DISTINCT ?occupation;separator=", ") as ?occupations )
WHERE{
?q ps:P1741 ?GTAA_ID.
OPTIONAL{?q ps:P742 ?pseudonym.}
OPTIONAL{?q ps:P106 ?occupation.}
OPTIONAL{?q ps:P570 ?date_of_death.}
}
GROUP BY ?q ?GTAA_ID ?date_of_death ?pseudonym
In this query, ?pseudonym, ?occupation and ?date_of_death are optional as intended, but the occupations are not concatenated into a single row. Query 1
If I replace the GROUP BY clause with
GROUP BY ?q ?GTAA_ID
it does not display ?pseudonym and ?date_of_death at all, but does concatenate ?occupation. Query 2
If I replace the GROUP BY clause with
GROUP BY ?q ?GTAA_ID ?date_of_death
it concatenates the ?occupation values only for a ?q that has a ?date_of_death. Those without a ?date_of_death are not concatenated into one row. Furthermore, it does not display any ?pseudonym at all. Query 3
I suspect it has to do with the GROUP BY clause in combination with the group_concat function. However, I do not understand why it works in the Wikidata Query Service but not on my localhost. The locally used .nt file can be accessed here
Many thanks in advance!