Applying group_concat in wikidata queries for participants

Question

I just learned about group_concat and order by, group by, and now I'm trying to apply them to query murder cases. I.E., to this one below to get all participants and targets.

SELECT DISTINCT ?incident ?label ?participant ?participantLabel ?target ?targetLabel
WHERE {
?incident wdt:P31 wd:Q132821.
?incident rdfs:label ?label.
optional{?incident wdt:P710 ?participant.}
optional{?incident wdt:P533 ?target. } }

And I tried to apply group_concat and group by, order by. (Didn't do anything on target below because even this only for participants doesn't work):

SELECT DISTINCT ?incident ?label ?target ?targetLabel (group_concat(?participantLabel; separator=";") as ?participant)

WHERE {
?incident wdt:P31 wd:Q132821.
?incident rdfs:label ?label.
optional{?incident wdt:P710 ?participant.}
optional{?incident wdt:P533 ?target. }}
GROUP BY ?participant ?participantLabel
ORDER BY ?participantLabel

And I am told Query is malformed: Bad aggregate.

Is it because not every case has participants? How can I tackle this?

You have to i) `group by` each variable that you select or ii) at least apply an aggregate function on it. And copy and paste is not the way how you learn SPARQL nor any other thing in computer science. I mean, `GROUP BY ?participant ?participantLabel` - does this make sense to you? You want to get all participants for the incident, or not? So what is making the group here? And where does `?participantLabel` should come from here? You removed the label SERVICE from my previous answer, so that can't work — UninformedUser, Mar 21 '19 at 18:49
Well, I really wish I knew better, AKSW. I'm not a computer science person but was introduced to all these for the first time last month. I think I'm getting it better each time I use it and from feedback here, though, all's still very confusing for me (Didn't even know there was GROUP BY thing in sparql.), so I've been trying to look at some examples from the Internet to learn from, but I'm very often stuck. Stackoverflow people are the only teachers I got and I appreciate it :) — belle, Mar 21 '19 at 20:31
There are many SPARQL tutorials out there. [The spec](https://www.w3.org/TR/sparql11-overview/) can be somewhat daunting, but it is worth reading through, even if you don't understand everything, so you have an idea that things like `GROUP BY`, `ORDER BY`, `LIMIT`, `OFFSET`, and others exist. — TallTed, Mar 21 '19 at 20:54
Thanks for the link, @TallTed, I'd tried one tutorial, learned real basics, then soon it required some background knowledge :/ Guess I should stick to it like you said, even if I don't get it all. — belle, Mar 21 '19 at 21:41

score 4 · Accepted Answer · answered Mar 21 '19 at 19:01

You need to read the full error message from wikidata. The key lines are here --

java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: Bad aggregate
...
Caused by: org.openrdf.query.MalformedQueryException: Bad aggregate
...
Caused by: com.bigdata.rdf.sail.sparql.ast.VisitorException: Bad aggregate
...
Caused by: java.lang.IllegalArgumentException: Non-aggregate variable in select expression: incident

Basically, all non-aggregate variables in your SELECT must also be in your GROUP BY. With some other tweaks that I think will benefit you, your query becomes something like this --

SELECT DISTINCT ?incident 
                ?incidentLabel 
                ?target
                ?targetLabel
                ( GROUP_CONCAT ( DISTINCT ?participantLabel; separator="; " ) AS ?participants )
WHERE
  {
               ?incident     wdt:P31     wd:Q132821 .
               ?incident     rdfs:label  ?incidentLabel .
               FILTER ( LANGMATCHES ( LANG ( ?incidentLabel ), "en" ) ) 
    OPTIONAL { ?incident     wdt:P710    ?participant .
               ?participant  rdfs:label  ?participantLabel 
               FILTER ( LANGMATCHES ( LANG ( ?participantLabel ), "en" ) ) 
             }
    OPTIONAL { ?incident     wdt:P533    ?target . 
               ?target       rdfs:label  ?targetLabel 
               FILTER ( LANGMATCHES ( LANG ( ?targetLabel ), "en" ) ) 
             }
  }
GROUP BY ?incident ?incidentLabel ?target ?targetLabel
ORDER BY ?incidentLabel ?targetLabel

I cannot explain the duplicate rows that appear in the result set (scroll down to "1991 Vic bombing"). These should have been eliminated by either or both the SELECT DISTINCT and the GROUP BY.

One of the rare cases where the `langmatches` function has some influence...there are 3 English labels for target: `en, en-ca, en-gb` - I checked with adding `(sample(lang(?targetLabel)) as ?langs) ` to the projection part. So, removing `?targetLabel` from the `group by` part and making use of the `sample` aggregate function on the `?targetLabel` does the trick here. — UninformedUser, Mar 21 '19 at 19:11
@TallTed nice formatting; helps illustrate the triples in a SPARQL query — Jay Gray, Mar 22 '19 at 07:05
@UninformedUser , can you please write out the full query that uses sample on `?targetLabel`? I'm struggling to make this work. — Blin, Jul 02 '22 at 10:41

Applying group_concat in wikidata queries for participants

1 Answers1