I want to return a table where each row is a distinct toy item and there are columns for each toy's image and its sitelink count.
Q: Is there a better way to do this than what I finally did below? Why did I have to move labeling and sitelinks to the inner query?
Initially, I naively thought I could run the following query, but I discovered that it returns one row for each toy-image pair. (I suppose it would return what I want if every item with several images had exactly one of them marked with preferred rank?) E.g., "gumball machine" (wd:Q1737075) gets two rows, one for each of its two images.
SELECT ?item ?itemLabel ?image ?sitelinks WHERE {
  ?item wdt:P31 wd:Q11422; #toy, returns
        wdt:P18 ?image;
        wikibase:sitelinks ?sitelinks.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)
So then I ran the following query, which gives me what I want.
SELECT ?item ?itemLabel ?sitelinks ?image WHERE {
  {
    SELECT ?item ?itemLabel ?sitelinks (MAX(?_image) AS ?image) WHERE {
      ?item wdt:P31 wd:Q11422; #toys
            wikibase:sitelinks ?sitelinks;
            rdfs:label ?itemLabel;
            wdt:P18 ?_image.
      FILTER(LANG(?itemLabel)="en")
    }
    GROUP BY ?item ?itemLabel ?sitelinks
  }
  ?item wdt:P18 ?image.
        #rdfs:label ?itemLabel;
        #wikibase:sitelinks ?sitelinks
  #FILTER(LANG(?itemLabel)="en")
}
ORDER BY DESC(?sitelinks)
But is this right? Do I really need to nest queries to get one image per item?
Also, you can see from the commented-out lines that I initially tried fetching the labels and sitelinks in the outer query, but that led to query timeouts. Why? Shouldn't that have been the more efficient construction, since it defers the labeling/sitelink work to the end, where the inner query has already left me with a smaller dataset?
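For comparison, here is a flat variant I would expect to produce the same table without nesting. It is only a sketch and I have not benchmarked it. It groups on ?item, ?itemLabel and ?sitelinks and uses SAMPLE to keep one image per item, and it assumes the label service can be combined with GROUP BY on the Wikidata Query Service, as long as ?itemLabel is listed in the GROUP BY. Note that SAMPLE picks an arbitrary image, not necessarily the same one MAX would pick.

```sparql
# Sketch only: one row per toy, with one arbitrarily chosen image.
SELECT ?item ?itemLabel ?sitelinks (SAMPLE(?_image) AS ?image) WHERE {
  ?item wdt:P31 wd:Q11422;               # instance of toy
        wikibase:sitelinks ?sitelinks;   # sitelink count
        wdt:P18 ?_image.                 # every image; collapsed by SAMPLE
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel ?sitelinks
ORDER BY DESC(?sitelinks)
```

If this works as I assume, the outer `?item wdt:P18 ?image.` join in my nested query would also be redundant, because the inner query already binds ?image.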