1

I want to return a table where each row is a distinct toy item and there are columns for each toy's image and its sitelink count.

Q: Is there a better way to do this than what I finally did below? Why did I have to move labeling and sitelinks to the inner query?

Initially, I naively thought I could run the following query. But I discovered it created one row for each toy-image pair (I suppose it would return what I want if every image property had a priority-ranked image?). E.g., "gumball machine" (wd:Q1737075) has two rows, one for each of its two images.

SELECT ?item ?itemLabel ?image ?sitelinks WHERE {
  ?item wdt:P31 wd:Q11422; #toy, returns
        wdt:P18 ?image;
        wikibase:sitelinks ?sitelinks.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)

Run it!

So then I ran the following query, which gives me what I want.

SELECT ?item ?itemLabel ?sitelinks ?image WHERE {
  
  {
    SELECT ?item ?itemLabel ?sitelinks (MAX(?_image) AS ?image) WHERE {
      ?item wdt:P31 wd:Q11422; #toys
            wikibase:sitelinks ?sitelinks;
            rdfs:label ?itemLabel;
            wdt:P18 ?_image.
      FILTER(LANG(?itemLabel)="en")
    }
    GROUP BY ?item ?itemLabel ?sitelinks
  }
  
  ?item wdt:P18 ?image.
        #rdfs:label ?itemLabel;
        #wikibase:sitelinks ?sitelinks
  #FILTER(LANG(?itemLabel)="en")
}
ORDER BY DESC(?sitelinks)

Run it!

But Is this right? Do I really need to nest queries in order to get one image per item?

Also, you can see from the commented lines that I initially tried running this with the labelling and sitelinks in the outer query. But that led to query timeouts. Why? Shouldn't that have been the more efficient construction, saving the labelling/sitelink work to the end where I have a smaller dataset after the inner query work?

lowndrul
  • 3,715
  • 7
  • 36
  • 54
  • 1
    SPARQL has no concept of *correlated queries* - the only way to get at most one image per item is to `group by` and use e.g. the `sample` function to get a random image of possibly multiple existing image bindings. Regarding using a subquery first, no that would not help, there can still be multiple image bindings – UninformedUser Oct 02 '22 at 15:21
  • regarding using label and sitelink property to reduce the result, I doubt that a label is a discriminative property in Wikidata, I would assume that each item contains plenty of labels in many languages, thus, it makes the query more expensive as it a) has to join on all the labels and b) apply a filter on the language tag then – UninformedUser Oct 02 '22 at 15:24
  • 2
    I also don't get what the outer query is supposed to do in your last query, the subquery already returns the image. So what's the point to force another join on the image URL? Just run the subquery and you're done and should get the result you want – UninformedUser Oct 02 '22 at 15:26
  • 1
    Never knew about `SAMPLE`. Rewrote my query using it. Everything works. [Run it!](https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fsitelinks%20%28SAMPLE%28%3F_image%29%20AS%20%3Fimage%29%20%0AWHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ11422%3B%20%23toys%0A%20%20%20%20%20%20%20%20wikibase%3Asitelinks%20%3Fsitelinks%3B%0A%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3FitemLabel%3B%0A%20%20%20%20%20%20%20%20wdt%3AP18%20%3F_image.%0A%20%20FILTER%28LANG%28%3FitemLabel%29%3D%22en%22%29%0A%7D%0AGROUP%20BY%20%3Fitem%20%3FitemLabel%20%3Fsitelinks%0AORDER%20BY%20DESC%28%3Fsitelinks%29) – lowndrul Oct 02 '22 at 15:33
  • You're right about the nonsense of my outer query. I think I carried that over from when I was using `STR()` to match strings, not knowing you didn't need to do such a thing. Just kind of lost my way there! – lowndrul Oct 02 '22 at 15:35
  • I understand that, SPARQL can be confusing in the beginning, and then incrementally creating and debugging a SPARQL query sometimes leads to redundant or superflous stuff, I know this a lot. If you got your query working, feel free to post it as an answer and also accept your own answer to mark it as done. – UninformedUser Oct 02 '22 at 16:51

1 Answers1

1

In the second query, there is no need to wrap the inner portion with the outer portion. So the following works just fine

SELECT ?item ?itemLabel ?sitelinks (MAX(?_image) AS ?image) 
WHERE {
  ?item wdt:P31 wd:Q11422;
        wikibase:sitelinks ?sitelinks;
        rdfs:label ?itemLabel;
        wdt:P18 ?_image.
  FILTER(LANG(?itemLabel)="en")
}
GROUP BY ?item ?itemLabel ?sitelinks
ORDER BY DESC(?sitelinks)

Run it!

And if you don't need a specific image or to maintain replicability, just use SAMPLE to avoid the extra operations with MAX:

SELECT ?item ?itemLabel ?sitelinks (SAMPLE(?_image) AS ?image) 
WHERE {
  ?item wdt:P31 wd:Q11422;
        wikibase:sitelinks ?sitelinks;
        rdfs:label ?itemLabel;
        wdt:P18 ?_image.
  FILTER(LANG(?itemLabel)="en")
}
GROUP BY ?item ?itemLabel ?sitelinks
ORDER BY DESC(?sitelinks)

Run it!

lowndrul
  • 3,715
  • 7
  • 36
  • 54