Wikidata query duplicates

Question

Sorry if my English is bad, but I don't have anywhere to ask this question in my native language. I've been trying to write a SPARQL query for Wikidata that lists all horror fiction created between 1925 and 1950, with the authors' names and, if available, pictures:

SELECT DISTINCT ?item ?itemLabel ?author ?name ?creation ?picture
WHERE
{
  ?item wdt:P136 wd:Q193606 .   # genre
  ?item wdt:P50 ?author .       # author
  ?item wdt:P577 ?creation .    # publication date
  ?item wdt:P577 ?end .
  ?author rdfs:label ?name .
  OPTIONAL { ?item wdt:P18 ?picture }   # image, if any
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime) .
  FILTER (?end <= "1950-12-31T23:59:59Z"^^xsd:dateTime) .

  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}

However, for some reason this query puts duplicates in the list, and DISTINCT doesn't help much. After some time I figured out that the cause is "?item rdfs:label ?name .". If I remove this line, no duplicates are listed. But I need this line to show the author's name in the list! Any ideas on how to fix this?

score 2 · Answer 1 · answered Dec 29 '16 at 11:07

You don't need ?item rdfs:label ?name . because you already get item labels as ?itemLabel thanks to SERVICE wikibase:label.

Beyond that, you will get duplicate rows for every item where a SELECTed property has multiple values: here you are SELECTing authors (P50), which creates one row per author for every item with several authors.

maxlath


Uh-oh, my mistake. The line should be `?author rdfs:label ?name .` Without it, I don't get the author's name, just a link. What should I do to make it work properly? – Dusk Dec 29 '16 at 11:23
Then you should just need to `SELECT ?authorLabel` too, I guess – maxlath Dec 29 '16 at 12:54
It worked! So simple, yet I don't think I would have figured that out by myself. Thank you very much! – Dusk Dec 29 '16 at 12:59
more on the label service: https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Label_service – maxlath Dec 29 '16 at 15:21
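
Putting the comments above together, the query with the label service supplying the author's name might look like the sketch below. It assumes the Wikidata Query Service, where the wd:, wdt:, wikibase: and bd: prefixes are predefined. Applying both date bounds to ?creation is a simplification that is not in the original query. ?authorLabel is filled in by the label service because ?author is bound in the query.

```sparql
# Sketch only: assumes the Wikidata Query Service and its predefined prefixes.
SELECT DISTINCT ?item ?itemLabel ?author ?authorLabel ?creation ?picture
WHERE
{
  ?item wdt:P136 wd:Q193606 .            # genre (ID taken from the question)
  ?item wdt:P50 ?author .                # author
  ?item wdt:P577 ?creation .             # publication date
  OPTIONAL { ?item wdt:P18 ?picture }    # image, if any

  # Both bounds on the same variable (simplification, not in the original).
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime)
  FILTER (?creation <= "1950-12-31T23:59:59Z"^^xsd:dateTime)

  # The label service binds ?itemLabel and ?authorLabel in English.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Items with several authors, publication dates or pictures will still produce one row per combination; the label service only removes the extra rows caused by labels in many languages.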

score 2 · Answer 2 · answered Dec 29 '16 at 11:19

The query is actually giving you distinct results. The problem is that some items have multiple rdfs:label values, typically one per language. For example, look at this item:

SELECT *
WHERE
{
  wd:Q2882840 rdfs:label ?label

  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}

Since some items have multiple rdfs:label values, each value shows up in a separate row.

Kiran.B

So, I guess, there's no way to make this query work properly? – Dusk Dec 29 '16 at 11:33
If there are multiple labels you get multiple rows back - that's how the semantics of SPARQL are defined. You could i) restrict the number of labels or ii) use `GROUP_CONCAT` to combine all labels into a single value. – UninformedUser Dec 29 '16 at 12:03
I tried `{ select * where {?author rdfs:label ?name.} LIMIT 1}` instead of the line `?author rdfs:label ?name` but it does not seem to work (returns back no results); not sure why. – Kiran.B Dec 29 '16 at 12:18
Found some examples of using `GROUP_CONCAT` via google but it won't work with my query, saying "Bad aggregate". Idea with `LIMIT` actually sounds nice. If only it would work, sigh. – Dusk Dec 29 '16 at 12:23
@Dusk See my answer below – Shlomi Uziel Jan 11 '17 at 16:48
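
Option i) from the comments above, restricting the number of labels, can be sketched with a language filter rather than `LIMIT`. The query below keeps only the English rdfs:label, so an item that has labels in many languages yields at most one row. It reuses the item from the example in this answer and assumes the Wikidata Query Service, where the wd: and rdfs: prefixes are predefined.

```sparql
# Sketch only: assumes the Wikidata Query Service and its predefined prefixes.
SELECT ?label
WHERE
{
  wd:Q2882840 rdfs:label ?label .    # every label, one per language
  FILTER (LANG(?label) = "en")       # keep only the English one
}
```

The same FILTER could be placed after `?author rdfs:label ?name .` in the original query, with ?name in place of ?label. As for the "Bad aggregate" error, a likely cause (an assumption, since the failing query is not shown) is selecting a plain variable that is neither aggregated nor listed in GROUP BY.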

score 2 · Answer 3 · answered Jan 11 '17 at 15:17

You can aggregate your results according to the book title (the item's label) using the `group by` keyword. Each book then becomes a single group that shows up once, and the other fields, which may have several different values, are combined with `group_concat` using a separator (in this case, a comma).

The fixed query:

SELECT DISTINCT ?item ?itemLabel
  # Aggregates get new names: standard SPARQL does not let AS reuse an in-scope variable.
  (group_concat(distinct ?author; separator=",") as ?authors)
  (group_concat(distinct ?name; separator=",") as ?names)
  (group_concat(distinct ?creation; separator=",") as ?creations)
  (group_concat(distinct ?picture; separator=",") as ?pictures)
WHERE
{
  ?item wdt:P136 wd:Q193606 .   # genre
  ?item wdt:P50 ?author .       # author
  ?item wdt:P577 ?creation .    # publication date
  ?item wdt:P577 ?end .
  ?author rdfs:label ?name .    # author labels in every language
  OPTIONAL { ?item wdt:P18 ?picture }   # image, if any
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime) .
  FILTER (?end <= "1950-12-31T23:59:59Z"^^xsd:dateTime) .

  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}
group by ?item ?itemLabel

Nice, thanks! This is a perfect way to handle duplicates. I often get them, for instance when fetching a list of personalities filtered by some property when they have more than one image in Wikidata. – David Batista Oct 24 '22 at 20:56
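
A variant of the grouped query is sketched below for cases where one value per book is preferable to a comma-separated list. It is not from the answer above: the output names ?authorNames, ?firstPublished and ?anyPicture are made up for illustration, and the English-only filter on ?name is an added assumption. It relies on the standard SPARQL 1.1 aggregates MIN and SAMPLE and on the Wikidata Query Service prefixes.

```sparql
# Sketch only: output variable names are hypothetical.
SELECT ?item ?itemLabel
  (GROUP_CONCAT(DISTINCT ?name; separator=", ") AS ?authorNames)
  (MIN(?creation) AS ?firstPublished)    # earliest date within the range
  (SAMPLE(?picture) AS ?anyPicture)      # one arbitrary picture, if any
WHERE
{
  ?item wdt:P136 wd:Q193606 .            # genre (ID taken from the question)
  ?item wdt:P50 ?author .                # author
  ?item wdt:P577 ?creation .             # publication date
  ?author rdfs:label ?name .
  FILTER (LANG(?name) = "en")            # assumption: English names only
  OPTIONAL { ?item wdt:P18 ?picture }
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime)
  FILTER (?creation <= "1950-12-31T23:59:59Z"^^xsd:dateTime)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?item ?itemLabel
```

Without the language filter, the concatenated names would include every translation of each author's label, which is usually not what a reader of the list wants.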
