2

I have a query where I count with a grouping on years. For some of the years there is no items to count and thus no result. I would like to have SPARQL return a count of zero for such years.

I am using the Wikidata Query Service, https://query.wikidata.org, and my present solution is to make a series of values and a union with the actual query. It looks a bit clumsy to me. Is there a better way?

#defaultView:BarChart
select ?year ?number_of_pages ?work_label where {
  {
    select ?year (sample(?pages) as ?number_of_pages) ?work_label       
    where {
      {
        select * where {
          values (?year ?pages ?work_label) {
            ("2000" "0"^^xsd:integer "_") 
            ("2001" "0"^^xsd:integer "_") 
            ("2002" "0"^^xsd:integer "_") 
            ("2003" "0"^^xsd:integer "_") 
            ("2004" "0"^^xsd:integer "_") 
            ("2005" "0"^^xsd:integer "_") 
            ("2006" "0"^^xsd:integer "_") 
            ("2007" "0"^^xsd:integer "_") 
            ("2008" "0"^^xsd:integer "_") 
            ("2009" "0"^^xsd:integer "_") 
            ("2010" "0"^^xsd:integer "_") 
            ("2011" "0"^^xsd:integer "_")
            ("2012" "0"^^xsd:integer "_")
            ("2013" "0"^^xsd:integer "_")
            ("2014" "0"^^xsd:integer "_")
            ("2015" "0"^^xsd:integer "_")
            ("2016" "0"^^xsd:integer "_")
          }
        }
      }
      union {
        ?work wdt:P50 wd:Q18921408 .
        ?work wdt:P1104 ?pages .
        ?work wdt:P577 ?date . 
        ?work rdfs:label ?long_work_label . filter(lang(?long_work_label) = 'en')
        bind(substr(?long_work_label, 1, 20) as ?work_label)
        bind(str(year(?date)) as ?year) 
      }
    } 
    group by ?year ?work ?work_label
    order by ?year 
  }
}
logi-kal
  • 7,107
  • 6
  • 31
  • 43
Finn Årup Nielsen
  • 6,130
  • 1
  • 33
  • 43
  • I don't see the effect of grouping in your query. It returns indeed multiple values for the same year, e.g. `2012 12 The Pfam protein fam` and `2012 1 Bioimage informatics`. What do you want to achieve? – UninformedUser Nov 06 '16 at 23:07
  • In addition, SPARQL 1.1 supports `bind( if(condition,then,else) as ?result )`, see (https://www.w3.org/TR/sparql11-query/#func-if). Nevertheless, you will have to provide the years you're interested in. Once I understand what you want to achieve, I can maybe provide a solution – UninformedUser Nov 06 '16 at 23:11
  • What I would like is rows like (2011, 0, " _ "). If I remove all the VALUES rows, like ("2011" "0"^^xsd:integer " _ "), I do not get that. Instead of listing all the lines I was hoping I could do something like (range(2000, 2015), "0"^^xsd:integer " _ "). – Finn Årup Nielsen Nov 07 '16 at 09:16
  • To add some context: I use the plotting facility in the Wikidata Query Service. I have no ability to manipulate the SPARQL result before it is submitted to the plotting engine. The bar chart display method will not plot all the years if the zero count years are missing. – Finn Årup Nielsen Nov 07 '16 at 09:26

1 Answers1

6

You don't need so many nested queries. Here is a simple solution:

#defaultView:BarChart
select ?year ?pages ?label       
where {
  # iterating over years
  values ?year {
    "2000" "2001" "2002" "2003" "2004" "2005" 
    "2006" "2007" "2008" "2009" "2010" "2011" 
    "2012" "2013" "2014" "2015" "2016" 
  }

  # binding defaults
  bind( 0  as ?default_pages)
  bind("_" as ?default_label)

  # if there is a work in the given year, ?work_pages and ?work_label will be bound
  optional {
    ?work wdt:P50 wd:Q18921408;
          wdt:P1104 ?work_pages;
          wdt:P577  ?work_date. 
    bind(str(year(?work_date)) as ?year).

    ?work rdfs:label ?long_work_label. 
    filter(lang(?long_work_label) = 'en').
    bind(substr(?long_work_label, 1, 20) as ?work_label)
  }

  # either take ?work_pages/label value or default and bind it as the result ?pages/label
  bind(coalesce(?work_pages, ?default_pages) as ?pages)
  bind(coalesce(?work_label, ?default_label) as ?label)
} 
order by ?year

Here is results screenshot:

enter image description here


The key here is the combination of optional + bind/coalesce. The general pattern is

bind(... as ?default_foo)

optional { 
  # try to get value ?foo
}

bind(coalesce(?foo, ?default_foo) as ?result_foo)

coalesce returns the first value that it can (that is bound/evaluates without error). So if the value you tried to get in optional { ... } is not bound, the default one will be taken and bound as the result. A more verbose way to write it:

bind(if(bound(?foo), ?foo, ?default_foo) as ?result_foo)

But coalesce is better because you can pass several values in it. In a more complicated query it can be useful: see this example.

Community
  • 1
  • 1
laughedelic
  • 6,230
  • 1
  • 32
  • 41