
Say an RDF dataset contains a set of values that range from 0 to 100 (for example, percentages). I want to count the number of values falling in each of a set of ranges, for example 100–90 | 90–80 | ... | 10–0. The output I expect looks like the following:

╔════════════════╦════════╗
║     Range      ║ Count  ║
╠════════════════╬════════╣
║ 100 >= x > 90  ║ 4521   ║
║  90 >= x > 80  ║ 650    ║
║      ...       ║ ...    ║
║  10 >= x >= 0  ║ 2650   ║
╚════════════════╩════════╝

I am currently using SPARQL subqueries and filters to get to the solution. But this seems like a common use case, and my intuition tells me that there should be a better way to do this. Is there a better (or more efficient) way to get this answer?

My current solution looks like the following:

PREFIX dqv: <http://www.w3.org/ns/dqv#>
select distinct ?count90_100 ?count80_90 ?count10_0 where {
  ?m a dqv:QualityMeasurement .
  { select (count(?m) as ?count90_100) where { ?m dqv:value ?value FILTER (?value > 90 && ?value <= 100) } }
  { select (count(?m) as ?count80_90)  where { ?m dqv:value ?value FILTER (?value > 80 && ?value <= 90) } }
  { select (count(?m) as ?count10_0)   where { ?m dqv:value ?value FILTER (?value >= 0 && ?value <= 10) } }
}
Nandana

1 Answer


You could use a values block to specify the lower and upper bounds of each range and to give each range an "id". Then you can group on that id. E.g.,

select ?rangeId (count(?x) as ?numMatches) {
  values (?rangeId ?min ?max) { (0 0 10)
                               (1 10 20)
                               #-- ...
                               (8 80 90)
                               (9 90 100) }

  #-- query that finds a value for ?x...

  filter (?min <= ?x && ?x < ?max)
}
group by ?rangeId
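
Applied to the measurements from the question, the complete query might look like the following sketch (assuming each ?m has a numeric dqv:value literal; note that these half-open ranges put a value of exactly 90 in the 90–100 bucket, unlike the question's filters, and a value of exactly 100 would need the filter widened to ?x <= ?max for the top bucket):

```sparql
PREFIX dqv: <http://www.w3.org/ns/dqv#>

select ?rangeId (count(?m) as ?numMatches) {
  #-- one row per bucket: id, inclusive lower bound, exclusive upper bound
  values (?rangeId ?min ?max) { (0  0  10)
                                (1 10  20)
                                #-- ...
                                (8 80  90)
                                (9 90 100) }

  ?m a dqv:QualityMeasurement ;
     dqv:value ?x .

  filter (?min <= ?x && ?x < ?max)
}
group by ?rangeId
```

This replaces the one-subquery-per-bucket approach with a single pass over the data, so the store only evaluates the dqv:value pattern once.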
RobV
Joshua Taylor