
I would like to create a SPARQL query that contains two counts.

The query should get the 'neighbours of neighbours' of A (A → B → C, where A is the start node), and should report, for each C, how many paths there are from A to C and how many "inlinks" there are to C from anywhere. The result set should be as follows:

C | #C |  C_INLINKS
--------------------------
A | 2  | 123
B | 3  | 234

Where #C is the number of paths to C from starting node A.

I can create the counts separately, but I don't know how to combine these:

Count neighbours of neighbours:

select ?c (count(?c) as ?countc) WHERE {
   <http://dbpedia.org/resource/AFC_Ajax> ?p1 ?b.
   ?b ?p2 ?c.
   FILTER (regex(str(?c), '^http://dbpedia.org/resource/'))
}
GROUP BY ?c
ORDER BY DESC(?countc)
LIMIT 100

Count inlinks to neighbours of neighbours:

select ?c (count(?inlink) as ?inlinks) WHERE {
   <http://dbpedia.org/resource/AFC_Ajax> ?p1 ?b.
   ?b ?p2 ?c.
   ?inlink ?p3 ?c
   FILTER (regex(str(?c), '^http://dbpedia.org/resource/'))
}
GROUP BY ?c
ORDER BY DESC(?inlinks)
LIMIT 100

Is it possible to combine these two queries? Thank you!

Joshua Taylor
user1255553

1 Answer


The two counts you're trying to extract require you to group by different things. group by specifies what you're counting with respect to. For example, select (count(?x) as ?xn) {...} group by ?y asks "how many ?x's are there for each value of ?y?" The counts you're looking for are "how many C's per A" and "how many inlinks per C". In general, that means grouping by ?a in one case and by ?c in the other. In this case, though, ?a is fixed, so it's a little easier: you can group by ?c alone.

Counting the distinct paths is a little bit tricky. A path is identified by more than one variable, but count(distinct …) accepts only a single expression. You can be sneaky and count distinct values of a concatenation such as concat(str(?p1),str(?p2)), which is a single expression and should be unique for each ?p1 ?p2 pair. (The query below also concatenates str(?b), so that paths through different intermediate nodes are counted separately.) Then I think you'd be looking for a query like this:

prefix dbpedia: <http://dbpedia.org/resource/>

select ?c
       (count(distinct concat(str(?p1),str(?b),str(?p2))) as ?n_paths)
       (count(distinct ?inlink) as ?n_inlink)
where {
  dbpedia:AFC_Ajax ?p1 ?b .   # A -> B
  ?b ?p2 ?c .                 # B -> C
  ?inlink ?p ?c .             # anything -> C
  filter strstarts(str(?c),str(dbpedia:))
}
group by ?c

SPARQL results

c                                                           n_paths n_inlink
----------------------------------------------------------------------------
http://dbpedia.org/resource/AFC_Ajax                        32      540
http://dbpedia.org/resource/Category:AFC_Ajax_players       17      484
http://dbpedia.org/resource/Category:Living_people          17      659447
http://dbpedia.org/resource/Category:Eredivisie_players     13      2232
http://dbpedia.org/resource/Category:Dutch_footballers      12      2141
http://dbpedia.org/resource/Category:1994_births             6      3605
…
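
A variant that may be easier to verify, offered as a sketch rather than as part of the query above: count the paths in a subquery first, then join the inlinks in the outer query. This way the inlink join cannot multiply the rows that the path count sees. It assumes a SPARQL 1.1 endpoint and that no blank nodes are involved, so each (?p1, ?b, ?p2) binding for a given ?c appears exactly once and count(*) equals the number of distinct paths. Treat it as untested against DBpedia.

```sparql
prefix dbpedia: <http://dbpedia.org/resource/>

select ?c ?n_paths (count(distinct ?inlink) as ?n_inlink)
where {
  # Inner query: one row per ?c with the number of two-step paths
  # from the fixed start node.
  {
    select ?c (count(*) as ?n_paths)
    where {
      dbpedia:AFC_Ajax ?p1 ?b .
      ?b ?p2 ?c .
      filter strstarts(str(?c), str(dbpedia:))
    }
    group by ?c
  }
  # Outer pattern: every subject that links to ?c through any property.
  ?inlink ?p ?c .
}
group by ?c ?n_paths
order by desc(?n_paths)
```

Because ?n_paths is already fixed per ?c when the outer pattern is evaluated, it can simply be listed in the outer group by alongside ?c.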
Joshua Taylor
  • Hi, I have a feeling it still does not return the correct values. For example, using select distinct ?p1,?b,?p2 where { dbpedia:AFC_Ajax ?p1 ?b . ?b ?p2 } I find there are 57 paths between AFC_Ajax and Category:AFC_Ajax_players, but your query gives 17. I don't understand why these numbers are different. Do you know what is going wrong here? Thank you! – user1255553 Mar 02 '15 at 13:01
  • It seems that if I remove one count(), the other count() value also changes, but that doesn't make sense, right? The values are grouped on ?c. – user1255553 Mar 04 '15 at 20:45
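
To check the path count for a single ?c by hand, a complete form of the test query from the first comment might look like the sketch below. The object IRI is an assumption taken from the results table above; the query as written in the comment leaves the object out.

```sparql
prefix dbpedia: <http://dbpedia.org/resource/>

# Count two-step paths from AFC_Ajax to one fixed target node.
# Each solution is one (?p1, ?b, ?p2) combination, i.e. one path.
select (count(*) as ?n_paths)
where {
  dbpedia:AFC_Ajax ?p1 ?b .
  ?b ?p2 <http://dbpedia.org/resource/Category:AFC_Ajax_players> .
}
```

Comparing this number with the ?n_paths value reported for the same ?c shows whether the combined query is undercounting.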