Limit Graph DB responses per category

Question

I'm sure there's already an SO question asking the same thing, but I have been unable to find it. Perhaps I just don't know the right words to express this question properly. Some context: I'm deciding if AWS Neptune is a good fit for an upcoming project. So, I apparently have access to a SPARQL engine and a Tinkerpop/Gremlin engine if that helps answer the question.

Let's say I have items in a graph database. Those items have a "property" called category. Is there a way to get a max of 20 items from each distinct category?

I've scoured various SPARQL resources (e.g. the docs) and haven't found anything describing this sort of functionality, and I've never had to do this in my previous SPARQL encounters. I'm not too familiar with TinkerPop and Gremlin, but my initial readings haven't touched on this either.

I doubt this is possible in SPARQL, what you want is basically "n items per group" - even in SQL, this would need window function `row_number()` in combination with `partition by` - and I'm not aware of those features in SPARQL - clearly I might be wrong — UninformedUser, Mar 16 '20 at 20:28
SPARQL: https://web.archive.org/web/20150516154515/http://answers.semanticweb.com:80/questions/9842/how-to-limit-sparql-solution-group-size, https://stackoverflow.com/a/36554254/7879193 — Stanislav Kralin, Mar 16 '20 at 20:35
OK, using a count on each item per category is funny, but it looks pretty expensive — UninformedUser, Mar 17 '20 at 08:53

Kelvin Lawrence · Accepted Answer · 2020-03-17T13:01:01.803

It's fairly straightforward with Gremlin. Using the air-routes graph, which has a region property for each airport, the following query returns at most five airports each for California and Texas (the graph contains more than five for each state).

gremlin> g.V().has('airport','region',within('US-CA','US-TX')).
               group().
                 by('region').
                 by(identity().limit(5).fold())

==>[US-TX:[v[3],v[8],v[11],v[33],v[38]],US-CA:[v[13],v[23],v[24],v[26],v[27]]]

EDITED: Added a second example that does not filter on specific regions.

gremlin> g.V().hasLabel('airport').
               limit(50).
               group().
                 by('region').
                 by(identity().limit(5).fold())

==>[US-FL:[v[9],v[15],v[16],v[19],v[25]],US-NV:[v[30]],US-HI:[v[37]],US-TX:[v[3],v[8],v[11],v[33],v[38]],US-WA:[v[22]],US-NY:[v[12],v[14],v[32],v[35]],US-NC:[v[21]],US-LA:[v[34]],GB-ENG:[v[49],v[50]],US-PA:[v[45]],US-DC:[v[7]],US-NM:[v[44]],US-AZ:[v[20],v[43]],US-TN:[v[4]],CA-BC:[v[48]],CA-ON:[v[47]],PR-U-A:[v[40]],US-MN:[v[17]],US-IL:[v[18]],US-AK:[v[2]],US-VA:[v[10]],US-CO:[v[31]],US-MD:[v[6]],US-IA:[v[36]],US-MA:[v[5]],US-CA:[v[13],v[23],v[24],v[26],v[27]],US-UT:[v[29]],US-OH:[v[41]],US-GA:[v[1]],US-MI:[v[46]]]
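
For readers who don't know Gremlin, the effect of the group step with a per-group limit can be read as the plain-Python sketch below. It only illustrates the "at most n items per category" semantics and is not how Gremlin or Neptune evaluates the traversal. The helper name `limit_per_group` and the sample records are made up for illustration and are not taken from the air-routes graph.

```python
from collections import defaultdict


def limit_per_group(items, key, n):
    """Keep at most n items per distinct value of item[key], in encounter order."""
    groups = defaultdict(list)
    for item in items:
        bucket = groups[item[key]]
        # Only append while this group's bucket is below the cap.
        if len(bucket) < n:
            bucket.append(item)
    return dict(groups)


# Hypothetical records standing in for airport vertices.
airports = [
    {"code": "AUS", "region": "US-TX"},
    {"code": "LAX", "region": "US-CA"},
    {"code": "DFW", "region": "US-TX"},
    {"code": "SFO", "region": "US-CA"},
    {"code": "IAH", "region": "US-TX"},
    {"code": "SEA", "region": "US-WA"},
]

# At most two airports per region; no region names are supplied up front.
result = limit_per_group(airports, "region", 2)
```

As in the second query, the categories are discovered from the data itself, so nothing needs to be known about them ahead of time.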

Okay, good to know it can be expressed decently well in Gremlin. Is it possible to do without knowing the regions ahead of time? — Jordan Shurmer, Mar 17 '20 at 12:37
Yes. I added a second example that gets 50 airports and applies the same grouping strategy without any knowledge of the regions. I used the explicit regions in the first example just to keep things simple. — Kelvin Lawrence, Mar 17 '20 at 13:02
