ArangoDB Getting the Index of a Document

Question

Is there a way to get the index of a particular document in ArangoDB so as to use it in a LIMIT operation? For instance, let's say I look up a Document by it's ID and find that it's index is 534, then I would use that like this:

LIMIT 534, 30

...and return the next 30 documents after the document I looked up. I can't however seem to find a way to get the index of a document using AQL...

score 1 · Accepted Answer · answered Nov 20 '15 at 09:20

1

The question is what next documents should actually mean.

If the next here means document key index order:

A document has no particular "order" or "index" inside a collection. Documents in a collection are organized by their _key / _id attribute in the unsorted primary index.

To get to the next document key (let's assume lexicographically sorted keys) one would have to read all keys of the collection, sort them and somehow find the index of the current document to get to the next key. This would be terribly inefficient.

Other indexes on a collection are optional so one cannot rely on them being present and being usable for such query.

If next here means insertion or update order, then there is also no sensible way to get to the next documents.

A workaround may be to use a sorted (i.e. skiplist) index on some document attribute, ideally a unique one, and populate it whenever whenever a document is inserted (and maybe updated if updates should also change the order of a document).

Then to find the next documents, do this:

first find the desired document by its _id or _key and fetch the document data into the application
fetch the document attribute that has the sorted index on it, and use it in a follow-up AQL query as follows

This will allow you to find the documents following the original one, however, you have to maintain the order attribute somehow:

FOR doc IN collection
  FILTER doc.`order` > @value
  LIMIT 0, 30
  RETURN doc

This will be easy to do if your documents have some attribute that can be used for ordering anyway, but it will be a clumsy solution if they don't have such attribute.

answered Nov 20 '15 at 09:20

stj

9,037
19
33

The problem is I'm sorting by different, not always unique, attributes. For instance sorting by a client name. The goal is to load the first 30 records based on a client name, let's say "ABC", then load the next set. The caveat is, the data in Arango is edit by multiple users at once and the app is real time, so my original queried records may change by the time I go to request my next 'chunk'. The other problem is there may be more than 30 "ABC" records, so I can't use FILTER doc.cName > @value, because it may skip some "ABC" records... Any thoughts? – skinneejoe Nov 20 '15 at 13:40
You can avoid skipping records when there are many duplicates if you add a another FILTER condition on a unique attribute (e.g. `_key`): `FILTER doc.cName > @value && doc._key > @lastKey`. This should work even when there are many records that match `cName == "ABC`". – stj Nov 23 '15 at 09:38
Thinking about this further, would this work if you were sorting by the non-unique value? So imagine, sorting by clientName, filtering like you point out: FILTER doc.clientName > "@value" && doc._key > "@lastKey". Keys aren't necessarily added in order of client names, so wouldn't this filter exclude a doc whose name is alphanumerically greater than "@value", but whose key is not greater than "@lastKey"? – skinneejoe Nov 23 '15 at 17:00
Thinking about it more... I suppose you could sort by clientName then sort by _key, then use the filter as mentioned above and it should work. – skinneejoe Nov 23 '15 at 17:39
Going one step back: what is supposed to happen when new records are added after the initial search (with LIMIT 0, x) has taken place? Should they show up in follow-up searches (with LIMIT y, x), and if, where? Do users expect their searches & pagination to work as "snapshots"? If yes, then what could work is to include a timestamp in each document (populated on insertion) and use that to filter out newer documents in follow-up searches. This would also prevent "new" documents from showing up in the middle of a search and from caring about them in filter conditions. – stj Nov 24 '15 at 14:35
Well it's tricky... because the app is realtime, records can jump around based on when they were modified. So ideally it would figure out what the last record loaded from the server is, ensure it has not been updated, thus moving to the top of the list, and then query for any records following it. Honestly for now I've moved to a date range, paginating approach because of the complexities. But I will come back to this eventually when I have more time... – skinneejoe Nov 25 '15 at 13:55
You know as I typed my last comment it struck me, that maybe a date range is the easiest way to do this... I don't really care about getting a solid 30 records every time. I could just query the server for records between two date ranges. Then each time the next load is triggered, just move the date range back further. I suppose it's not the perfect solution, but it would probably suffice. I do welcome comments on how to make it work the other way though. Thanks! – skinneejoe Nov 25 '15 at 13:58

score 0 · Answer 2 · answered Nov 20 '15 at 08:52

If you don't apply anny sort filter, ArangoDB doesn't warant a special sequence.

So, doing

db._create("testCollection");
for (var i = 0; i < 10; i ++) db.testCollection.save({which: i})
db._query("FOR i IN testCollection RETURN i.which").toArray()

will give you a rather random sequence.

db._query("FOR i IN testCollection SORT i.which RETURN i.which").toArray()

will give you a well formed result.

You then can do:

db._query("FOR i IN testCollection SORT i.which LIMIT 3, 5 RETURN i.which").toArray()

to work with limits.

You also can however also use cursors to fetch partial results.

ArangoDB Getting the Index of a Document

2 Answers2