4

I need help with an ArangoDB AQL query. I have a transaction detail collection (EventTran) that logs update details on its parent table (Event). EventTran attributes include a timestamp, and a reference to the parent _id_event. I’m trying to work out a query to return an array of only the first and last (by timestamp) EventTran documents for a specified id_event. Here's an example:

FOR event IN EventTran
    FILTER event._id_event == "Event/167697"
    SORT event.timestamp DESC
    RETURN event

Might return:

[
  {
    "_key": "214092",
    "_id": "EventTran/214092",
    "_id_event": "Event/167697",
    "timestamp": 1511202637
  },
  {
    "_key": "213958",
    "_id": "EventTran/213958",
    "_id_event": "Event/167697",
    "timestamp": 1511202542
  },
  {
    "_key": "191809",
    "_id": "EventTran/191809",
    "_id_event": "Event/167697",
    "timestamp": 1511118705
  },
  {
    "_key": "167701",
    "_id": "EventTran/167701",
    "_id_event": "Event/167697",
    "timestamp": 1510965562
  }
]

I want a query that will return an array with only the first and last items, i.e. the very first log entry, and the most recent log entry:

[
  {
    "_key": "214092",
    "_id": "EventTran/214092",
    "_id_event": "Event/167697",
    "timestamp": 1511202637
  },
  {
    "_key": "167701",
    "_id": "EventTran/167701",
    "_id_event": "Event/167697",
    "timestamp": 1510965562
  }
]
peak
  • 105,803
  • 17
  • 152
  • 177
James Haskell
  • 1,545
  • 2
  • 16
  • 21

2 Answers2

4

The following is undoubtedly not the best possible solution in all circumstances, but it does avoid SORT, which is probably the last thing you want to do unless the collection is quite small.

The idea is very simple: determine the min and max values, and then collect the minimal items and select one of them, and likewise for the maximal items.

LET mnmx = (
  FOR x in EventTran
  FILTER event._id_event == "Event/167697"
  COLLECT AGGREGATE mn = MIN(x.timestamp), mx = MAX(x.timestamp)
  RETURN {mn,mx} )

LET mn = mnmx.mn
LET mx = mnmx.mx

LET least = (
  FOR x in EventTran
  FILTER x.timestamp == mn
  COLLECT y=x INTO minimal
  RETURN minimal[0] )

LET greatest = (
  FOR x in EventTran
  FILTER x.timestamp == mx
  COLLECT y=x INTO maximal
  RETURN maximal[0] )

RETURN {least, greatest}

The last line is an abbreviation for {"least": least, "greatest": greatest}, least and greatest being the items with the least and greatest timestamps.

peak
  • 105,803
  • 17
  • 152
  • 177
  • 1
    While this query can not utilize a skiplist index on `_id_event,timestamp` for the first query shown by @stj, it is a interesting demonstration of what can be done with `COLLECT AGGREGATE`. – CodeManX Nov 29 '17 at 18:21
3

There are two possible solutions:

1) the first one is to execute two queries and return just the top/bottom document from each:

RETURN [
  (FOR event IN EventTran  
     FILTER event._id_event == "Event/167697"
     SORT event.timestamp DESC
     LIMIT 1
     RETURN event
  )[0],
  (FOR event IN EventTran  
     FILTER event._id_event == "Event/167697"
     SORT event.timestamp ASC
     LIMIT 1
     RETURN event
  )[0]
]

As you can see, one query is using sort order DESC and the other is using sort order ASC. Just one document is returned from each query, and only that document is returned per query. If there are no matching documents for the specified _id_event value, the queries will return null each. So the overall result will become [null, null].

2) the alternative is to do everything is a single query and return just the first and last documents from it:

LET results = (
  FOR event IN EventTran  
     FILTER event._id_event == "Event/167697"
     SORT event.timestamp ASC
     RETURN event
)
RETURN [
  results[0],
  results[-1]
]

This however will all the documents for the associated event (may be many?) so the two-query solution may be more efficient.

stj
  • 9,037
  • 19
  • 33