
Given the following document structure:

{
    "id": "123",
    "traits":
    {
        "abc": 6.5,
        "def": 66
    }
}

I need to iterate over the documents and remove some of the traits based on certain criteria. A document with all of its traits removed should be removed as well.

Finally, I need to keep track of how many traits and documents were removed.
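To make the bookkeeping concrete, here is a client-side sketch in plain Python of what the two steps do together. It drops the traits that match a criterion, puts documents that still have traits in a replace batch and documents left with none in a delete batch, and counts the removed traits. `prune_traits` and the `should_remove(name, value)` predicate are hypothetical stand-ins, because the question does not specify the real criteria.

```python
from typing import Callable, List, Tuple


def prune_traits(
    docs: List[dict],
    should_remove: Callable[[str, object], bool],
) -> Tuple[List[dict], List[dict], int]:
    """Split docs into (to_replace, to_delete) and count removed traits.

    should_remove(name, value) is a hypothetical predicate standing in for
    the real (unspecified) criteria. Input documents are not mutated.
    """
    to_replace: List[dict] = []
    to_delete: List[dict] = []
    traits_removed = 0
    for doc in docs:
        traits = doc.get("traits", {})
        kept = {k: v for k, v in traits.items() if not should_remove(k, v)}
        removed = len(traits) - len(kept)
        if removed == 0:
            continue  # no trait matched, so this document is left untouched
        traits_removed += removed
        if kept:
            to_replace.append({**doc, "traits": kept})  # still has traits
        else:
            to_delete.append(doc)  # every trait was removed: drop the document
    return to_replace, to_delete, traits_removed


docs = [
    {"id": "123", "traits": {"abc": 6.5, "def": 66}},
    {"id": "456", "traits": {"abc": 1.0}},
]
# Hypothetical criterion: drop every trait whose value is below 10.
to_replace, to_delete, n = prune_traits(docs, lambda name, value: value < 10)
print(len(to_replace), len(to_delete), n)  # → 1 1 2
```

Doing this split before anything is written means a document that would end up with no traits goes straight to the delete batch and is never stored with an empty `traits` object.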

The update and removal operations should not be blocking and may be executed while these documents are being updated or queried.

I have implemented this in Python with python-arango. A REPLACE query removes the traits and a REMOVE query then deletes the documents that have no traits left:

// remove_traits_query
FOR some_doc IN some_collection
    FILTER <some filter>
    LET updated_doc = ...
    REPLACE some_doc WITH updated_doc IN some_collection OPTIONS { ignoreRevs: false, ignoreErrors: true }

// delete_query
FOR some_doc IN some_collection
    FILTER LENGTH(some_doc.traits) == 0
    REMOVE some_doc IN some_collection OPTIONS { ignoreRevs: false, ignoreErrors: true }

I then pull statistics from each returned cursor:

cursor = db.aql.execute(remove_traits_query)
stats = cursor.statistics()
modified = stats['modified']
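`stats['modified']` counts written documents, not removed traits. One possible way to count the traits as well is to have the REPLACE query RETURN the per-document difference in trait count, using AQL's `OLD`/`NEW` pseudo-variables, and sum the rows in Python. This sketch assumes the caller supplies the elided `<some filter>` and `...` parts. `build_remove_traits_query` and `run_and_count` are hypothetical helpers. With `ignoreErrors: true`, documents whose replace failed are expected to produce no row, which is worth verifying.

```python
from typing import Tuple


def build_remove_traits_query(filter_expr: str, updated_doc_expr: str) -> str:
    # Hypothetical helper: the FILTER and LET parts are the question's elided
    # placeholders and must be supplied by the caller; only RETURN is new.
    return (
        "FOR some_doc IN some_collection\n"
        f"    FILTER {filter_expr}\n"
        f"    LET updated_doc = {updated_doc_expr}\n"
        "    REPLACE some_doc WITH updated_doc IN some_collection\n"
        "        OPTIONS { ignoreRevs: false, ignoreErrors: true }\n"
        # OLD/NEW hold the document before/after the REPLACE; LENGTH() of an
        # object returns its number of attributes.
        "    RETURN LENGTH(OLD.traits) - LENGTH(NEW.traits)"
    )


def run_and_count(db, query: str) -> Tuple[int, int]:
    """Return (documents modified, traits removed) for one REPLACE query."""
    cursor = db.aql.execute(query)
    traits_removed = sum(cursor)  # one integer row per replaced document
    # Statistics are read after the cursor has been exhausted.
    docs_modified = cursor.statistics()["modified"]
    return docs_modified, traits_removed
```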

The problem is that a lookup query started while this process is running could return a document with an empty traits object. That happens if the lookup runs after the first (REPLACE) query has emptied the document's traits but before the second (REMOVE) query has deleted it. I need to prevent that.

I tried wrapping both queries in a transaction and reading the job cursors' statistics after the commit, like this:

trx_db = db.begin_transaction(write=collection)
traits_removal_job = trx_db.aql.execute(remove_traits_query)
doc_deletion_job = trx_db.aql.execute(delete_query)
trx_db.commit()
stats = traits_removal_job.result().statistics()

but the cursors of the transaction jobs are empty. I suspect that's because ArangoDB executes the transaction as a single JavaScript function.

I could filter out documents with empty traits in every lookup query. I would prefer to run the update and removal operations either in a single query (impossible in ArangoDB, according to the documentation) or in a transaction (which apparently gives me no execution statistics).
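One option worth trying is an ArangoDB stream transaction (ArangoDB 3.5+). This is an assumption to verify, not confirmed behaviour. Recent python-arango versions implement `begin_transaction()` on top of stream transactions. In those versions, `aql.execute()` inside the transaction returns a cursor immediately, with its statistics, instead of a job that resolves only at commit. With the RocksDB engine, other readers should not see the transaction's writes until `commit_transaction()`. `prune_in_stream_transaction` is a hypothetical name. Check the python-arango version you have installed.

```python
def prune_in_stream_transaction(db, collection, remove_traits_query, delete_query):
    """Run both queries in one stream transaction and return their stats.

    Assumes a python-arango version whose begin_transaction() starts an
    ArangoDB stream transaction, so aql.execute() returns a live cursor.
    """
    txn_db = db.begin_transaction(write=collection)
    try:
        # The second query sees the first one's writes (same transaction);
        # outside readers are expected to see neither until commit.
        replaced = txn_db.aql.execute(remove_traits_query)
        replaced_stats = replaced.statistics()
        deleted = txn_db.aql.execute(delete_query)
        deleted_stats = deleted.statistics()
        txn_db.commit_transaction()
    except Exception:
        txn_db.abort_transaction()  # roll back both steps on any failure
        raise
    return {
        "traits_docs_modified": replaced_stats["modified"],
        "docs_deleted": deleted_stats["modified"],
    }
```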

Any suggestions?

Thanks in advance!

Bruce S.
  • Since you appear to do a complete replace in your first statement, it appears that you would already have the new document in hand before you start anything in the DB. As such, can't you check the new document to see if it has any traits and remove it directly rather than updating it first and then removing it? It is a bit more work in the app server but you do not have to worry about transactions – camba1 Jun 19 '19 at 22:55
  • The REPLACE command does update the DB and cannot be followed by a REMOVE in the same query. If I understand you correctly, you suggest doing a RETURN instead of a REPLACE in the first query, then working out in the app server which documents require an UPDATE and which require a REMOVE, and then running the REMOVE and UPDATE operations in bulk. I would need to keep track of the revision ID of each document and filter by it on updates/removals, which possibly takes the bulk update option out of the equation. I'll look into it. – Bruce S. Jun 20 '19 at 14:45
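The revision concern may not rule out bulk operations. python-arango's `replace_many()` and `delete_many()` accept `check_rev=True`, which compares each document's `_rev` with the stored revision. The sketch below assumes the replace and delete batches were computed beforehand, with each entry carrying `_key` and the `_rev` returned by the RETURN query. `apply_batches` is a hypothetical name. Documents that lose all their traits are deleted directly instead of being replaced first, so readers should not see an empty `traits` object. The two bulk calls are still not atomic with respect to each other.

```python
from typing import List, Tuple


def apply_batches(collection, to_replace: List[dict], to_delete: List[dict]) -> Tuple[int, int]:
    """Apply precomputed replacements and deletions with revision checks.

    Each dict must carry _key and the _rev read earlier. With check_rev=True
    the server rejects the write if the document changed since it was read.
    """
    replaced = collection.replace_many(to_replace, check_rev=True) if to_replace else []
    deleted = collection.delete_many(to_delete, check_rev=True) if to_delete else []
    # Per-document results: failed entries (e.g. revision conflicts) come back
    # as exception objects rather than being raised, so count only successes.
    ok_replaced = sum(1 for r in replaced if not isinstance(r, Exception))
    ok_deleted = sum(1 for r in deleted if not isinstance(r, Exception))
    return ok_replaced, ok_deleted
```

Documents that fail the revision check would need to be re-read and re-evaluated in a later pass.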

0 Answers