
I have a use case where I have to remove a subset of the entities stored in Couchbase, e.g. all entities with keys starting with "pii_". I am using the NodeJS SDK, but it has only one remove method, which takes one key at a time: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/Bucket.html#remove

In some cases thousands of entities need to be deleted, and deleting them one by one takes a very long time, especially because I don't keep a list of keys in my application.

ThinkFloyd

2 Answers


I agree with @ThinkFloyd when he says that a delete on the server should be a delete on the server, rather than requiring three steps: get the data from the server, iterate over it on the client side, and finally fire a delete on the server for each record.

In this regard, I think old-fashioned RDBMSs were better: all you need to do is 'DELETE FROM table WHERE something=something'.

Fortunately, Couchbase has something similar to SQL called N1QL (pronounced "nickel"). I am not familiar with the JavaScript syntax (or that of other languages), but this is how I did it in Python.

Query to be used: DELETE FROM <bucketname> b WHERE META(b).id LIKE "<prefix>%"

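    from couchbase.exceptions import CouchbaseError
    from couchbase.n1ql import N1QLQuery

    # Match every document whose key starts with "<cb_layer_key>|"
    layer_name_prefix = cb_layer_key + "|" + "%"
    try:
        query = N1QLQuery('DELETE FROM `test-feature` b WHERE META(b).id LIKE $1', layer_name_prefix)
        cb.n1ql_query(query).execute()
    except CouchbaseError as e:
        logger.exception(e)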
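
For the NodeJS SDK the question asks about, a rough equivalent might look like the sketch below. It assumes SDK 2.x, where `couchbase.N1qlQuery.fromString` and `bucket.query(query, params, callback)` are available, and a server with N1QL enabled. The helper names `likePrefixPattern` and `deleteByPrefix` are made up for illustration. Because `_` and `%` are wildcards in LIKE, the sketch escapes them with a backslash, which I understand to be N1QL's escape character; please verify that against your server version.

```javascript
// Escape LIKE wildcards so the prefix is matched literally, then append "%".
// Assumption: N1QL treats a backslash as the escape character inside LIKE.
function likePrefixPattern(prefix) {
    return prefix.replace(/[\\%_]/g, '\\$&') + '%';
}

// Hypothetical helper. `bucket` is an opened bucket and `N1qlQuery` is
// require('couchbase').N1qlQuery (Node.js SDK 2.x assumed).
function deleteByPrefix(bucket, N1qlQuery, bucketName, prefix, callback) {
    // A bucket name cannot be passed as a query parameter, so it is
    // quoted with backticks and concatenated into the statement.
    var statement = 'DELETE FROM `' + bucketName + '` b WHERE META(b).id LIKE $1';
    var query = N1qlQuery.fromString(statement);
    // The pattern is bound to the positional parameter $1.
    bucket.query(query, [likePrefixPattern(prefix)], callback);
}
```

It would be called along the lines of `deleteByPrefix(myBucket, couchbase.N1qlQuery, 'test-feature', 'pii_', function(err) { ... })`, so the whole delete is a single request to the server.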

An alternative query that achieves the same thing, if you store 'type' and/or other metadata such as 'parent_id' in your documents:

DELETE from <bucket_name> where type='Feature' and parent_id=8;

But I prefer the first version of the query because it operates on the key, and I believe Couchbase must have some internal indexes that make querying by key (and other metadata) faster.

Jadav Bheda

The best way to accomplish this is to create a Couchbase view by key and then range query over that view via your NodeJS code, making deletes on the results.

For example, your Couchbase view could look like the following:

function(doc, meta) {
    emit(meta.id, null);
}

Then in your NodeJS code, you could have something that looks like this:

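var couchbase = require('couchbase');
var ViewQuery = couchbase.ViewQuery;

// The connection string is a placeholder; use your own cluster address.
var myCluster = new couchbase.Cluster('couchbase://localhost');
var myBucket = myCluster.openBucket();

var query = ViewQuery.from('designdoc', 'by_id');

// The end key needs a high code point after the prefix (assumed to sort after
// any character in your keys); with "\u0000" the range would be empty.
query.range("pii_", "pii_" + "\uefff", false);

myBucket.query(query, function(err, results) {
    if (err) {
        return console.error(err);
    }
    results.forEach(function(row) {
        // row.id is the key of the document that produced this view row
        myBucket.remove(row.id, function(removeErr) {
            if (removeErr) console.error(removeErr);
        });
    });
});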

Of course, your Couchbase design document and view will be named differently from those in my example, but the important part is the ViewQuery.range function.

All document IDs prefixed with pii_ will be returned, and you can then loop over them and start deleting.
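
If firing thousands of removes at once is a concern, the loop can be throttled so that only a fixed number of requests are in flight at any time. The helper below is only a sketch: `removeAll` and its `limit` parameter are hypothetical names, and the only thing it assumes about the SDK is that `bucket.remove(key, callback)` calls back once per key.

```javascript
// Hypothetical helper: remove `keys` with at most `limit` requests in flight.
// `bucket` only needs a remove(key, callback) method, as in the Node.js SDK.
function removeAll(bucket, keys, limit, done) {
    var next = 0;      // index of the next key to send
    var inFlight = 0;  // removes sent but not yet answered
    var errors = [];   // failures are collected instead of aborting the run

    function pump() {
        // Finished once every key has been sent and every reply has arrived.
        if (next >= keys.length && inFlight === 0) {
            return done(errors);
        }
        // Top up to `limit` outstanding requests.
        while (inFlight < limit && next < keys.length) {
            inFlight++;
            bucket.remove(keys[next++], function(err) {
                inFlight--;
                if (err) {
                    errors.push(err);
                }
                pump();
            });
        }
    }

    pump();
}
```

Inside the view query callback this could be used as `removeAll(myBucket, results.map(function(row) { return row.id; }), 50, function(errors) { ... })`. The value 50 is arbitrary and would need tuning for your cluster.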

Best,

Nic Raboy
  • This means we get the keys in bulk, but we are still deleting the entities one by one in a loop. If I have 10000+ entities, this for-loop will bombard the Couchbase server with that many parallel requests instead of sending all the keys in a list and letting Couchbase delete them internally. Won't this many requests affect other requests that the application fires for other use cases? Can't I DELETE using views? – ThinkFloyd Jun 17 '15 at 05:04
  • Hey @ThinkFloyd, You have a valid concern, however, with Couchbase Server operating as efficiently as it does and NodeJS being asynchronous, this isn't really an issue. Every delete you issue in NodeJS will be non-blocking so the application layer won't lock up. When a delete request hits Couchbase Server, the document is then marked for deletion and is then later deleted when compaction happens. You are not able to delete data via a view directly. I wouldn't be too worried on this, but let me know if you have further questions. Best, – Nic Raboy Jun 17 '15 at 14:26
  • Is it possible to remove all the entities in a bucket? Something like `TRUNCATE Table XYZ`. If that's possible, I can put all these keys in one bucket and purge it whenever required. – ThinkFloyd Jun 17 '15 at 16:34
  • You could use the `flush` function if you have it enabled via the settings of your bucket, but it should be used with caution. Any reason you don't want to do a standard delete on each document? http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.8/BucketManager.html – Nic Raboy Jun 17 '15 at 17:47
  • Another option is to use Nic's code but, instead of deleting, set a random TTL on each object with the touch() method so that it expires over the next few days, and let Couchbase take care of actually deleting the objects. Then you are only touching each object's metadata, which would be very fast. It also would not really affect CB's performance, as it would stagger the deletes over the time period. http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/Bucket.html#touch – NoSQLKnowHow Jun 18 '15 at 00:03
  • I actually want to reuse the keys. After invalidating/deleting all the entities, the latest values will gradually be populated and used in the future. I cannot set a random TTL in this case; otherwise my application would get outdated values. – ThinkFloyd Jun 19 '15 at 11:30
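
For readers whose keys are not reused, the staggered-expiry idea from the comments could be sketched as follows. It assumes the SDK's `bucket.touch(key, expiry, options, callback)` signature from the linked 2.x documentation. `randomTtl` and `expireGradually` are hypothetical names. The TTL is capped at 30 days because Couchbase treats larger expiry values as absolute timestamps rather than relative seconds.

```javascript
var THIRTY_DAYS = 30 * 24 * 60 * 60; // in seconds

// Pick a TTL between 1 and maxSeconds (inclusive), capped at 30 days.
// `random` is injectable for testing and defaults to Math.random.
function randomTtl(maxSeconds, random) {
    var rnd = random || Math.random;
    var capped = Math.min(maxSeconds, THIRTY_DAYS);
    return 1 + Math.floor(rnd() * capped);
}

// Hypothetical helper: give every key its own random expiry so the server
// spreads the actual deletions over the chosen window.
function expireGradually(bucket, keys, maxSeconds, onError) {
    keys.forEach(function(key) {
        bucket.touch(key, randomTtl(maxSeconds), {}, function(err) {
            if (err) {
                onError(key, err);
            }
        });
    });
}
```

Note that this still sends one touch per key, so it changes when the documents disappear, not how many requests the client issues.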