
I have a huge collection of documents and I want to extract some statistics from it. The job needs to run periodically, every 15 minutes.

Most of the stats are based on document size, so I need to fetch each document and calculate its size.

The output is just a single line with some stats about the size of the documents. (I am not processing the whole collection, just a subset of it, so I cannot use the collection stats provided by MongoDB.)

What I'd like is to run this on the server side and avoid transferring all the documents to the client just to calculate their size.

I am running it with the mongo shell, making sure I connect to a secondary. The shell always runs on a remote machine, which is the main reason to avoid transferring all the documents over the network.

After reading the mongo shell documentation I expected the script to be executed "server-side", as it states, but it does not work that way: it is executed on the same machine as the mongo shell (which is more client-side than server-side, in my opinion).

I am pasting an extract of my code in case it helps:

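// Assumed initialization; the original extract does not show how stats is set up.
var stats = { min: Infinity, max: 0, maxid: null, count: 0, total: 0, avg: 0 };

db.cache.find(query).forEach(function(obj) {
    // BSON size of the current document, in bytes
    var curr = Object.bsonsize(obj);

    if (stats.max < curr) {
        stats.max = curr;
        stats.maxid = obj._id;
    }
    if (stats.min > curr) {
        stats.min = curr;
    }
    stats.count++;
    stats.total += curr;
    stats.avg = stats.total / stats.count;
});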

It takes about 3-4 seconds if I run the mongo shell locally and more than 1 minute when the shell runs remotely.

Any ideas on how to make this server-side JavaScript really execute on the server?

UPDATE:

To summarize the options mentioned in the answer:

  • Use the system.js collection + db.eval: I cannot use this because eval is deprecated, and eval also needs to run on the master, whereas I have to run on a secondary.

  • Use the system.js collection + loadServerScripts: this executes the JavaScript code on the mongo shell machine, which is the "client".

  • Cron job: I'd need to run it on a specific node, and since the master may move to another node, I could end up running it against the master, which I should avoid. Besides, I am not allowed to do so: one of the requirements is to run it from a remote shell. (There are several DBs like this one that will need this kind of statistics, and it is easier to maintain in a single place.)
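
The worry about accidentally running against the master could be handled by having the script check the node's role before doing any work. The following is only a sketch under assumptions: in the mongo shell, `db.isMaster()` returns a document with boolean `ismaster` and `secondary` fields, and the helper name `shouldRunStats` is hypothetical. It takes that document as an argument so the decision logic can be tried outside the shell.

```javascript
// Hypothetical helper: decide whether the stats job may run on this node.
// `status` is expected to look like the document returned by db.isMaster().
function shouldRunStats(status) {
    // Run only on a secondary; skip the master, arbiters and nodes still starting up.
    return status.secondary === true && status.ismaster !== true;
}

// Possible use inside the mongo shell (not executed here):
//   rs.slaveOk();                                  // allow reads on a secondary
//   if (!shouldRunStats(db.isMaster())) { quit(); }
```

This does not make the computation run on the server; it only avoids doing the work on the wrong node.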

richardtz

1 Answer


You could store JS code as a kind of stored procedure.

As per this article, you can store a JavaScript function in the system.js collection:

 db.system.js.save({_id: "sum", value: function (x, y) { return x + y; }});

Then call it like this:

db.eval("return sum(2, 3);");

Since eval is deprecated (no date has been set for when it will be disabled, see here), you can load the stored functions into the shell instead:

db.loadServerScripts();
sum(3,2) 
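
Why this approach still runs on the client can be seen from a rough model of what `db.loadServerScripts()` does: it reads the documents in `system.js` and defines each stored function in the shell's own scope. The sketch below imitates that with a plain array standing in for the collection. `fakeSystemJs`, `loadScriptsInto` and `scope` are hypothetical names, and the real helper's internals may differ.

```javascript
// Stand-in for the documents stored in db.system.js (illustrative only).
var fakeSystemJs = [
    { _id: "sum", value: function (x, y) { return x + y; } }
];

// Rough model: copy every stored function into a local scope object.
function loadScriptsInto(scope, docs) {
    docs.forEach(function (doc) {
        scope[doc._id] = doc.value; // the function now lives in the client process
    });
    return scope;
}

var scope = loadScriptsInto({}, fakeSystemJs);
var result = scope.sum(3, 2); // evaluated locally, not on the server
```

Because the loaded functions execute in the shell process, any documents they iterate over still have to cross the network.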

Extra documentation here.

Another alternative to eval is a cron job that runs a JavaScript file launched locally on the server.

profesor79
  • Thanks for the response. I get this warning when using eval: "WARNING: db.eval is deprecated". Is there any other way to do it without eval? – richardtz May 18 '16 at 14:05
  • Using loadServerScripts, the code is executed on the mongo shell machine, so again I have the same problem as before :( – richardtz May 18 '16 at 14:42
  • @richardtz As you have access to the mongo console, you can issue `db.loadServerScripts();`, am I right? Or are you going to have a script that will be called with cron? – profesor79 May 18 '16 at 14:47
  • Not sure what you mean by mongo console. I am using the mongo shell to execute my script, and this shell is running on a remote machine. If I use eval, my code is executed on the server: fine, but deprecated. If I use loadServerScripts, my code is executed on the remote machine, not on the server. – richardtz May 18 '16 at 14:59
  • @richardtz Another alternative to `eval` is a `cron job` that runs a JavaScript file launched locally on the server. – profesor79 May 18 '16 at 15:16
  • I also have problems with the cron approach, as I don't know which instance will be the primary, and I need to run it on a secondary. – richardtz May 19 '16 at 06:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/112397/discussion-between-richardtz-and-profesor79). – richardtz May 19 '16 at 12:58