
I'm using Sails 0.11.2 with the latest sails-mongo adapter. I have a very large database (gigabytes of data), mostly timestamps and values, and I query it through the blueprint API.

If I query localhost:1337/datatable?limit=100000000000, Node.js hangs on 0.12 with heavy CPU usage and crashes on v4. The crash happens in the toJSON function.

I've figured out that I need to make multiple smaller queries from my API, but I don't know how to proceed.

How can I make multiple queries that "don't explode" my server?


Update:

On the newer version 0.12.3 with the latest Waterline and sails-mongo, queries go much more smoothly. The crashes in the cloud happened because I didn't have enough RAM to run Sails.js and MongoDB on the same T2.micro instance.

I've moved the MongoDB server to an M3.medium instance. The server doesn't crash anymore, but it freezes. I'm using skip/limit pagination, which works nicely on the Sails.js side, but for MongoDB it's a great waste of resources!

MongoDB executes the query internally with limit = skip + limit, then moves the cursor to the desired offset and returns the page. With heavy pagination you issue lots of these internal queries, and each one scans more documents as skip grows.
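To make that concrete, here is a minimal sketch of paging through the blueprint API with skip/limit. The /datatable route, page size, and a modern Node runtime with a global fetch are all assumptions for illustration:

    // Minimal sketch: walk the blueprint API page by page with skip/limit.
    // Route, page size, and runtime (Node 18+ for global fetch) are assumptions.
    const BASE_URL = 'http://localhost:1337/datatable';
    const PAGE_SIZE = 1000;

    async function fetchAllPages(processPage) {
      for (let skip = 0; ; skip += PAGE_SIZE) {
        const res = await fetch(`${BASE_URL}?limit=${PAGE_SIZE}&skip=${skip}`);
        const page = await res.json();
        if (page.length === 0) break;        // no data left
        await processPage(page);             // handle one bounded page at a time
        if (page.length < PAGE_SIZE) break;  // last (partial) page
        // NOTE: each request still makes MongoDB scan skip + limit documents.
      }
    }

    fetchAllPages(async (docs) => console.log('got', docs.length, 'docs'))
      .catch(console.error);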

Gustavo Garcia
  • I'm really interested in what you are trying to achieve. Can you please describe the exact problem you're solving - why and how you need to process the gigabytes of data? When it comes to large datasets, streaming is generally the way to go (see the sketch below), and ``sails-mongo`` seems to support it - https://github.com/balderdashy/sails-mongo/blob/master/lib/adapter.js#L484 – tiblu Nov 22 '15 at 20:42
  • To make multiple queries through the blueprint API, use the `skip` parameter along with `limit` to get the data in a paginated way. ref: http://sailsjs.org/documentation/reference/blueprint-api/find-where – Moinul Hossain Nov 22 '15 at 20:44
  • I'm doing an Industrial IoT solution. I collect many digital inputs from machines, and by analysing these data I can extract indicators such as the number of items produced, availability, and so on. There is a lot of data, and I need to analyse all of it, so I use two different servers: one for handling data and one for processing. I've tried both methods, streaming and skip/limit, on my analytics node! Using streams is enough to avoid the crash! :D But skip/limit stresses the server a lot less, so it's working better. Using skip/limit with multiple queries is a nice solution. I'll try that! – Gustavo Garcia Nov 23 '15 at 21:10
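For reference, the streaming approach tiblu mentions can be sketched with the plain MongoDB Node.js driver (2.x era, matching the answer below). A find() cursor is a readable stream, so documents are processed one at a time instead of being buffered in memory; the connection string and collection name here are illustrative:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
      if (err) throw err;
      var stream = db.collection('datatable').find({}).stream();

      stream.on('data', function (doc) {
        // handle a single document, e.g. update indicators incrementally
      });
      stream.on('error', function (err) {
        console.error('stream error', err);
        db.close();
      });
      stream.on('end', function () {
        console.log('done');
        db.close();
      });
    });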

2 Answers


As this article explains (https://scalegrid.io/blog/fast-paging-with-mongodb/), the way to get around the waste of resources in MongoDB is to avoid using skip and instead cleverly use _id as part of your query.

I did not use sails-mongo, but I implemented the idea above with the MongoDB driver in Node.js:

/**
 * Motivation:
 * Wanted to put together some code that used:
 *  - BlueBird (promises)
 *  - MongoDB NodeJS Driver
 *  - and paging that did not rely on skip()
 *
 * References:
 * Based on articles such as:
 * https://scalegrid.io/blog/fast-paging-with-mongodb/
 * and GitHub public code searches such as:
 * https://github.com/search?utf8=%E2%9C%93&q=bluebird+MongoClient+_id+find+limit+gt+language%3Ajavascript+&type=Code&ref=searchresults
 * which yielded sample code hits such as:
 * https://github.com/HabitRPG/habitrpg/blob/28f2e9c356d7053884107d90d04e28dde75fa81b/migrations/api_v3/coupons.js#L71
 */

  var Promise = require('bluebird'); // jshint ignore:line
  var _ = require('lodash');
  var MongoClient = require('mongodb').MongoClient;
  var dbHandleForShutDowns;

  // option a: great for debugging
  var logger = require('tracer').console();
  // option b: general purpose use
  //var logger = console;

  //...

    var getPage = function getPage(db, collectionName, query, projection, pageSize, processPage) {
      //console.log('DEBUG', 'filter:', JSON.stringify(query,null,2));
      // always project _id so the last document's _id can seed the next page
      projection = projection || {};
      projection['_id'] = true;
      return db
        .collection(collectionName)
        .find(query)
        .project(projection)
        .sort({'_id':1}).limit(pageSize)
        .toArray() // cursor methods return promises: http://mongodb.github.io/node-mongodb-native/2.1/api/Cursor.html#toArray
        .then(function processPagedResults(documents) {
          if (!documents || documents.length < 1) {
            // stop - no data left to traverse
            return Promise.resolve();
          }
          else {
            if (documents.length < pageSize) {
              // stop - last page
              return processPage(documents);
            }
            else {
              return processPage(documents) // process the results of the current page
                .then(function getNextPage(){ // then go get the next page
                  var last_id = documents[documents.length-1]['_id'];
                  query['_id'] = {'$gt' : last_id};
                  return getPage(db, collectionName, query, projection, pageSize, processPage);
                });
            }
          }
        });
    };

    //...

    return MongoClient
      .connect(params.dbUrl, {
        promiseLibrary: Promise
      })
      .then(function(db) {
        dbHandleForShutDowns = db;
        return getPage(db, collectionName, {}, {}, 5, function processPage(pagedDocs){console.log('do something with', pagedDocs);})
          .finally(db.close.bind(db));
      })
      .catch(function(err) {
        console.error("ERROR", err);
        dbHandleForShutDowns.close();
      });

The following two snippets show how the code manipulates _id and makes it part of the query:

 .sort({'_id':1}).limit(pageSize)
 // [...]
var last_id = documents[documents.length-1]['_id'];
query['_id'] = {'$gt' : last_id};

Overall code flow:

  1. Let getPage() handle the work; you can set the pageSize and query to your liking:

    return getPage(db, collectionName, {}, {}, 5, function processPage(pagedDocs){console.log('do something with', pagedDocs);})
    
  2. Method signature:

    var getPage = function getPage(db, collectionName, query, projection, pageSize, processPage) {
    
  3. Process the paged results as soon as they become available:

    return processPage(documents) // process the results of the current page
    
  4. Move on to the next page:

    return getPage(db, collectionName, query, projection, pageSize, processPage);
    
  5. The code will stop when there is no more data left:

    // stop - no data left to traverse
    return Promise.resolve();
    
  6. Or it will stop when working on the last page of data:

    // stop - last page
    return processPage(documents);
    

I hope this offers some inspiration, even if it's not an exact solution for your needs.

pulkitsinghal

Another approach: let MongoDB's aggregation pipeline do the heavy filtering, sorting, and paging, then have Sails fetch only the matching IDs.

1. Run the aggregate:

const _ = require('lodash')
const ObjectID = require('mongodb').ObjectID
const SailsMongoQuery = require('sails-mongo/lib/query/index.js')
const SailsMongoMatchMongoId = require('sails-mongo/lib/utils.js').matchMongoId

// `model`, `query`, and `paginate` are assumed to be in scope.
// Build the Waterline criteria, then split the `where` clause off from the
// remaining options (sort/skip/limit).
const fn = model.find(query).paginate(paginate)
const criteria = fn._criteria
const queryLib = new SailsMongoQuery(criteria, {})
const queryOptions = _.omit(queryLib.criteria, 'where')
const where = queryLib.criteria.where || {}
// Convert string IDs to ObjectIDs so $match works against _id-type fields.
const queryWhere = Object.keys(where).reduce((acc, key) => {
  const val = where[key]
  acc[key] = SailsMongoMatchMongoId(val) ? new ObjectID(val) : val
  return acc
}, {})

const aggregate = [
  { $match: queryWhere }
].concat(Object.keys(queryOptions).map(key => ({ [`$${key}`]: queryOptions[key] })))

// console.log('roge aggregate --->', JSON.stringify(aggregate, null, 2))

// `primaryKey` and `callback` are assumed to be in scope; `ids` is filled
// here and consumed in step 2.
let ids = []
model.native((err, collection) => {
  if (err) return callback(err)
  // allowDiskUse lets MongoDB sort large result sets without hitting its
  // in-memory sort limit.
  collection.aggregate(aggregate, { allowDiskUse: true }).toArray(function (err, docs) {
    if (err) return callback(err)
    const pk = primaryKey === 'id' ? '_id' : primaryKey
    ids = docs.map(doc => doc[pk])
    callback()
  })
})
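For a concrete sense of what this builds: assuming a blueprint-style request with limit=100, skip=200, and a sort on createdAt (the values and field names here are illustrative), the resulting aggregate would look roughly like:

// Illustrative pipeline shape for limit=100, skip=200, sort=createdAt DESC:
[
  { $match: { /* queryWhere, with string IDs converted to ObjectIDs */ } },
  { $sort: { createdAt: -1 } },
  { $skip: 200 },
  { $limit: 100 }
]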

2. Run a Sails find by the collected IDs:

// Re-query through Waterline with only the collected IDs, so .populate()
// and other ORM features work on a bounded result set.
query = Object.assign({}, query, { [primaryKey]: ids }) // check the primary key of your Sails model
fn = model.find(query) // .populate() or other chained methods can go here
fn.exec((err, results) => { console.log('result ->>>>', err, results) })
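Presumably the point of the two-step split is that the aggregation (with allowDiskUse: true) lets MongoDB handle the expensive match/sort/skip/limit natively, even when the sort exceeds its in-memory limit, while the follow-up Waterline find only fetches a bounded list of IDs, so .populate() and the rest of the ORM still work normally.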
Mert Köklü