
On my Ubuntu server I have a Ruby on Rails app that relies on MongoDB. I often use Mongoid to inject objects into the DB, but when injecting large amounts of objects I compile a huge array of hashes and inject it with the driver's insert() method (the equivalent of the mongo shell's db.collection.insert()):

ObjectName.collection.insert([{_id: BSON::ObjectId('5671329e4368725951010000'), name: "foo"}, {_id: BSON::ObjectId('567132c94368725951020000'), name: "bar"}])

The batch insertion time is a bottleneck for me. For example, it takes 23 seconds to batch insert 150,000 objects. Is it possible to allocate resources in a way that makes batch insertion faster?
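One way to narrow down where those 23 seconds go is to time the array construction and the insert call separately. This is a minimal sketch with Ruby's standard Benchmark module; the insert itself is stubbed out so the sketch runs without a MongoDB server (the real call from the question is shown in a comment):

```ruby
require 'benchmark'

# Time building the array of 150,000 hashes
build_time = Benchmark.realtime do
  @batch = Array.new(150_000) do |i|
    { _id: i, name: "obj#{i}" }  # real code would use BSON::ObjectId.new
  end
end

# Time the insert step; the stub stands in for the real driver call
insert_time = Benchmark.realtime do
  # ObjectName.collection.insert(@batch)  # the real call from the question
  @batch.size                             # stub so this runs without a server
end
```

If build_time dominates, the bottleneck is Ruby-side hash construction rather than MongoDB.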

Cjoerg
  • Let's do some math: 23 secs / 150,000 docs equals 0.153 milliseconds, or 153.3 µs per document. That's not exactly slow in my book. Did a quick loop insert of 10,000 docs into MySQL and it took 3.45 s, or 345 µs per doc – more than twice as long. – Markus W Mahlberg Dec 16 '15 at 11:43
  • It's not fast enough for me :-) Whether it's fast or not is beside the point; I'm interested in figuring out what determines the speed. If there is such a thing as slow and fast batch injection, then that means that I can affect it :-) – Cjoerg Dec 16 '15 at 11:47
  • No such thing as batch injection in MongoDB. There are bulk operations, which may be any of inserts, updates, removals and even retrieval. Anything else is an abstraction layer, which surely doesn't speed up the process. Especially not with a dynamically typed language. And especially not when allocating loads of RAM for storing 150k of hashes. Bottom line, since putting it polite obviously did not do it: Most likely it is not MongoDB being slow, but your code. Easy to check: parallelize it, using multiple connections. If it doesn't get faster, it is either your code or your disks. – Markus W Mahlberg Dec 16 '15 at 13:28
  • No need to be polite. The last comment is definitely something I can work with! Thanks, I will try to look at my issue from this approach. – Cjoerg Dec 16 '15 at 13:47
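The parallelization check suggested in the comments can be sketched as follows: split the batch into slices and give each slice to its own thread, where each thread would use its own connection. The slice count is an illustrative assumption, and the driver call is stubbed out (shown in a comment) so the sketch runs without a MongoDB server:

```ruby
require 'thread'

# Sample batch standing in for the question's 150,000 hashes
batch = Array.new(150_000) { |i| { _id: i, name: "doc#{i}" } }

threads_count = 4  # assumed; tune to your core/connection count
slice_size = (batch.size / threads_count.to_f).ceil
slices = batch.each_slice(slice_size).to_a

threads = slices.map do |slice|
  Thread.new do
    # Each thread would open its own connection, e.g. with the legacy driver:
    #   Mongo::MongoClient.new['test_db']['test_collection'].insert(slice)
    slice.size  # stub so the sketch runs without a server
  end
end

inserted = threads.map(&:value).sum
```

If the parallel version is no faster than the single-connection one, the bottleneck is likely your Ruby code or your disks rather than MongoDB, as the comment above points out.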

1 Answer


You can try using the Mongoid gem:

batch = [{_id: BSON::ObjectId('5671329e4368725951010000'), name: "foo"}, {_id: BSON::ObjectId('567132c94368725951020000'), name: "bar"}]

Post.collection.insert(batch) # let's say Post is the model

or you can do it with the Ruby MongoDB driver:

require 'mongo'

# legacy 1.x Ruby driver API
mongo_client = Mongo::MongoClient.new # connects to localhost:27017 by default
coll = mongo_client['test_db']['test_collection']
bulk = coll.initialize_ordered_bulk_op
batch.each do |hash|
  bulk.insert(hash) # queue each document
end
bulk.execute # send all queued documents as one bulk operation

If you want to do the same thing with a raw mongo query, you can follow the Bulk Insert documentation.

To handle growing data you can use sharding:

Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.

Different kinds of scaling

Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high performance systems with large numbers of CPUs and large amount of RAM are disproportionately more expensive than smaller systems. Additionally, cloud-based providers may only allow users to provision smaller instances. As a result there is a practical maximum capability for vertical scaling. Sharding, or horizontal scaling, by contrast, divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
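For a concrete idea of what the horizontal path looks like, a shard member's mongod configuration declares its cluster role. This is a minimal sketch; the replica set name, dbPath, and port are placeholder assumptions:

```yaml
# minimal sketch of a shard server's mongod configuration
sharding:
  clusterRole: shardsvr
replication:
  replSetName: rs0          # assumed replica set name
storage:
  dbPath: /var/lib/mongodb  # placeholder path
net:
  port: 27018
```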

Rajarshi Das
  • Thanks for your answer. But I don't think it's what I am looking for. Whatever needs to be done should probably have to do with increasing capacity of something on the server level. – Cjoerg Dec 16 '15 at 11:31
  • You need to do sharding to store large data across different machines/nodes, and make a replica set where one member is the primary and the others are secondaries for reads – Rajarshi Das Dec 16 '15 at 11:44