
I want to do a bulk load into MongoDB. I have about 200GB of files containing JSON objects which I want to load. The problem is I cannot use the mongoimport tool, as the objects contain nested objects (i.e. I'd need to use the --jsonArray parameter), which is limited to 4MB.

There is the Bulk Load API in CouchDB where I can just write a script and use cURL to send a POST request to insert the documents, no size limits...

Is there anything like this in MongoDB? I know there is Sleepy, but I am wondering if it can cope with a nested JSON array insert?

Thanks!

NightWolf
  • Why don't you write a small script in your favorite language? –  Jul 01 '11 at 14:02
  • +1 Not sure why the downvote. I would be interested to know also. ("No, there is nothing like that" would be a fine answer, if it is correct.) – JasonSmith Jul 02 '11 at 06:58
  • Two days later, this remains a good Stack Overflow question. It makes a direct inquiry, for which there is one right answer. I wish the answer would show itself. To that end, I will contribute my own reputation points to the first Mongo user to give a clear, correct answer. (One presumes the answer is **no** but I leave that to the experts.) – JasonSmith Jul 04 '11 at 08:47
  • Could you show us an example of your data? Why are you required to import with `--jsonArray`? – Theo Jul 04 '11 at 15:00
  • The content has nested objects. – NightWolf Jul 07 '11 at 14:57

2 Answers

Ok, basically it appears there is no really good answer unless I write my own tool in something like Java or Ruby to pass the objects in (meh effort)... But that's a real pain, so instead I decided to simply split the files down into 4MB chunks. I just wrote a simple shell script using split (note that I had to split the files multiple times because of the limitations). I used the split command with -l (line count) so each file had x number of lines in it. In my case each JSON object was about 4KB, so I just guessed line counts.

For anyone wanting to do this, remember that split can only make 676 files (26*26) by default, so you need to make sure each file has enough lines in it, or split will run out of suffixes and you'll miss half the data. Anyway, I put all of this in a good old bash script, used mongoimport and let it run overnight. Easiest solution IMO, and no need to cut and mash files and parse JSON in Ruby/Java or whatever else.
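For reference, here is a minimal sketch of that kind of split-and-import loop (assuming one JSON object per line; the file, database and collection names are just placeholders, not my actual ones):

```bash
#!/bin/bash
# Split a large newline-delimited JSON file into chunks of ~1000 lines
# (~4MB if each object is ~4KB) and import each chunk with mongoimport.
split -l 1000 bigdump.json chunk_

for f in chunk_*; do
    mongoimport --db mydb --collection mycollection --file "$f"
done
```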

The scripts are a bit custom, but if anyone wants them just leave a comment and I'll post them.

NightWolf

Without knowing anything about the structure of your data I would say that if you can't use mongoimport you're out of luck. There is no other standard utility that can be tweaked to interpret arbitrary JSON data.

When your data isn't a 1:1 fit for what the import utilities expect, it's almost always easiest to write a one-off import script in a language like Ruby or Python. Batch inserts will speed up the import considerably, but don't make the batches too large or you will get errors (the maximum size of an insert in 1.8+ is 16MB). In the Ruby driver a batch insert can be done by simply passing an array of hashes to the insert method instead of a single hash.
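A minimal sketch of what such a one-off script could look like with the Ruby driver, assuming one JSON object per line (the file, database and collection names and the batch size are only examples):

```ruby
require 'rubygems'
require 'json'
require 'mongo'

collection = Mongo::Connection.new('localhost', 27017).db('mydb').collection('mycollection')

batch = []
File.foreach('bigdump.json') do |line|   # one JSON object per line
  batch << JSON.parse(line)
  if batch.size >= 1000                  # keep each insert well under the 16MB limit
    collection.insert(batch)             # an array of hashes = batch insert
    batch = []
  end
end
collection.insert(batch) unless batch.empty?
```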

If you add an example of your data to the question I might be able to help you further.

Theo