I have to process a large XML file (around 25 MB in size) and organize the data into documents to import into MongoDB.
The issue is that there are around 5-6 types of elements in the XML document, each with around 10k rows.
After fetching one XML node of type a, I have to fetch its corresponding elements of types b, c, d, etc.
What I am trying to do in node:
- Fetch all the rows of type a.
- For each row, use xpath to find its corresponding related rows, and build the document (roughly the shape sketched below).
- Insert the document into MongoDB.
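For context, the document I am building in step 2 looks roughly like this (the field names here are just placeholders, not my actual schema):

// one document per row of type a, with its related rows embedded
{
    name: 'some value from the type-a row',
    b: [ /* related rows of type b */ ],
    c: [ /* related rows of type c */ ],
    d: [ /* related rows of type d */ ]
}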
If there are 10k rows of type a, the second step runs 10k times. I am trying to get those iterations to run in parallel so that the whole thing doesn't take forever. Hence, async.forEach seemed to be the perfect solution.
async.forEach(rowsA, fetchA);
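The way I understood async.forEach from its docs, the iterator receives each item (plus a completion callback), roughly like this simplified example:

// simplified sketch of how I understood the async.forEach API
async.forEach(rowsA, function(rowA, done) {
    // process one row, then signal completion
    done();
}, function(err) {
    // called once every row has been processed
});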
My fetchA function is roughly like this:
var fetchA = function(rowA) {
    // convert the XML row into an object
    var obj = {};
    for (var i in rowA.attributes) {
        var attribute = rowA.attributes[i];
        if (attribute.value === undefined)
            continue;
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // find the other related rows;
    // the callback inserts the modified object with the subdocuments
    findRelations(obj, function(obj) {
        insertA(obj, postInsert);
    });
};
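findRelations is not shown above; it is roughly this shape (the actual xpath lookups are stubbed out with a placeholder helper, lookupRelatedRows, which is not my real code):

var findRelations = function(obj, callback) {
    // run the xpath queries for the related b, c, d rows and
    // attach them to obj as subdocuments (placeholder for the real lookups)
    obj.related = lookupRelatedRows(obj);
    callback(obj);
};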
When I run this, the console.log in the code fires only about once every 1.5 seconds, not in parallel for every row as I expected. I have been scratching my head and trying to figure this out for the past two hours, but I am not sure what I am doing wrong.
I am not very adept with node, so please be patient.