Your data coming from DB2 should already have a unique primary key, and possibly additional unique business keys as well. If you store that key as the _id in MongoDB (rather than letting MongoDB autogenerate an _id), you will avoid duplicates on the MongoDB side. A key made up of several fields can be stored as an embedded document in _id. If you then attempt to insert the same record twice, you will get a DuplicateKeyException.
Beyond that, it seems excessive to restart the entire load process whenever individual records fail. Or do you have more serious problems that need to be addressed, e.g. is the loader crashing the JVM?
If not, perhaps you just need to improve your loader so that it can carry on past a bad record, or pick up where it left off, instead of starting completely over. And if you populate the _id as I suggested, you will have the added assurance that re-running the load does not insert duplicate records.
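
To make the idea concrete, here is a minimal sketch of a loader that keeps going when a single record fails and can safely be re-run. It deliberately does not use the MongoDB driver: `FakeCollection` is a hypothetical in-memory stand-in that only mimics the unique `_id` constraint, and `SourceRow`, `LoadResult`, `DuplicateIdException`, and `load` are illustrative names of my own, not part of any real API. In a real loader you would replace the stand-in with your actual insert call and catch whatever duplicate key exception your driver throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdempotentLoader {

    /** Hypothetical row read from DB2: its primary key plus some payload. */
    static final class SourceRow {
        final String primaryKey;
        final String payload;
        SourceRow(String primaryKey, String payload) {
            this.primaryKey = primaryKey;
            this.payload = payload;
        }
    }

    /** Stand-in for the duplicate key error raised on a repeated _id. */
    static class DuplicateIdException extends RuntimeException {
        DuplicateIdException(String id) { super("duplicate _id: " + id); }
    }

    /** In-memory stand-in for a collection with its unique _id index. */
    static class FakeCollection {
        private final Map<String, String> docs = new HashMap<>();

        void insert(String id, String payload) {
            // Simulate a record that cannot be written at all.
            if (payload == null) throw new IllegalArgumentException("bad record: " + id);
            // Simulate the unique _id constraint.
            if (docs.putIfAbsent(id, payload) != null) throw new DuplicateIdException(id);
        }

        int size() { return docs.size(); }
    }

    /** Summary of one load run. */
    static final class LoadResult {
        final int inserted;
        final int skippedDuplicates;
        final List<String> failedKeys;
        LoadResult(int inserted, int skippedDuplicates, List<String> failedKeys) {
            this.inserted = inserted;
            this.skippedDuplicates = skippedDuplicates;
            this.failedKeys = failedKeys;
        }
    }

    /** Inserts each row with the DB2 primary key as its _id; never aborts the whole run. */
    static LoadResult load(List<SourceRow> rows, FakeCollection target) {
        int inserted = 0;
        int skipped = 0;
        List<String> failed = new ArrayList<>();
        for (SourceRow row : rows) {
            try {
                target.insert(row.primaryKey, row.payload);
                inserted++;
            } catch (DuplicateIdException e) {
                skipped++;                   // already loaded by an earlier run
            } catch (RuntimeException e) {
                failed.add(row.primaryKey);  // remember it for a later retry, keep going
            }
        }
        return new LoadResult(inserted, skipped, failed);
    }

    public static void main(String[] args) {
        FakeCollection target = new FakeCollection();
        List<SourceRow> rows = Arrays.asList(
                new SourceRow("1001", "alice"),
                new SourceRow("1002", null),    // simulated bad record
                new SourceRow("1003", "carol"));

        LoadResult first = load(rows, target);
        LoadResult second = load(rows, target); // re-run over the same input

        System.out.println(first.inserted + " " + first.skippedDuplicates + " " + first.failedKeys);
        // prints: 2 0 [1002]
        System.out.println(second.inserted + " " + second.skippedDuplicates + " " + second.failedKeys);
        // prints: 0 2 [1002]
    }
}
```

The second run inserts nothing new, because every row that was already loaded is rejected by the unique key and simply counted as skipped. The failed keys are collected rather than thrown, so you can log them and retry just those records later.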