
I am trying to migrate data in MongoDB with Mongock and ran into an issue when a migration has to run across a large amount of data.

Is there a way to partition this data? I didn't find support for that in the documentation.

The problem is that querying this data takes a long time, and loading all of it into memory at once can also cause problems.

UPD: The problem is that the sample code below can return anywhere from 1 to 1kk (1 million) documents, so it can take a huge amount of time:

mongoTemplate.findAll(User.class).stream()
    .map(this::migrateUser)
    .forEach(mongoTemplate::save);
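
To make the partitioning idea concrete, here is a minimal sketch of batch-by-batch processing in plain Java, so that only one batch is held in memory at a time. It is not a Mongock feature: `BatchedMigration`, `migrateInBatches`, and `fetchAfter` are hypothetical names introduced for illustration, and the in-memory `TreeMap` only stands in for the collection. With Spring Data MongoDB, `fetchAfter` could presumably be a query sorted by `_id`, filtered with `Criteria.where("_id").gt(lastId)` and capped with `limit(batchSize)`, but that wiring is an assumption and is not shown here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.UnaryOperator;

final class BatchedMigration {

    /**
     * Walks a collection in id order, one batch at a time.
     * fetchAfter(lastSeenId, batchSize) must return the next documents with an
     * id greater than lastSeenId, sorted by id (lastSeenId is null on the first call).
     * Returns the number of documents that were migrated and saved.
     */
    static <T, K> long migrateInBatches(
            BiFunction<K, Integer, List<T>> fetchAfter,
            Function<T, K> idOf,
            UnaryOperator<T> migrate,
            Consumer<T> save,
            int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long processed = 0;
        K lastId = null;
        while (true) {
            List<T> batch = fetchAfter.apply(lastId, batchSize);
            if (batch.isEmpty()) {
                break; // nothing left after lastId
            }
            for (T doc : batch) {
                save.accept(migrate.apply(doc));
                processed++;
            }
            // Remember where this batch ended so the next fetch resumes after it.
            lastId = idOf.apply(batch.get(batch.size() - 1));
        }
        return processed;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the collection: id -> user name.
        TreeMap<Integer, String> store = new TreeMap<>();
        for (int i = 1; i <= 10; i++) {
            store.put(i, "user" + i);
        }

        long count = BatchedMigration.<Map.Entry<Integer, String>, Integer>migrateInBatches(
                (last, size) -> {
                    // Copy at most `size` entries with a key greater than `last`.
                    List<Map.Entry<Integer, String>> out = new ArrayList<>();
                    Map<Integer, String> view = (last == null) ? store : store.tailMap(last, false);
                    for (Map.Entry<Integer, String> e : view.entrySet()) {
                        if (out.size() == size) {
                            break;
                        }
                        out.add(new AbstractMap.SimpleEntry<>(e));
                    }
                    return out;
                },
                Map.Entry::getKey,
                e -> new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue().toUpperCase()),
                e -> store.put(e.getKey(), e.getValue()),
                4);

        System.out.println("migrated " + count + " documents");
    }
}
```

Resuming from the last seen id instead of skipping by offset keeps each fetch cheap on a large collection, and it stays correct as long as the migration does not change the ids.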
Constantine
  • What is the size of the biggest collection, in terms of number of documents, and how big are those documents? – Mongock team Dec 13 '21 at 22:17
  • It can be from 100k to 1kk documents, which vary in size. I am now trying to find a tool to use instead of Apache Beam, which we used for such migrations on Datastore. We are moving towards Mongo, which is not properly supported there, and the number of documents can be pretty huge. – Constantine Dec 14 '21 at 08:59
  • So I understand you are moving from another database (SQL or whatever) to MongoDB. Are you thinking of implementing the mapping details yourself, or is your expectation to provide the source and target databases and have the tool do the magic? If you are writing the mapping details and expect the tool to provide a generic framework to run the migration, then Mongock will probably be helpful. Otherwise, it doesn't provide such a feature yet (although it will). – Mongock team Dec 14 '21 at 10:45
  • No, we have already migrated and are just looking for a tool to support data model changes on Mongo. As far as I can see, there is no way to partition the data during a migration if I run the migration on the entire collection. Am I right, or is there a way? – Constantine Dec 14 '21 at 12:10
  • Updated the post with the sample that causes concerns – Constantine Dec 14 '21 at 12:19
  • Ok, I would still need a bit more information to make a proper suggestion. Why is time a concern? What other concerns do you have? Does this migration have to be part of the application's startup, or can it run in an independent process? Are you using any orchestration layer like K8s? Are the target collections already being used by other services? Is HA a requirement? Can you afford to lock the target collections?... – Mongock team Dec 14 '21 at 13:08

0 Answers