I have the following architecture:
I import data from a SQL database into MongoDB. The importer migrates the data into the MongoDB instance that serves a website via an API.
The import can take a couple of minutes, and if it fails I would like to be able to either roll back (it would be awesome to be able to roll back multiple imports) or drop the database/collections holding the uncommitted rows (if you think of it in terms of SQL transactions).
I tried importing everything into a transactions collection and, on success, moving the data into the correct collection. This took far too much time to be practical. I also tried importing into a temp db and then swapping it with the live one. But then I ran into problems if someone, e.g., registers a new user on the website after the db copy but before the import is done (that user is lost in the swap).
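To make the swap problem concrete, here is a minimal sketch that uses plain Python dicts in place of the real databases. The names `import_with_swap`, `register_bob`, and the `users`/`products` layout are hypothetical and only stand in for my actual setup.

```python
import copy


def import_with_swap(live_db, sql_rows, concurrent_write=None):
    """Copy the live db, import into the copy, then return the copy as the new live db."""
    temp_db = copy.deepcopy(live_db)       # the db-copy step
    if concurrent_write is not None:
        concurrent_write(live_db)          # the website writes to the live db mid-import
    for row in sql_rows:                   # the (slow) import step
        temp_db["products"][row["id"]] = row
    return temp_db                         # the swap: the copy replaces the live db


def register_bob(db):
    """Stand-in for a user registering on the website during the import."""
    db["users"]["bob"] = {}


live = {"users": {"alice": {}}, "products": {}}
swapped = import_with_swap(live, [{"id": 1, "name": "widget"}], register_bob)

print(sorted(live["users"]))     # ['alice', 'bob']
print(sorted(swapped["users"]))  # ['alice']  (bob was lost in the swap)
```

The imported product is present after the swap, but any write that hit the live db between the copy and the swap is gone.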
How can I perform an import safely, without running into the most basic concurrency problems?
EDIT: To clarify the setup: I will run the importer in a cron job, at least once a day. I currently keep a timestamp for the latest synchronization and select everything newer than that from the SQL db. New rows will automagically appear in the SQL db over time.
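The timestamp-based selection looks roughly like the sketch below. It uses `sqlite3` from the standard library as a stand-in for the real SQL db, and the `items` table, the `updated_at` column, and both function names are assumptions made for illustration.

```python
import sqlite3


def fetch_rows_since(conn, last_sync):
    """Return all rows changed strictly after last_sync, oldest first."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    )
    return cur.fetchall()


def next_sync_timestamp(rows, last_sync):
    """Newest timestamp seen in this batch, or the old one if nothing changed."""
    return rows[-1][2] if rows else last_sync


# Demo data: ISO-8601 strings sort in chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "old", "2024-01-01T00:00:00"),
        (2, "new", "2024-01-03T00:00:00"),
    ],
)

last_sync = "2024-01-02T00:00:00"
rows = fetch_rows_since(conn, last_sync)
print(rows)  # [(2, 'new', '2024-01-03T00:00:00')]
```

In this sketch the stored timestamp would only be advanced to `next_sync_timestamp(rows, last_sync)` after the import has fully succeeded, so a failed run is simply retried from the old timestamp.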
At the end of the import I run a downloader that downloads all the images from URLs stored in the SQL db.
I don't want to start a new sync before the images are downloaded since that could result in strange behaviour.
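One way I could keep cron from starting an overlapping sync is a lock file that is created atomically before the import and removed after the image download. This is only a sketch under that assumption: `try_acquire_lock`, `run_sync`, and the two step callbacks are hypothetical names, not part of my current importer.

```python
import os


def try_acquire_lock(path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging
    return True


def run_sync(lock_path, import_step, download_step):
    """Run import then image download, unless another sync holds the lock."""
    if not try_acquire_lock(lock_path):
        return False               # a previous sync is still running
    try:
        import_step()
        download_step()
    finally:
        os.remove(lock_path)       # release even if a step raised
    return True
```

A cron entry would call `run_sync("/path/to/sync.lock", do_import, download_images)` and exit quietly when it returns `False`. Note that if the process is killed hard, the lock file stays behind and has to be cleaned up manually.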