Basically, the job operates on a large number of records in a database, with more records being inserted all the time:
Select <1000> records with status "NEW" -> process the records -> update the records to status "DONE".
This sounds to me like "MapReduce".
I think the job described above can be done in parallel, even by different machines, but then my concern is:
When I select <1000> records with status "NEW", how can I know that none of these records are already being processed by some other job?
The same records should not, of course, be selected and processed more than once. Performance is critical. The naive solution is to run the basic job above in a single loop (sketched below).
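For reference, here is a minimal sketch of that naive single-worker loop. The `jobs` table, its `id`/`payload`/`status` columns, and the choice of SQLite and Python are all just assumptions for illustration:

```python
import sqlite3
import time

BATCH_SIZE = 1000  # stands in for the <1000> batch size above

def process(row):
    # placeholder for the actual per-record work
    pass

conn = sqlite3.connect("jobs.db")

while True:
    # fetch the next batch of unprocessed records
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'NEW' LIMIT ?",
        (BATCH_SIZE,),
    ).fetchall()
    if not rows:
        time.sleep(1)  # nothing to do yet; poll again
        continue
    for row in rows:
        process(row)
    # mark the whole batch as finished
    conn.executemany(
        "UPDATE jobs SET status = 'DONE' WHERE id = ?",
        [(row[0],) for row in rows],
    )
    conn.commit()
```

The problem: if two copies of this loop run at the same time, both SELECTs can return the same "NEW" rows, and the same batch gets processed twice.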
It seems related to big data processing / NoSQL / MapReduce, etc.
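What I imagine is needed is some atomic "claim" step, so a batch is tagged by one worker before any other worker can select it. A sketch of what I mean, again with assumed names (same `jobs` table, plus a hypothetical `claimed_by` column); SQLite serializes writers, so for multiple machines I'd presumably need real row-level locking, e.g. PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED:

```python
import sqlite3
import uuid

BATCH_SIZE = 1000

def claim_batch(conn):
    """Atomically tag a batch of NEW rows with this worker's id,
    so no other worker can select the same rows."""
    worker_id = str(uuid.uuid4())  # hypothetical per-worker tag
    with conn:  # one transaction: claim the rows, then read them back
        conn.execute(
            """UPDATE jobs SET status = 'IN_PROGRESS', claimed_by = ?
               WHERE id IN (SELECT id FROM jobs
                            WHERE status = 'NEW' LIMIT ?)""",
            (worker_id, BATCH_SIZE),
        )
        return conn.execute(
            "SELECT id, payload FROM jobs WHERE claimed_by = ?",
            (worker_id,),
        ).fetchall()
```

Each worker would then process only the rows it claimed and flip them to "DONE". Is this the right direction, or is there a standard pattern or tool for this?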
Thanks