We have 0.5 million records per day, each ~500 bytes, and we have to analyze a full year's worth of records. The fastest option would be to load all the records at once, but we can't: that would require ~88 GB of memory (0.5 M records/day × 500 bytes × 365 days). The record volume may also grow in the future.
Another approach is to load the records group by group, since the analysis operates on groups. There are about 25,000 groups, and that number may grow as well.
We can load one group at a time, analyze it, discard it, and load the next. But this is very slow, because it means 25,000 round trips to the database server. In our tests, a single-threaded process with all the data in memory is much faster than a multithreaded process (32 threads) that makes trips to the database.
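For reference, the current per-group pattern looks roughly like this (the DAO and record types are illustrative stand-ins, not our actual API):

```java
import java.util.List;

// Sketch of the current approach: one database round trip per
// group, so ~25,000 trips for a full analysis run.
public class PerGroupAnalysis {

    public void analyzeAll(RecordDao dao, List<Long> groupIds) {
        for (Long groupId : groupIds) {
            // One query (one network round trip) per group.
            List<Record> group = dao.loadGroup(groupId);
            analyze(group);
            // The group becomes garbage-collectable here, so peak
            // memory stays at roughly one group's worth of records.
        }
    }

    private void analyze(List<Record> group) { /* analysis logic */ }
}

// Hypothetical DAO and record types, standing in for the real ones.
interface RecordDao {
    List<Record> loadGroup(long groupId);
}

record Record(long groupId, byte[] payload) { }
```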
Is there an approach that can handle loading this much data while minimizing the number of trips to the database? Or a way to work with a collection larger than the available memory? Or a library that wraps on-demand loading of the data (a partial collection)?
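For example, would batching many groups into a single query be the right direction? A minimal JDBC sketch of what we mean, assuming a hypothetical `records` table with `group_id` and `payload` columns (the batch size would be tuned to available memory):

```java
import java.sql.*;
import java.util.*;

// Sketch: fetch BATCH_SIZE groups per query instead of one,
// cutting round trips from 25,000 to 25,000 / BATCH_SIZE.
// Table and column names are hypothetical.
public class BatchedLoader {
    private static final int BATCH_SIZE = 500;

    public void analyzeInBatches(Connection conn, List<Long> groupIds) throws SQLException {
        for (int i = 0; i < groupIds.size(); i += BATCH_SIZE) {
            List<Long> batch = groupIds.subList(i, Math.min(i + BATCH_SIZE, groupIds.size()));

            // Build "IN (?, ?, ...)" with one placeholder per group id.
            String placeholders = String.join(",", Collections.nCopies(batch.size(), "?"));
            String sql = "SELECT group_id, payload FROM records WHERE group_id IN (" + placeholders + ")";

            Map<Long, List<byte[]>> byGroup = new HashMap<>();
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int j = 0; j < batch.size(); j++) {
                    ps.setLong(j + 1, batch.get(j));
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        byGroup.computeIfAbsent(rs.getLong("group_id"), k -> new ArrayList<>())
                               .add(rs.getBytes("payload"));
                    }
                }
            }
            // Analyze each complete group, then let the whole batch be GC'd.
            byGroup.forEach(this::analyze);
        }
    }

    private void analyze(long groupId, List<byte[]> records) { /* analysis logic */ }
}
```

Is this kind of batching reasonable here, or is there a better pattern (or an existing library) for it?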