Does Spring Data keep storing entities in RAM during a @Transactional method, or do entities that have already been flushed become accessible to the garbage collector?
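The entities stay in RAM (i.e. in the persistence context of the entityManager) until the transaction commits or rolls back, or until the entityManager is cleared. Calling flush() alone only sends the pending SQL to the database: the flushed entities remain managed, so the entityManager still holds references to them. That means the entities only become eligible for GC after the transaction commits/rolls back or entityManager.clear() is called.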
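To make the difference between flush() and clear() concrete, here is a toy model of a persistence context (first-level cache). It is a simplified illustration, not Hibernate's actual implementation; the class name PersistenceContextModel and its methods are hypothetical. It only mirrors the two semantics that matter here: flush() writes pending changes but keeps the entities managed (strongly referenced), while clear() drops every reference and discards any changes that were not flushed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a JPA persistence context. NOT Hibernate's code;
// it only mimics the flush()/clear() behaviour relevant to memory retention.
public class PersistenceContextModel {

    // Managed entities keyed by id. While an entity is in this map, the context
    // holds a strong reference to it, so the GC cannot reclaim it.
    private final Map<Integer, String> managed = new HashMap<>();
    // Ids whose changes have not been written to the "database" yet.
    private final List<Integer> dirty = new ArrayList<>();
    // Stand-in for the database table.
    private final Map<Integer, String> database = new HashMap<>();

    public void persist(int id, String entity) {
        managed.put(id, entity);
        dirty.add(id);
    }

    // flush(): write pending changes to the DB, but keep every entity managed.
    public void flush() {
        for (int id : dirty) {
            database.put(id, managed.get(id));
        }
        dirty.clear();
    }

    // clear(): detach everything; changes that were not flushed are discarded.
    public void clear() {
        managed.clear();
        dirty.clear();
    }

    public int managedCount() { return managed.size(); }

    public int databaseCount() { return database.size(); }

    public static void main(String[] args) {
        PersistenceContextModel ctx = new PersistenceContextModel();
        ctx.persist(1, "foo-1");
        ctx.persist(2, "foo-2");
        ctx.flush();
        System.out.println(ctx.managedCount() + " managed, " + ctx.databaseCount() + " in DB"); // 2 managed, 2 in DB
        ctx.clear();
        System.out.println(ctx.managedCount() + " managed, " + ctx.databaseCount() + " in DB"); // 0 managed, 2 in DB
    }
}
```

In the model, the entities are still referenced after flush(); only clear() (or, in real JPA, the end of the transaction-scoped persistence context) lets them go.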
So, what is the best approach to process a huge amount of data with Spring Data?
The general strategy to prevent OOM is to load and process the data batch by batch. At the end of each batch, you should flush and clear the entityManager so that it releases its managed entities for GC. The general code flow should be something like this:
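```java
import java.util.List;

import jakarta.persistence.EntityManager;       // javax.persistence on Spring Boot 2.x
import jakarta.persistence.PersistenceContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class BatchProcessor {

    // Spring injects a shared EntityManager proxy; inside startProcess() it delegates
    // to the persistence context bound to the transaction started by @Transactional.
    @PersistenceContext
    private EntityManager em;

    @Autowired
    private FooRepository fooRepository;

    @Transactional
    public void startProcess() {
        processBatch(1, 100);
        processBatch(101, 200);
        processBatch(201, 300);
        // ... and so on for the remaining ranges
    }

    private void processBatch(int fromFooId, int toFooId) {
        // Derived query declared in FooRepository, e.g.
        // List<Foo> findByIdBetween(Integer from, Integer to);  (Between is inclusive)
        List<Foo> foos = fooRepository.findByIdBetween(fromFooId, toFooId);
        for (Foo foo : foos) {
            // process a foo
        }
        // Flush first to send the pending UPDATE statements to the DB.
        // Otherwise the updates would be lost, because clear() discards
        // unflushed changes along with the managed entities.
        em.flush();
        // Detach all managed entities so they become eligible for GC.
        em.clear();
    }
}
```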
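Rather than hard-coding each processBatch(from, to) call, the inclusive id ranges can be derived from a batch size. The sketch below is plain Java and assumes ids are contiguous integers starting at a known first id; the names BatchRanges and forEachBatch are hypothetical helpers, not part of Spring Data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper: splits firstId..lastId into inclusive [from, to] ranges
// of at most batchSize ids and hands each range to batchHandler.
public class BatchRanges {

    public static void forEachBatch(int firstId, int lastId, int batchSize,
                                    BiConsumer<Integer, Integer> batchHandler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // long arithmetic avoids int overflow when lastId is close to Integer.MAX_VALUE
        for (long from = firstId; from <= lastId; from += batchSize) {
            long to = Math.min(from + batchSize - 1, (long) lastId);
            batchHandler.accept((int) from, (int) to);
        }
    }

    public static void main(String[] args) {
        List<String> ranges = new ArrayList<>();
        forEachBatch(1, 250, 100, (from, to) -> ranges.add(from + "-" + to));
        System.out.println(ranges); // [1-100, 101-200, 201-250]
    }
}
```

Inside BatchProcessor.startProcess() this could replace the hard-coded calls with something like BatchRanges.forEachBatch(1, maxId, 100, this::processBatch), where maxId is a hypothetical value you would obtain yourself (for example from a max-id query). Because the ranges are inclusive and never overlap, they line up with the inclusive Between keyword.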
Note that this practice is only meant to prevent OOM, not to achieve high performance. So if performance is not your main concern, you can safely use this strategy.