
I have to improve the performance of a very slow piece of code and I am pretty new to Hibernate. I have studied the code carefully and concluded that the issue is that it loads and then updates/inserts a large set of entities one by one. To translate the algorithm into a more digestible example, let's say we have an algorithm like this:

for each competitionToSave in competitionsToSave
    competition <- load a Competition by competitionToSave from database

    winner <- load Person by competitionToSave.personID

    do some preprocessing

    if (newCompetition) then
        insert competition
    else
        update competition
    end if
end for
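
In Hibernate terms, the current code does roughly this per item (just a sketch; Competition, Person, CompetitionToSave, applyChanges and the getters are placeholder names, not the real ones):

for (CompetitionToSave dto : competitionsToSave) {
    // one SELECT per item
    Competition competition = session.get(Competition.class, dto.getCompetitionId());
    // and another SELECT per item
    Person winner = session.get(Person.class, dto.getPersonId());

    // ... some preprocessing ...

    if (competition == null) {
        session.save(new Competition(dto, winner)); // new competition -> INSERT
    } else {
        competition.applyChanges(dto, winner);      // existing one -> UPDATE via dirty checking
    }
}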

This algorithm is of course problematic when there are lots of competitions in competitionsToSave. So, my plan is to load all the competitions and winners involved with at most two database requests, to preprocess the data (which will speed up the reads), but more importantly, to save the competitions in insert/update batches of 100 instead of saving them one by one (a sketch of what I have in mind is at the end of the question). Since I am pretty new to Hibernate, I consulted the documentation and found the following example:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();

However, I am not sure I understand it correctly. About the method .save(), the documentation says:

Persist the given transient instance, first assigning a generated identifier. (Or using the current value of the identifier property if the assigned generator is used.) This operation cascades to associated instances if the association is mapped with cascade="save-update".

But it is unclear to me whether a request is sent to the database upon every save. Am I right to assume that, in the example taken from the documentation, session.save(customer) only records the new object in the Session without sending a request to the database, and that on every 20th item session.flush() sends the accumulated statements to the database while session.clear() empties the Session's cache?
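
To make the plan concrete, here is a sketch of the batched version I have in mind (again with placeholder names; session and tx are assumed to be open as in the documentation example above):

// process the work list in chunks of 100, so that flush/clear happens
// while the entities of the current chunk are still attached
for (int from = 0; from < competitionsToSave.size(); from += 100) {
    List<CompetitionToSave> chunk =
        competitionsToSave.subList(from, Math.min(from + 100, competitionsToSave.size()));

    // one SELECT ... WHERE id IN (...) per chunk instead of one SELECT per row
    List<Long> ids = chunk.stream()
        .map(CompetitionToSave::getCompetitionId)
        .collect(Collectors.toList());
    Map<Long, Competition> competitions = session
        .createQuery("from Competition c where c.id in (:ids)", Competition.class)
        .setParameterList("ids", ids)
        .getResultList()
        .stream()
        .collect(Collectors.toMap(Competition::getId, c -> c));
    // the winners would be loaded the same way, keyed by personID

    for (CompetitionToSave dto : chunk) {
        Competition competition = competitions.get(dto.getCompetitionId());

        // ... some preprocessing ...

        if (competition == null) {
            session.save(new Competition(dto)); // new competition -> queued INSERT
        } else {
            competition.applyChanges(dto);      // existing one -> UPDATE via dirty checking
        }
    }

    session.flush(); // send this chunk's INSERTs/UPDATEs to the database
    session.clear(); // detach the chunk so the session does not keep growing
}
tx.commit();

I load per chunk rather than everything upfront because session.clear() detaches all loaded entities, and changes to detached entities would no longer be picked up at flush time.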

Lajos Arpad
  • As far as I know Hibernate, your thoughts are right. I just worked on a project which had to save/update something like 2.5m records. That project taught me that Hibernate is of no use for big amounts of data. In the end I used a stored procedure with just **1** transaction. If you really want to speed up Hibernate, look into batch processing as well as proper transaction management – XtremeBaumer Jan 08 '19 at 10:37
  • @XtremeBaumer excellent comment. Reading the documentation I considered this possibility, but it seems to be overkill in my specific case. The client waits 10 minutes daily. I estimate that the ideal code I would like to end up with would be approximately 200x faster, which means 600 000 ms / 200 = 3 000 ms, that is, 3 seconds. This would be acceptable. – Lajos Arpad Jan 08 '19 at 10:57

1 Answer


You are correct in your assumptions, though without further configuration the inserts will still be executed one by one at flush time:

insert into Customer(id , name) values (1, 'na1');
insert into Customer(id , name) values (2, 'na2');
insert into Customer(id , name) values (3, 'na3'); 

You can additionally take advantage of the batch insert feature to increase the performance even more.

There is a Hibernate property which you can define as one of the properties of Hibernate's SessionFactory:

<property name="jdbc.batch_size">20</property>

With this batch setting you should have output like this after each flush:

insert into Customer(id , name) values (1, 'na1') , (2, 'na2') ,(3, 'na3')..

One insert instead of twenty. (Strictly speaking, Hibernate groups the statements into a single JDBC batch, so they go to the database in one round trip; whether the driver also rewrites them into a single multi-row insert like the one above depends on the driver, e.g. MySQL does this with rewriteBatchedStatements=true.)
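
If the batch size should not apply to every use of your application's shared SessionFactory, one option is to build a dedicated factory with the property set programmatically. A minimal sketch, assuming the classic Configuration API and an existing hibernate.cfg.xml on the classpath:

// reads hibernate.cfg.xml, then overrides only the batch size
Configuration configuration = new Configuration().configure();
configuration.setProperty("hibernate.jdbc.batch_size", "20");
SessionFactory batchingSessionFactory = configuration.buildSessionFactory();

Sessions opened from this factory batch their statements, while the rest of the application keeps its current behaviour.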

Maciej Kowalski
  • Excellent answer. I would like to set the batch size only for this method though. This is the only missing part of the puzzle, which prevents me from accepting your answer, but I will try to find it out. If I do, I will post it here in a comment. A value of jdbc.batch_size (20 in your example) means that no insert will be executed until flush is called or the "insert queue" reaches a size of 20, if I am not mistaken. – Lajos Arpad Jan 08 '19 at 10:54
  • I found out from here: https://stackoverflow.com/questions/2729255/changing-hibernate-batch-size-programmatically that I need to create another session factory in order to do this. If you add this information to your high-quality answer, then I will accept it. – Lajos Arpad Jan 08 '19 at 10:59
  • That's an addition which was not really pointed out as a problem in the original question. That's a separate problem in my opinion. – Maciej Kowalski Jan 08 '19 at 12:14
  • Changing the batch size of a SessionFactory which is widely used in a project could have undesired side effects. I think we need to let future readers of the accepted answer know how they can avoid such side effects. – Lajos Arpad Jan 08 '19 at 13:30
  • Defining the side effects and pitfalls of that property is out of scope, especially since your title and question just aim to confirm your understanding of Hibernate's `save` and `flush` operations. It's better if readers just read one of the many great articles on the subject -> https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/ and get a broader understanding that way. Cheers – Maciej Kowalski Jan 09 '19 at 08:47
  • You convinced me. – Lajos Arpad Jan 09 '19 at 20:30