0

Given a postgresql table

               Table "public.test"  
  Column  |            Type             | Modifiers   
----------+-----------------------------+-----------  
 id       | integer                     | not null  
 info     | text                        |   

And the following values :

# select * from test;  
 id |     info     
----+--------------
  3 | value3        
  4 | value4
  5 | value5

As you may know with postgresql you can use this kind of statements to update multiples rows with different values :

update test set info=tmp.info from (values (3,'newvalue3'),(4,'newvalue4'),(5,'newvalue5')) as tmp (id,info) where test.id=tmp.id; 

And it results in the table being updated in a single queries to :

# select * from test;  
 id |     info     
----+--------------
  3 | newvalue3        
  4 | newvalue4
  5 | newvalue5

I have been looking around everywhere as to how to make hibernate generate this kind of statements for update queries. I know how to make it work for insert queries (with reWriteBatchedInserts jdbc option and hibernate batch config options).

But is it possible for update queries or do I have to write the native query myself ? No matter what I do, hibernate always sends separate update queries to the database (I'm looking to the postgresql server statements logs for this affirmation).

2020-06-18 08:19:48.895 UTC [1642] LOG:  execute S_6: BEGIN
2020-06-18 08:19:48.895 UTC [1642] LOG:  execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.895 UTC [1642] DETAIL:  parameters: $1 = 'newvalue3', $2 = '3'
2020-06-18 08:19:48.896 UTC [1642] LOG:  execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL:  parameters: $1 = 'newvalue4', $2 = '4'
2020-06-18 08:19:48.896 UTC [1642] LOG:  execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL:  parameters: $1 = 'newvalue4', $2 = '5'
2020-06-18 08:19:48.896 UTC [1642] LOG:  execute S_1: COMMIT

I always find it many times faster to issue a single massive update query than many separate update targeting single rows. With many seperate update queries, even though they are sent in a batch by the jdbc driver, they still need to be processed sequentially by the server, so it is not as efficient as a single update query targeting multiples rows. So if anyone has a solution that wouldn't involve writing native queries for my entities, I would be very glad !


Update

To further refine my question I want to add a clarification. I'm looking for a solution that wouldn't abandon Hibernate dirty checking feature for entities updates. I'm trying to avoid to write batch update queries by hand for the general case of having to updating a few basic fields with different values on an entity list. I'm currently looking into the SPI of hibernate to see it if it's doable. org.hibernate.engine.jdbc.batch.spi.Batch seems to be the proper place but I'm not quite sure yet because I've never done anything with hibernate SPI). Any insights would be welcomed !


singe3
  • 2,065
  • 4
  • 30
  • 48

1 Answers1

1

You can use Blaze-Persistence for this which is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model.

It does not yet support the FROM clause in DML, but that is about to land in the next release: https://github.com/Blazebit/blaze-persistence/issues/693

Meanwhile you could use CTEs for this. First you need to define a CTE entity(a concept of Blaze-Persistence):

@CTE
@Entity
public class InfoCte {
  @Id Integer id;
  String info;
}

I'm assuming your entity model looks roughly like this

@Entity
public class Test {
  @Id Integer id;
  String info;
}

Then you can use Blaze-Persistence like this:

criteriaBuilderFactory.update(entityManager, Test.class, "test")
  .with(InfoCte.class, false)
    .fromValues(Test.class, "newInfos", newInfosCollection)
    .bind("id").select("newInfos.id")
    .bind("info").select("newInfos.info")
  .end()
  .set("info")
    .from(InfoCte.class, "cte")
    .select("cte.info")
    .where("cte.id").eqExpression("test.id")
  .end()
  .whereExists()
    .from(InfoCte.class, "cte")
    .where("cte.id").eqExpression("test.id")
  .end()
.executeUpdate();

This will create an SQL query similar to the following

WITH InfoCte(id, info) AS(
  SELECT t.id, t.info
  FROM (VALUES(1, 'newValue', ...)) t(id, info)
)
UPDATE test
SET info = (SELECT cte.info FROM InfoCte cte WHERE cte.id = test.id)
WHERE EXISTS (SELECT 1 FROM InfoCte cte WHERE cte.id = test.id)
Christian Beikov
  • 15,141
  • 2
  • 32
  • 58
  • Thank you for your detailed answer. However I'm afraid it doesn't entirely solve my issue since I would still have to write a query using the blaze persistence query builder. I should have stated in my question that I'm looking for a way to using "cte" or "update from values" update queries while still leveraging hibernate dirty checking features, without writing custom queries for each update case. I'm currently looking into hibernate SPI org.hibernate.engine.jdbc.batch.spi.Batch to understand if its doable, but I'm not familiar with dealing so deep into Hibernate internals, so i'm quite lost – singe3 Jun 19 '20 at 14:41
  • So you want entities to be flushed this way? I kind of doubt this will give you the performance that you want. What you probably want is JDBC batching: https://www.baeldung.com/jpa-hibernate-batch-insert-update – Christian Beikov Jun 19 '20 at 14:48
  • I'm already using jdbc batching for insert, update and delete. For insert it works great because the pgjdbc rewrite them in a single statement before sending it. But for update or delete it doesn't offer such a feature. Don't you think having a single update query (with different updated values for each row in my case) would be faster that multiple (ex batch of 200) update queries even though they are sent in a single batch ? – singe3 Jun 19 '20 at 15:00
  • I don't think a single statement with literals or parameters will be better because that is usually more work for the parser. You can try it yourself and I'm pretty sure you will see that a single prepared statement that sends big data batches will perform better. – Christian Beikov Jun 19 '20 at 15:04
  • It was actually rather a rethorical question. It's actually faster in my case with updating a few fields with different values of hundreads of rows. I have done some benchmarks and updating a batch of 1000 random rows in a table of 1M rows is on average 3 to 4 times faster with a single update statement (with from values clause) than with multiple update statements sent in a jdbc batch. I know my benchmark may not be generalized but I just wanted to say that I'm not looking into this issue without looking first that it was actually faster doing a single update with many values. – singe3 Jun 19 '20 at 15:48
  • Can you share that benchmark somewhere? If that is true, it might be interesting to inform the PG-JDBC developers since it should be at least on par IMO. This optimization could be implemented in Hibernate, but I'm not sure how fast that will happen. So unless you plan to implement it or pay someone to do it, the best you can do for now would be to use an insert-select statement with the values clause form Blaze-Persistence. – Christian Beikov Jun 22 '20 at 06:33