3

JDBC has supported bulk updates for a long time via addBatch and executeBatch. Why is there no support for adding a batch of prepared statements and getting an array of result sets in response?

For example, if I wanted to load customer details, basic account details, basic card details, basic loan details, etc. for a single view, I would prefer to create a bunch of prepared statements, add them to an ArrayList, and execute them as a batch. I would then loop through the result sets and process the data. Hopefully, several network round trips would be saved (assuming my queries are performant).

Sample bunch of queries:

SELECT custid, first, last, age FROM Customer where custid = ?
SELECT custid, acno, accountname, accounttype, status FROM Account where custid = ?
SELECT custid, cardno, cardname, cardtype, status FROM CreditCard where custid = ?
SELECT custid, loanno, principal, rate FROM Loan where custid = ?

I can imagine several hypothetical reasons why it could be a bad idea, but I am not sure which of them actually holds in the real world.

Hypothetical reasons against having bulk-fetch:

  1. There is some fundamental networking/DB stack/memory-related issue that prevents a batch of SELECT queries from being executed on the same connection with their result sets kept open.
  2. Response-handling code would be too cumbersome, as there could be exceptions both at the call level and at the individual statement level, and several statements would have to be closed correctly.
  3. There is no significant performance gain in reducing the number of network calls. Query execution is the main bottleneck and the network round-trip cost is insignificant.
  4. There could be misuse of such a feature. A single non-performant query batched up like this with other queries could cause the whole application to perform poorly.

The reason I ask is that I often see large JOIN queries that merge parent-child relationships into a single SQL statement, just for the sake of completing the loading in a single call.

However, as the number of tables grows, the query becomes complex. Also, the parent-table information is repeated in every row of every child, so there is a huge amount of data redundancy in the single joined result set.

Sample join query:

SELECT custid, first, last, age, acno, accountname, accounttype, a.status, cardno, cardname, cardtype, c.status, loanno, principal, rate
FROM Customer cc, Account a, CreditCard c, Loan l 
WHERE a.custid=CC.custid(+) and c.custid=CC.custid(+) and l.custid=CC.custid(+)
  • ***Completely*** unrelated, but: you should really get used to using an explicit `JOIN` operator rather than the implicit joins in the WHERE clause - especially for outer joins. This is also what [Oracle recommends](https://docs.oracle.com/database/121/SQLRF/queries006.htm#i2054062): "*Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator*" –  Aug 03 '15 at 12:03

4 Answers

2

The JDBC API does support this.

Statement.getMoreResults() can tell you whether the SQL statement you executed through execute() produced more than one ResultSet.

Quote from the JavaDocs for getMoreResults():

Moves to this Statement object's next result, returns true if it is a ResultSet object, and implicitly closes any current ResultSet object(s) obtained with the method getResultSet.

There are no more results when the following is true:

// stmt is a Statement object
((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1))

However, whether you can use this depends on the backend DBMS and the JDBC driver. Some JDBC drivers simply refuse to run more than one statement in a single execute() call (mainly as a means of preventing SQL injection); others don't.
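If you want to know up front what a given driver claims to support, DatabaseMetaData exposes two related flags. Note that neither of them tells you whether the driver will accept several statements concatenated into one execute() string; they only describe how multiple results are handled once a call produces them:

DatabaseMetaData meta = con.getMetaData();

// Can a single call to execute() return multiple ResultSet objects at all?
boolean multipleResults = meta.supportsMultipleResultSets();

// Can several ResultSets (from a CallableStatement) be open at the same time? (JDBC 3.0+)
boolean multipleOpen = meta.supportsMultipleOpenResults();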

So in e.g. Postgres you can do something like this:

boolean hasResult = stmt.execute(
  "select * from table_1;\n" +
  "select * from table_2;");

while (hasResult) 
{
  ResultSet rs = stmt.getResultSet();
  while (rs.next()) 
  {
    // process the result set
  }
  hasResult = stmt.getMoreResults();
}

This even allows mixing SELECT and e.g. UPDATE statements if you also check getUpdateCount().
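A rough sketch of that combined loop (the UPDATE and its column are made up for illustration, and the usual caveat applies: the driver has to accept multiple statements per execute()):

// "some_flag" is a made-up column, just to have an UPDATE in the mix
boolean isResultSet = stmt.execute(
  "update table_1 set some_flag = true;\n" +
  "select * from table_2;");

while (true)
{
  if (isResultSet)
  {
    ResultSet rs = stmt.getResultSet();
    while (rs.next())
    {
      // process the result set
    }
    rs.close();
  }
  else
  {
    int count = stmt.getUpdateCount();
    if (count == -1)
    {
      break; // neither a result set nor an update count: we are done
    }
    // "count" rows were affected by the UPDATE
  }
  isResultSet = stmt.getMoreResults();
}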

As far as I know you can also do this with SQL Server. It does not work with Oracle.

I haven't tried this with a PreparedStatement, though. But as getMoreResults() is defined on Statement, it is available for a PreparedStatement as well.
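Untested on my side, but a sketch of how that could look; the id column and the value 42 are made up, and whether parameter indexes simply run across both statements like this is driver-specific:

PreparedStatement pstmt = con.prepareStatement(
  "select * from table_1 where id = ?;\n" +
  "select * from table_2 where id = ?;");
pstmt.setInt(1, 42);
pstmt.setInt(2, 42);

boolean hasResult = pstmt.execute();
while (hasResult)
{
  ResultSet rs = pstmt.getResultSet();
  while (rs.next())
  {
    // process the result set
  }
  hasResult = pstmt.getMoreResults();
}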

  • Figures... I have worked with Oracle backend a lot. Thanks. – Teddy Aug 03 '15 at 11:46
  • This is good info, but even if you _can_ execute multiple statements per `Statement`, you probably usually _shouldn't_. – Mick Mnemonic Aug 03 '15 at 12:13
  • Executing multiple statements in one execute is not allowed by the JDBC standard; it does require some reading between the lines, because it doesn't explicitly say so; it just consistently talks about a statement (singular) in the context of execution. – Mark Rotteveel Aug 04 '15 at 06:30
1

How about putting the queries into a stored procedure and then using a CallableStatement to execute that procedure?

A CallableStatement can return one ResultSet object or multiple ResultSet objects. Multiple ResultSet objects are handled using operations inherited from Statement.

try 
{
    CallableStatement stmt = con.prepareCall(/* call procedure */);

    boolean results = stmt.execute();

    while (results) 
    {
        ResultSet rs = stmt.getResultSet();

        while (rs.next()) 
        {
            // process the current result set
        }
        rs.close();

        results = stmt.getMoreResults();
    }
    stmt.close();
}
catch (Exception e) 
{
    e.printStackTrace();
}
  • Thank you.. This is quite close! I think this is what you mean http://www.herongyang.com/JDBC/MySQL-CallableStatement-Multiple-ResulSet.html Although, I wish I could do the same with ad-hoc queries using an array of Statements – Teddy Aug 03 '15 at 11:31
  • There was a discussion of getting multiple results http://stackoverflow.com/q/9696572/1737819 – Developer Marius Žilėnas Aug 03 '15 at 11:34
0

Relational databases are designed and optimized for retrieving data through SQL queries that JOIN data from multiple tables. Executing a single query that (correctly) JOINs data is likely always more efficient than getting the same data with separate queries.

When a single query gets too complex, it should be refactored into a VIEW -- from which you can query, joining data from other TABLEs and VIEWs, if required.

Given the above, I don't see a need for bulk queries.

Mick Mnemonic
  • Yes... that is true. I had some concern about the huge amount of data repetition. Is there a point when this data repetition becomes an issue? For example, have you faced a situation where, say, beyond a 1:100 parent-child ratio it becomes more memory-efficient to fire separate queries? – Teddy Aug 03 '15 at 11:23
0

I get the feeling you don't understand what a prepared statement is.

A prepared statement is an object you create once and then reuse over and over with different parameters supplied to it.

You're not telling me that you recreate a prepared statement from scratch each time you wish to execute it again?

Say you have your four queries. Before executing your loop, you do this:

PreparedStatement statement1, statement2, statement3, statement4;
try {
    con.setAutoCommit(false); // only needed when also doing updates/inserts
    statement1 = con.prepareStatement("SELECT custid, first, last, age FROM Customer where custid = ?");
    statement2 = con.prepareStatement("SELECT custid, acno, accountname, accounttype, status FROM Account where custid = ?");
    // etc....
    for (Map.Entry<String, Integer> e : customers.entrySet()) {
        statement1.setInt(1, e.getValue().intValue());
        ResultSet rs = statement1.executeQuery();
        // do what you need to do, then close the result set
        rs.close();

        statement2.setInt(1, e.getValue().intValue());
        ResultSet rs2 = statement2.executeQuery();
        // do what you need to do, then close the result set
        rs2.close();
    }
    con.commit(); // only needed when also doing updates/inserts
} catch (SQLException ex) {
    ex.printStackTrace();
}

There is no need to recreate the prepared statements. That is why it's called a prepared statement. You just feed it the new values it needs to query with.

This way you can add them to lists, iterate over them the way you want, etc., and it's all optimised, since the database engine will remember the query plans and the optimisations it makes for them. What you do with the prepared statement objects is up to you.
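For example, a rough sketch building on the statements prepared above (custId stands in for whatever id you are looking up; note that you copy the values out, because a ResultSet cursor is gone once its statement is executed again):

// Keep the prepared statements in a list so you can iterate them however you like.
List<PreparedStatement> statements =
    Arrays.asList(statement1, statement2, statement3, statement4);

// One list of rows per query; each row is copied out as a plain Object array.
List<List<Object[]>> allResults = new ArrayList<>();

for (PreparedStatement stmt : statements) {
    stmt.setInt(1, custId);               // same customer id for every query
    List<Object[]> rows = new ArrayList<>();
    ResultSet rs = stmt.executeQuery();
    int cols = rs.getMetaData().getColumnCount();
    while (rs.next()) {
        Object[] row = new Object[cols];
        for (int i = 0; i < cols; i++) {
            row[i] = rs.getObject(i + 1); // JDBC columns are 1-based
        }
        rows.add(row);
    }
    rs.close();
    allResults.add(rows);
}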

The engine will remember the query even if you recreate the objects constantly, but by reusing them you save the overhead of creating new objects over and over, and the memory issues that come with that.

So, without a clearer question this is the best answer I can give you.

Tschallacka
  • You are right in saying that I need not be using PreparedStatement in my example. My question would apply even to an ordinary Statement. Could I execute 5 Statements in one network call and get an array of ResultSet objects? – Teddy Aug 03 '15 at 11:26
  • Define network call. Your application has an open network connection to the database server. Over that connection it constantly talks back and forth. Behind the scenes it will be 5 calls over the same connection. But I believe even the "batch" objects do some voodoo, and in the end, if you monitor the connection, it will be separate queries. So it's all a matter of just implementing your own wrapper for it. If you really need the high efficiency, you need to go the complex-query route. If you wish to have maintainability, go the 5-calls route but then shape the result into your desired shape. I'll add an example. – Tschallacka Aug 03 '15 at 11:30
  • I was assuming that one query = one network round trip (at least for small result sets). That is, if I fire 4 separate small queries which return 1KB result set each, it would be costly. And batching them and getting 4KB result in one shot would be cheaper (faster). Is this wrong? – Teddy Aug 03 '15 at 11:39
  • I'll have to forgo adding an example; I don't have the time atm to do that, because you can't retain the result sets themselves, as the cursors will update on re-execution of the code. You need to store the values etc. in a custom object, or store the queries and make your own batching process. ------ The 4KB would be faster, yes. But unless you have very, very high traffic or very big data volume it's negligible (more than 128,000 queries a second on a 1000mbit connection) and I wouldn't optimise for it. – Tschallacka Aug 03 '15 at 11:43
  • From what you're saying, just grouping queries may not give a big performance boost. I'll do some benchmarking with Postgres and see how it turns out. Thanks! – Teddy Aug 03 '15 at 18:50