
I'm using JDBC batching to insert about a million rows. I found that the Oracle driver doesn't work as expected: the batch insert takes a long time. I decided to sniff the application's traffic with Wireshark. And what did I see?

  • the Oracle JDBC driver sent the first request (1)
  • then it sent the data (2), about 2500 rows
  • the Oracle server responded with some packet (3)
  • after that, all the remaining data was sent with one-by-one inserts, not batched!
    • insert into my_table...
    • insert into my_table...

Why does this happen? How can I fix this?

Table

create table my_table (val number);

Code

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.junit.Test;

public class scratch_1 {

    @Test
    public void foo() throws SQLException {
        String sql = "insert into my_table (val) values (?)";

        try (Connection con = getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(sql)) {

                for (long i = 0; i < 100_000; i++) {
                    ps.setBigDecimal(1, BigDecimal.valueOf(i));

                    ps.addBatch();
                }

                ps.executeBatch();
                ps.clearBatch();
            }
            con.commit();
        }
    }

    private Connection getConnection() throws SQLException {
        String url = "jdbc:oracle:thin:@localhost:1521:orcl";
        String user = "my_user";
        String password = "my_password";
        return java.sql.DriverManager.getConnection(url, user, password);
    }
}

Wireshark capture to illustrate what happened:

[Wireshark log screenshot]

Environment

$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

Oracle Database 12.2.0.1 JDBC Driver

Server: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit

Running the insert multiple times does not help; same result. 250k rows were "batch" inserted in 465 s.
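
For reference, a minimal sketch of how such a timing can be taken (illustrative only, not the original test harness; it reuses getConnection() from the class above):

    @Test
    public void timing() throws SQLException {
        long start = System.nanoTime();

        try (Connection con = getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement("insert into my_table (val) values (?)")) {
                for (long i = 0; i < 250_000; i++) {
                    ps.setBigDecimal(1, BigDecimal.valueOf(i));
                    ps.addBatch();
                }
                ps.executeBatch(); // a single executeBatch, as in the test above
            }
            con.commit();
        }

        long seconds = (System.nanoTime() - start) / 1_000_000_000;
        System.out.println("250k rows inserted in " + seconds + " s");
    }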

On the server side, in v$sql:

SELECT *
FROM
  (SELECT REGEXP_SUBSTR (sql_text, 'insert into [^\(]*') sql_text,
    sql_id,
    TRUNC(
    CASE
      WHEN SUM (executions) > 0
      THEN SUM (rows_processed) / SUM (executions)
    END,2) rows_per_execution
  FROM v$sql
  WHERE parsing_schema_name = 'MY_SCHEMA'
  AND sql_text LIKE 'insert into%'
  GROUP BY sql_text,
    sql_id
  )
ORDER BY rows_per_execution ASC;

[screenshot of the v$sql query output]

referee
  • Maybe you should try to run your sample code with a different number of executions (and compare them). Also, you can investigate from the other side with the view v$sql (columns executions, rows_processed). – ibre5041 Jul 30 '18 at 08:55
  • Thank you for the comment. I updated the description; did I answer your question? – referee Jul 30 '18 at 09:36
  • PS: do you have any triggers or constraints on the target table? In the case of the OCI API this becomes non-trivial, because Oracle returns a vector of values and you have to check for every table whether the bulk operation succeeded or not. – ibre5041 Jul 30 '18 at 19:42
  • No, I don't. A very simple table, no constraints or triggers. – referee Jul 31 '18 at 03:45

2 Answers


The problem is solved.

Thank you for all your responses. I'm very grateful to you!

My previous example didn't describe the real problem; sorry that I did not give the whole picture at once.
I simplified it to the point where I lost the handling of null values.
Please check the example above; I have updated it.
If I use java.sql.Types.NULL, the Oracle JDBC driver uses the VarcharNullBinder for null values, which somehow leads to this strange behavior. It seems the driver batches until the first null with an unspecified type; after that null, it falls back to one-by-one inserts.

After changing it to java.sql.Types.NUMERIC for the number column, the driver uses the VarnumNullBinder and works correctly with it: fully batched.

Code

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

import org.junit.Test;

public class scratch_1 {

    @Test
    public void foo() throws SQLException {
        String sql = "insert into my_table (val) values (?)";

        try (Connection con = getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(sql)) {

                for (long i = 0; i < 100_000; i++) {
                    if (i % 2 == 0) {
                        // the real problem was here:
                        // ps.setNull(1, Types.NULL); // wrong way!
                        ps.setNull(1, Types.NUMERIC); // correct
                    } else {
                        ps.setBigDecimal(1, BigDecimal.valueOf(i));
                    }

                    ps.addBatch();
                }

                ps.executeBatch();
                ps.clearBatch();
            }
            con.commit();
        }
    }

    private Connection getConnection() throws SQLException {
        String url = "jdbc:oracle:thin:@localhost:1521:orcl";
        String user = "my_user";
        String password = "my_password";
        return java.sql.DriverManager.getConnection(url, user, password);
    }
}
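
To guard against this pitfall in general, the null binding can be wrapped in a small helper so a concrete SQL type is always supplied (a minimal sketch; the helper is hypothetical, not part of the original code, and would live in the class above):

    // Hypothetical helper: always bind a typed null (Types.NUMERIC) so the
    // driver picks the VarnumNullBinder and keeps batching, instead of the
    // VarcharNullBinder fallback triggered by an untyped Types.NULL.
    private static void setNullableNumeric(PreparedStatement ps, int index, BigDecimal value)
            throws SQLException {
        if (value == null) {
            ps.setNull(index, Types.NUMERIC);
        } else {
            ps.setBigDecimal(index, value);
        }
    }

The loop body above then becomes setNullableNumeric(ps, 1, i % 2 == 0 ? null : BigDecimal.valueOf(i)).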
referee
  • Just out of curiosity: why are you using `setBigDecimal`, and what happens if you use `setBigDecimal(1, null)` instead of `setNull(1, Types.NUMERIC)`? – Mark Rotteveel Jul 31 '18 at 14:02
  • 1
    `setBigDecimal(1, null)` and `setNull(1, Types.NUMERIC)` is equal if I looked at the Oracle JDBC code correctly. But this code is also used for another databases (not Oracle) and we met with some issues with SQL Server when we don't use `ps.setNull()` (it was a long time ago, I can be wrong) – referee Jul 31 '18 at 14:43

I am not sure where this limit comes from. However, the Oracle JDBC Developer's Guide gives this recommendation:

Oracle recommends to keep the batch sizes in the range of 100 or less. Larger batches provide little or no performance improvement and may actually reduce performance due to the client resources required to handle the large batch.

Of course, larger batch sizes may be used, but they do not necessarily increase performance, as you have witnessed. One should use the batch size that is optimal for the use case and the JDBC driver/DB used. You should probably use batches of 2500 in your case to see the best performance benefit.
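
For illustration, a minimal sketch of flushing the batch in fixed-size chunks (connection details reused from the question; the batchSize of 100 just follows the guide's recommendation and should be tuned for your driver and network):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedBatchInsert {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@localhost:1521:orcl";
        try (Connection con = DriverManager.getConnection(url, "my_user", "my_password")) {
            con.setAutoCommit(false);
            try (PreparedStatement ps =
                     con.prepareStatement("insert into my_table (val) values (?)")) {
                int batchSize = 100; // per the guide; tune for your setup
                for (long i = 0; i < 1_000_000; i++) {
                    ps.setBigDecimal(1, BigDecimal.valueOf(i));
                    ps.addBatch();
                    // Flush one chunk per batchSize rows: one round trip each,
                    // instead of accumulating the whole million rows in memory.
                    if ((i + 1) % batchSize == 0) {
                        ps.executeBatch();
                    }
                }
                ps.executeBatch(); // flush the remaining partial chunk, if any
            }
            con.commit();
        }
    }
}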

Julius Zaldokas
  • Thank you for the response! A batch size of `100` is very small. I have tried batches of size 2500 (and even in different connections); after the first 2500-row insert, the JDBC driver inserts the next chunk one row at a time. – referee Jul 30 '18 at 09:38
  • "A batch size of 100 is very small" if I want to insert millions of rows – referee Jul 30 '18 at 09:50
  • @referee Have you **tried** smaller batches? Smaller batches can help because they have less overhead than collecting and sending thousands of rows. Batch size is a balance between reducing the overhead of network round-trip delays and the overhead of increased memory usage and garbage collection. – Mark Rotteveel Jul 30 '18 at 15:03
  • @MarkRotteveel I think I found the real problem; please check my updated question. Sorry for confusing you all – referee Jul 31 '18 at 10:29
  • @referee Instead of adding the solution to your question, consider posting it as an answer. That is how Stack Overflow is supposed to be used :), and as a bonus, after a timeout, you can accept your own answer. – Mark Rotteveel Jul 31 '18 at 12:45