1

I need to make a batch process in java that will read a file of a variable amount of records (though can safely assume 5k+), treat them and then insert the records into an oracle 11g database. I need to do it via call(s) to a stored procedure.

I have made a similar process already some months ago, which I am not too proud of (for no particular reason, other than I am sure it is not optimal at all). The previous solution I kept everything in memory and in order to insert the rows I created an INSERT ALL statement in which I just appended all the records via a stringbuilder, and just executed that one statement to insert 15k+ records in one go. This time I need to use a stored procedure for the insertion of the data. I have been reading and now know there are ways to send arrays of data to stored procedures, so I would be able to send multiple records at a time.

Should I make a Stored procedure that receives arrays and just send all - potentially thousands - of the records in one single call to that SP? Or should I limit it to a certain amount of records at a time and call that SP (records/limit) amount of times?

Or should I stay away from using arrays, and have a stored procedure that just receives the information for one record, and call it as many times as I have records?

If I were to do multiple calls, I was thinking of utilizing PreparedStatements and the .addBatch() and .executeBatch() methods, would this be the way to go?

I need to be able to insert all the records, and rollback in case of any error. For which I am going to use transactions. I am technically not required to meet any threshold in terms of performance, but I am intersted in the topic and this could be a good time to start worrying more about it, so I would like some pointers and tips from someone with experience in the topic.

APC
  • 144,005
  • 19
  • 170
  • 281

2 Answers2

1

"Should I make a Stored procedure that receives arrays and just send all - potentially thousands - of the records in one single call to that SP? Or should I limit it to a certain amount of records at a time and call that SP (records/limit) amount of times?"

Limit to a certain amount of records. I generally start with between 100 and 1000, depending on the total size of a record.

"Or should I stay away from using arrays, and have a stored procedure that just receives the information for one record, and call it as many times as I have records?"

No. You will waste CPU and above all time: every time Java calls the database there is time spent just on sending the message and getting the reply back (related to "latency").

"If I were to do multiple calls, I was thinking of utilizing PreparedStatements and the .addBatch() and .executeBatch() methods, would this be the way to go?"

Yes, but those methods are at their best with SQL statements (such as INSERT), not calls to stored procedures.

I need to be able to insert all the records, and rollback in case of any error.

Set autocommit off (which I recommend in general) and commit when all is OK.

If your stored procedures have no added value, but simply do the inserts, then it would be simpler and very efficient to do batched inserts. There are very good arguments for using stored procedures despite the extra complication, but then you would have to populate the arrays.

In every case, it is vital to use bind variables and not concatenate the values into the SQL statements (or calls to SPs) as literals.

Best regards, Stew Ashton

P.S. for 5k+ records, multi-threading is overkill.

Stew Ashton
  • 1,499
  • 9
  • 6
  • Thank you very much for your answer Stew, you pretty much confirmed what I was thinking. I will try to see if I can get rid of the SP idea and just do the plain sql statement, as you say there would be no added value and at the very least I would need to create the data types for both the record and the array of records, as well as the procedure. I'm going to assume the batch size still applies even if not calling the SP, I will play around with that number as well. – Jorge Fortson Nov 09 '19 at 23:25
  • "as you say there would be no added value" - I meant "IF there is no added value" (because a stored procedure might add value, such as logging). – Stew Ashton Nov 10 '19 at 09:08
  • Batching ONLY applies if you do straight inserts; if you call a procedure, the equivalent of "batching" is the array of records you send as a parameter. – Stew Ashton Nov 10 '19 at 09:08
0

Should you use stored procedure(SP)?

I don't think that it would have any significant performance improvement because you have an INSERT statement. An SP would have been beneficial if you had some complex queries and using it would have saved the query compilation time. A simple insert statement won't take much time to compile.

So, in my opinion, send your query on the fly using Java's PreparedStatement.

The approach I would have followed:

As your requirement is to insert all the data and rollback in case of errors, I would suggest you insert the whole set of data in batches. In case a batch fails, you can just roll back the changes for the batch and retry insertion for the batch.

If you were to send the whole data set in one shot, then you would have to roll back the entire change caused due to an error in a single insert statement.

Another benefit of using batch is that you will send the batch data in a single JDBC connection. Making, maintaining and clearing the connection creates an overhead. Thus, batch saves that overhead to some extent.

Also, you can use multi-threading, where you can have tasks read a batch of data, process them and insert them (using batch). Your tasks can access JDBC connections from a connection pool such as HikariCP. So, while some of the threads are busy with data insertion, others can read and process the data.

Some material to read on multi-threaded inserts : https://dba.stackexchange.com/questions/16131/how-does-oracle-handle-multiple-concurrent-inserts-against-one-table

PS: Constructive criticism is welcome.

uneq95
  • 2,158
  • 2
  • 17
  • 28
  • Appreciate the answer, but I will go with Stew's as he touches more points of my question. Even though in the end you both came to the same conclusion. – Jorge Fortson Nov 09 '19 at 23:26