I have a requirement to copy a huge amount of data from Excel (500,000 rows) to a database. Should I go with the blocking-queue method of multithreading, or is there another way to leverage multithreading more efficiently?
-
Is this a one-time operation, or will you have to do similar copies again numerous times? This won't take that long in a single-threaded program. If this isn't a time-sensitive and frequent operation, multithreading is not worth the hassle. – Ryan_L Jun 29 '18 at 02:50
-
It's a frequent operation. We receive the Excel files from one department every alternate day. We receive 21 Excel files containing data related to different tables, and we have to complete the load on the same day. – ghostrider Jun 29 '18 at 03:22
-
Unless the data volume is at least a couple of orders of magnitude larger, you should not bother with multithreading. However, this question is much too broad and vague for StackOverflow. Please visit the [help] and read [ask] to learn how to use the site. – Jim Garrison Jun 29 '18 at 05:12
2 Answers
500,000 rows nowadays is not that huge an amount for a database. I think you should first optimize the DB access, and only if you still don't get the desired performance, go with more advanced techniques. Since you've stated Java, two optimizations like this come to mind:
- Use PreparedStatement and not Statement from JDBC (or, if you use any abstraction over JDBC, make sure that's the case under the hood). This allows the DB to avoid reparsing the query every time.
- Use batch operations. This alone can boost the speed by an order of magnitude or so, depending on your RDBMS setup:
    PreparedStatement pstmt = connection.prepareStatement(<YOUR_INSERT_SQL>);
    for (...) {               // choose a batch size like 500 to 2000
        pstmt.setXXX(<bind the parameters here>);
        pstmt.addBatch();     // add the row to the current batch
    }
    pstmt.executeBatch();     // does a bunch of inserts at once
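If it helps to see the batching pattern end to end, here is a minimal self-contained sketch. The table name, columns, row type, and batch size are illustrative assumptions, not anything from the question; the idea is simply to flush the batch every N rows and commit manually so a failure can be rolled back cleanly.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchInsertExample {

        // Table and column names are placeholders for illustration only.
        private static final String INSERT_SQL =
                "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)";
        private static final int BATCH_SIZE = 1000; // anywhere from 500 to 2000 is a common choice

        public static void insertAll(Connection connection, List<String[]> rows) throws SQLException {
            connection.setAutoCommit(false); // commit manually so a failure doesn't leave partial data
            try (PreparedStatement pstmt = connection.prepareStatement(INSERT_SQL)) {
                int count = 0;
                for (String[] row : rows) {
                    pstmt.setString(1, row[0]);
                    pstmt.setString(2, row[1]);
                    pstmt.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        pstmt.executeBatch(); // flush a full batch to the DB
                    }
                }
                pstmt.executeBatch(); // flush the remaining rows
                connection.commit();
            } catch (SQLException e) {
                connection.rollback();
                throw e;
            }
        }
    }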
It can take less than a minute to perform all these operations, or maybe 1-2 minutes, but not, say, hours (of course depending on where you insert the data and the network quality, but typically this is the case).
If that's not enough, you can of course go with parallel access, using a number of connections simultaneously. But again, if it's a one-time operation, I wouldn't bother; after all, it will take you more time to write this multithreaded code than the difference in performance you'll gain :)
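If you do go the parallel route, one common shape is to split the rows into chunks and give each worker thread its own connection. The sketch below is an assumption-laden illustration of that idea (pool size, chunking, and the reuse of the batching helper above are choices made here, not something prescribed by the answer):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelInsertExample {

        public static void insertInParallel(String jdbcUrl, List<List<String[]>> chunks)
                throws InterruptedException {
            // One worker per chunk; 4 threads is an arbitrary example size.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (List<String[]> chunk : chunks) {
                pool.submit(() -> {
                    // Each thread gets its own Connection; JDBC connections should not be shared.
                    try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
                        BatchInsertExample.insertAll(connection, chunk); // batching sketch from above
                    } catch (Exception e) {
                        e.printStackTrace(); // real code would record which chunk failed so it can be retried
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }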

-
It's a frequent operation. We receive the Excel files from one department every alternate day. We receive 21 Excel files containing data related to different tables, and we have to complete the load on the same day. – ghostrider Jun 29 '18 at 03:19
-
As I said, it will take a minute or so (you should try to simulate this operation and measure the performance). Maybe that's enough and a single-threaded solution will work. Another point to consider: what if multithreaded access fails to save some record in the DB, what should you do? In a single-threaded program you just memorize the last inserted batch and rerun the operation from there, but in a multithreaded environment it won't be that easy. – Mark Bramnik Jun 29 '18 at 03:21
If you want to use multithreading to improve the performance, you can create one thread per table and perform the DB operations for that table on it. For thread management you can use an ExecutorService, ThreadPoolExecutor, etc., as sketched below.
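A rough sketch of that one-task-per-table idea follows; the table-to-file mapping and the loadOneTable method are hypothetical placeholders standing in for however you parse the Excel files and insert their rows:

    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PerTableLoader {

        public static void loadAll(Map<String, String> excelFileByTable) throws InterruptedException {
            // One task per table; with 21 files a small fixed pool is usually enough.
            ExecutorService executor =
                    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            excelFileByTable.forEach((table, excelFile) ->
                    executor.submit(() -> loadOneTable(table, excelFile)));
            executor.shutdown();
            executor.awaitTermination(2, TimeUnit.HOURS);
        }

        // Hypothetical: parse one Excel file and batch-insert its rows into the given table,
        // e.g. read rows with a library such as Apache POI, then use a PreparedStatement batch.
        private static void loadOneTable(String table, String excelFile) {
        }
    }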
