
Here is the basic outline of my application (which connects to a SQlLite DB):

  1. It scans a directory and creates a list of 8-15 different flat files that need to be read.

  2. It identifies the flat file and chooses a generic prepared statement based on the file extension.

  3. It then reads the flat file line by line, sets the string parameters on the prepared statement, and adds it to a batch. The batch is executed every 1,000 statements (see the sketch after this list). Some of the files in question have 200,000 lines that need to be read.

  4. Once all files have been inserted into the database (there is a different table for each file type), the program updates a certain column in each table to a common value.

  5. The program creates a new file for each file type and extracts the information from the database into the new file.
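
For reference, here is roughly what the loading step (step 3) looks like. This is a minimal sketch: the table name, columns and tab delimiter are placeholders, since the real layout varies by file type.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class FileLoader {

        private static final int BATCH_SIZE = 1000;

        // Loads one flat file into one table.
        static void loadFile(Connection conn, File file) throws SQLException, IOException {
            conn.setAutoCommit(false); // batch the inserts inside one transaction
            String sql = "INSERT INTO job_a (col1, col2) VALUES (?, ?)"; // placeholder table/columns
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 BufferedReader in = new BufferedReader(new FileReader(file))) {
                int count = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    String[] fields = line.split("\t"); // assumed delimiter
                    ps.setString(1, fields[0]);
                    ps.setString(2, fields[1]);
                    ps.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        ps.executeBatch(); // flush every 1,000 rows
                    }
                }
                ps.executeBatch(); // flush the remainder
                conn.commit();     // one commit per file
            }
        }
    }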

Currently, one run on a directory with about 9 very small files (fewer than 50 rows each) and one very large file (more than 200,000 rows) takes about 1.5 minutes. I am hoping to make this faster.

A few first questions:

  1. Should I close and open the database connection for each part of the program (loading, updating, extracting), or leave the connection open and pass it to each method?

  2. Should I close the prepared statement after each file that is processed? Or just clear the parameters after each file and close it at the end of the method (essentially after all the jobs are loaded)?

I am also interested in any other comments about things I should be focusing on to maximize the performance of this application.

Thanks.

Tim Brooks
    Which part exactly is taking the most time? Benchmark, find out, and tell us. – Christoffer Hammarström Mar 02 '12 at 17:26
  • You remembered to disable autocommit? – Thorbjørn Ravn Andersen Mar 02 '12 at 17:56
  • To expand a bit on Christoffer: how do you know the problem is JDBC performance and not, say, your code that reads the contents of the files, parses the data and passes that data to the prepared statements? – Confusion Mar 02 '12 at 18:00
  • autocommit is disabled. I ran some Execution Time Analysis in Eclipse. 121 secs to run. 64 of those seconds were committed to the method loading the file with 200,000 rows into the DB. For comparison, .08396 seconds were committed to the method loading a file with 900 rows into the DB. 44 secs were committed to the method updating all ten database tables. – Tim Brooks Mar 02 '12 at 18:19
  • 4.5 seconds were committed to extracting all ten jobs from the DB and writing them to new flat files. – Tim Brooks Mar 02 '12 at 18:22
  • I found a bug in the update method. A trigger that I had created in SQL was executed 4x more than necessary. So that got that method down to about 10 seconds (which seems reasonable for updating ~250,000 rows). That one method is still killing the program though. – Tim Brooks Mar 02 '12 at 19:26

1 Answer


I would start with the following:
1) Use a JDBC connection pool like DBCP.
2) Split the loading job across separate threads, so that each thread reads one file and pushes its data into one table, setting a flag when it is done (see the sketch below this list).
3) Do the same for the reverse (extraction) process.
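
For example, a minimal sketch of points 1) and 2), assuming the Xerial sqlite-jdbc driver, Commons DBCP 2 (BasicDataSource) and a loadFile(Connection, File) method like the one in the question; note that SQLite serializes writes, so measure whether parallel loading actually pays off:

    import java.io.File;
    import java.sql.Connection;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.commons.dbcp2.BasicDataSource;

    public class ParallelLoader {

        static void loadAll(List<File> files) throws InterruptedException {
            final BasicDataSource pool = new BasicDataSource();
            pool.setUrl("jdbc:sqlite:/path/to/jobs.db"); // placeholder path
            pool.setMaxTotal(4);                         // a handful of pooled connections

            ExecutorService workers = Executors.newFixedThreadPool(4);
            for (final File file : files) {
                workers.submit(new Runnable() {
                    public void run() {
                        try (Connection conn = pool.getConnection()) {
                            loadFile(conn, file);  // one file -> one table
                        } catch (Exception e) {
                            e.printStackTrace();   // or flag the file as failed
                        }
                    }
                });
            }
            workers.shutdown();
            workers.awaitTermination(10, TimeUnit.MINUTES);
        }

        static void loadFile(Connection conn, File file) {
            // the batched-insert method from the question
        }
    }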

Opening and closing a connection is an expensive operation, so you should open it once and close it only after all the work is done.
The same goes for statements. A prepared statement is compiled by the JDBC driver, which improves its execution performance; create it once at the beginning and close it after all the work is done.
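
In code, that pattern looks roughly like this (a sketch, assuming the Xerial sqlite-jdbc driver on the classpath and the load/update/extract methods from the question):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class JobRunner {

        public static void main(String[] args) throws SQLException {
            // One connection for the whole run, handed to each phase.
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:/path/to/jobs.db")) {
                conn.setAutoCommit(false);
                loadFiles(conn);    // batched inserts, commit per file
                updateTables(conn); // set the common column value
                extractFiles(conn); // write the new flat files
                conn.commit();
            } // the connection is closed here, only after all the work is done
        }

        static void loadFiles(Connection conn)    { /* ... */ }
        static void updateTables(Connection conn) { /* ... */ }
        static void extractFiles(Connection conn) { /* ... */ }
    }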

Anatoly