0

I have a H2 database named PRODUCTS, with 5000 rows and 55 columns per row. I'm currently using a PreparedStatement to add rows of values to the database, but it does not check whether the row exists already. Instead, I need to only insert the row if the column name "id" (type VARCHAR) does not contain a certain alpha-numeric string, and if the column name "id2" (type VARCHAR) does not a certain alpha-numeric string, and if the column name "raw_yn" (type BOOLEAN) contains false. The values I enter into the prepared statement are provided via a method.

The question here is very close to what I'm asking, with the difference being the fact that it's solution is based on adding rows to an empty DB, and ensures the database is empty. The creator of H2 commented saying:

Then the "where not exists" ensures this rows are only inserted if the table [TABLE] is empty.

How do I adapt this code so that it only performs the INSERT query if my above 3 requirements are met (whether or not DB is empty)?

Currently I have:

 import java.sql. *;
       static final String JDBC_DRIVER = "org.h2.Driver";
       static final String DB_URL = "jdbc:h2:~/myDB";
       static final String USER = "test";
       static final String PASS = "test";
       static final Connection conn = null;
       static final Statement stmt = null;

       public class DataBaseManager {

           public void insertIntoDB(String id1val, String id2val, Boolean raw_yn_val, ...,...) {

               try {
                   Class.forName(JDBC_DRIVER);
                   conn = DriverManager.getConnection(DB_URL, USER, PASS);
                   myStatement = "INSERT INTO PRODUCTS VALUES(?,?,?,..,...)";
                   stmt = conn.prepareStatement(myStatement);
                   stmt.setString(1, id1val);
                   stmt.setString(2, id2val);
                   stmt.setBoolean(3, raw_yn_val);
                   stmt.setString(4,....);
                // Continue up to 55
                   stmt.executeUpdate();
               }

                 [catch&finally blocks]
           }
       }

What should myStatement be changed to? I'm confused because if I use select 0, 'id1' union like below, how does that fit into my PreparedStatement of stmt.setString(1, id1val); ??. Thanks for your help.

INSERT INTO PRODUCTS SELECT * FROM(
select 0, 'id1' union                         // <--- How does this fit into Prepared Statement?
select 1, 'id2' union
select 2, 'raw_yn' union
) x where not exists(SELECT * FROM PRODUCTS);  // <--- Ensures only works when empty

UPDATE:

Following Gord's suggestion, I've put together the following code. If the database is empty, n will return 0. This takes about 1 minute to add 5000 rows to the empty database. However, if there IS a match, it's taking nearly 5x longer, even though I'm merely using return as opposed to using additional INSERT code. So shouldn't it be faster?

try {
   Class.forName(JDBC_DRIVER);
   conn = DriverManager.getConnection(DB_URL, USER, PASS);
   statement = conn.createStatement();

   String sql = "SELECT COUNT(*) AS n FROM PROPERTIES WHERE id1='" + id1 + "' AND id2='" + id2 + "' AND raw_yn='true'";
   rs = statement.executeQuery(sql);
   rs.next();
   if (rs.getInt("n") == 0) {
    Class.forName(JDBC_DRIVER);
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    myStatement = "INSERT INTO PROPERTIES VALUES(?,?,?,...)";
    stmt = conn.prepareStatement(myStatement);
    stmt.setString(1, id1);
    stmt.setString(2, id2);
    stmt.setBoolean(3, raw_yn);
    stmt.executeUpdate();
   } else {
    return; // <-- Takes 5x longer to go through ???
   }

  }

    [catch & finally blocks]
Community
  • 1
  • 1
Mathomatic
  • 899
  • 1
  • 13
  • 38
  • 1
    Since it looks like you will be inserting the rows one at a time, perhaps your `insertIntoDB` method can do a `SELECT COUNT(*) AS n FROM PRODUCTS WHERE ...` to determine whether the required row already exists, and then only perform the `INSERT` if it does not (i.e., if the row count from the `SELECT` is zero). – Gord Thompson Jul 23 '16 at 16:59
  • Consider using a staging temp table, *ProductsTemp*, to then migrate to final *Products* per row check. However, checking 55 values may be tedious or did you want to check only id, id2, and raw_yn? – Parfait Jul 23 '16 at 17:58
  • Thanks Gord, I'll report back. But first, how do I retrieve the actual `int` value of the `n`? If I do `String sql = "SELECT COUNT(*) AS n FROM PROPERTIES WHERE id='"+id1val+"'"; rs = statement.executeQuery(sql);` I only have a result set, but how do I access `n` to get the count? @Parfait, I only need to check id, id2 and raw_yn for each potential row insertion. Are you able to provide an example of your potential solution? I'm working on Gord's now. Thanks – Mathomatic Jul 23 '16 at 17:59
  • 1
    *"but how do I access n to get the count?"* - You need to do `rs.next();` and then you can use `rs.getInt("n")` or `rs.getInt(1)` – Gord Thompson Jul 23 '16 at 19:16
  • Almost there Gord, I've updated the bottom of my question to show where I'm stuck. It's taking 5x longer to cycle through 5000 rows if there is already a match - which seems backwards. Shouldn't it be faster since all I have to do is `return`, as opposed to then using an `INSERT` query if no match was found. Please see above. What's wrong here? – Mathomatic Jul 23 '16 at 20:19
  • 1
    The second time around you save time by not having to do the INSERTs but the SELECTs will take longer because the table contains more rows. Does your table have indexes on the `id1`, `id2`, and `raw_yn` columns? – Gord Thompson Jul 24 '16 at 04:05
  • That makes sense Gord. It should be pointed out that I'm looping through a spreadsheet (.xlsx) using Java POI in order to add it's rows to my DB. If you're asking whether I know the indexes of `id1`, `id2`, and `raw_yn` from my spreadsheet, the answer is yes/no. No because the boolean column `raw_yn` is only added to my DB, and doesn't exist in the spreadsheet. Yes, because I initially loop through the spreadsheet and obtain the column index for `id1` & `id2`. These may change but assume their columns are 1 & 2. How may I speed up the `SELECT` query the second time around (larger DB)? Thanks! – Mathomatic Jul 24 '16 at 12:13
  • Or if you're asking about my DB indexes, `id1` and `id2` are in columns 1 and 2, and `raw_yn` is column 55 (last one). These were specified when I created the database. How can I increase the `SELECT` queries speed? – Mathomatic Jul 24 '16 at 12:43
  • 1
    I was referring to indexes defined in the structure of the H2 table itself. That is, if in the H2 console you do `SCRIPT NODATA TABLE "PROPERTIES"` does the output contain `CREATE INDEX` statements (or similar) for the `id1`, `id2`, and `raw_yn` columns? If not, you can use [CREATE INDEX](http://www.h2database.com/html/grammar.html#create_index) statements to add them. – Gord Thompson Jul 24 '16 at 12:48
  • That solved it Gord, I appreciate it. It was a combination of changing `SELECT COUNT(*) AS ... ` to `SELECT COUNT(1) AS ...` and also creating an index on each of the 3 columns in question. This cut down the time substantially. I did a bit of reading and it seems like the H2 database itself determines whether to implement the indexes on a specific query or not. Pretty cool. Thank you sincerely for your time. – Mathomatic Jul 24 '16 at 22:52
  • PS: Feel free to aggregate your suggestions into a question and I'll accept it. – Mathomatic Jul 24 '16 at 22:55

1 Answers1

1

Consider using a staging temp table where you append all data as is into a similar structured table, ProductsTemp, and then migrate to final table, Products, filtered for the unique rows. Below are the SQL statements to incorporate in your Java code in this order:

Staging Append (two statements)

DELETE FROM ProductsTemp;

INSERT INTO ProductsTemp VALUES (?,?,?,..,...);

Final Migration

INSERT INTO Products (id, id2, raw_yn, ...)
SELECT id, id2, raw_yn, ...
FROM ProductsTemp temp
WHERE NOT EXISTS (SELECT 1 FROM Products sub
                   WHERE sub.id = temp.id
                     AND sub.id2 = temp.id2
                     AND sub.raw_yn = temp.raw_yn);
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • I wanted to fully test your suggestion before commenting Parfait, thank you for providing this. I wasn't able to make the last query work since I had such long values in the `INSERT` and `SELECT` queries that the syntax errors I was getting were difficult to debug. I did quite a bit of reading regarding temporary tables, and I plan to keep it in mind for the future. Your solution likely works but Gord's suggestions have me content. I've +1 you for your time/contribution. Thank you. – Mathomatic Jul 24 '16 at 22:53