
I have a Spring Boot application where I batch insert hundreds of thousands of rows across two tables: I insert one row into the first table, return its id, and use that id to insert one row into the second table. As it stands, it's simply not fast enough. I would like to know the fastest way to do this. Here is my code:

private void insertData(DataList request, long personId, String tableName) {
    try {
        String sql = """
                WITH inserted AS
                (INSERT INTO user_%s (username, password, email, phone, person_id) VALUES (?,?,?,?,?) RETURNING user_id)
                INSERT INTO info_%s (user_id, info, full_text_search) VALUES ((SELECT user_id FROM inserted), ?, to_tsvector(?));
                """.formatted(tableName, tableName);

        jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(@Nonnull PreparedStatement ps, int i) throws SQLException {
                Data data = request.getData(i);
                ps.setString(1, data.getUsername());
                ps.setString(2, data.getPassword());
                ps.setString(3, data.getEmail());
                ps.setString(4, data.getPhone());
                ps.setLong(5, personId); // person_id comes from the method parameter
                ps.setString(6, data.getInfo());
                ps.setString(7, data.getInfo());
            }

            @Override
            public int getBatchSize() {
                return request.getDataCount();
            }
        });
    } catch (Exception e) {
        log.error(e.getMessage(), e);
    }
} 

I'm using PostgreSQL as my database.

  • Is there any chance to attach the data source as a foreign table? – Stefanov.sm Aug 23 '21 at 09:45
  • Which database are you using? – M. Deinum Aug 23 '21 at 09:45
  • @M.Deinum PostgreSQL with TimeScaleDB – TheStranger Aug 23 '21 at 10:26
  • And is it really the query that takes too long, or is it the whole process? Constructing that many objects in memory (the `DataList`) takes a lot of time. Also, with Postgres you need to explicitly enable batch rewrites for the query, otherwise you will still get single inserts. – M. Deinum Aug 23 '21 at 10:29
  • @M.Deinum I think it's the whole process, I'm not suggesting that it is only the query. Whatever will help make the whole process faster would be awesome. How do I explicitly enable batch rewrites? – TheStranger Aug 23 '21 at 10:33
  • To enable it you need to append a parameter, see https://stackoverflow.com/questions/47664889/jdbc-batch-operations-understanding/48349524#48349524. I also doubt that it will actually make a difference, as I doubt the query can be batched due to how it is written with the select. Another thing is that the `to_tsvector` will obviously also take time. About the rest of your process, no idea, as that isn't part of the question. – M. Deinum Aug 23 '21 at 10:37
  • Where do the data come from? Is it absolutely necessary to make hundreds of thousands of transactions and roundtrips to/from the database? If not then why not let the database handle the job itself? – Stefanov.sm Aug 23 '21 at 10:49
  • @Stefanov.sm the data comes from a gRPC client. – TheStranger Aug 23 '21 at 11:11
  • Is it possible to cache all of it into a file and store it temporarily in a place that is "visible" to Postgres? – Stefanov.sm Aug 23 '21 at 12:29
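To illustrate the suggestion above about enabling batch rewrites: it is a PostgreSQL JDBC driver connection parameter. A minimal sketch, assuming a standard Spring Boot datasource configuration (host and database name here are placeholders):

```properties
# application.properties — reWriteBatchedInserts tells the PostgreSQL JDBC
# driver to rewrite batched single-row INSERTs into multi-row INSERTs
spring.datasource.url=jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true
```

As noted above, the driver can only rewrite statements of a shape it recognizes, so the CTE-based insert in the question may not benefit from this.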

0 Answers