3

I have table with 20+ millions of rows and I can't select all rows using single query because of OutOfMemoryError. I read about fetchSize attribute and looks like it might help to resolve my issue because it is common advise

But I have question about how to apply it.

I have following code:

private final JdbcTemplate jdbcTemplate;
...
    jdbcTemplate.setFetchSize(1000);
    List<MyTable> myList= this.jdbcTemplate.query(
                "SELECT * FROM my_table",
                new Object[]{},
                MyTableMapper.INSTANCE
        );
    mylist.foreach(obj->processAndSave(obj));

Looks like jdbc driver will select 1000 per request. But what should I do to proceess all 20+ millions rows ?

Should I invoke jdbcTemplate.query several times ?

Youcef LAIDANI
  • 55,661
  • 15
  • 90
  • 140
gstackoverflow
  • 36,709
  • 117
  • 359
  • 710
  • Why not basing on pagination? – Youcef LAIDANI Mar 22 '19 at 19:52
  • @YCF_L I thought that fetch size does smth like this. Is it false? – gstackoverflow Mar 22 '19 at 19:53
  • 2
    Collecting everything to a `List` will throw `OutOfMemoryError` regardless of fetch size set – Denis Zavedeev Mar 22 '19 at 19:57
  • hmm not exactly that, but you need to use limit and offset in your query, if you read the [doc about setFetchSize](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/core/JdbcTemplate.html#setFetchSize-int-) it doesn't work like you think – Youcef LAIDANI Mar 22 '19 at 19:58
  • the question here, why you want to load 20M+ row in your code? – Youcef LAIDANI Mar 22 '19 at 19:58
  • 2
    What you're doing is: execute this query and store all the results in a List in meory. Setting the fetch size won't change anything. The fetch size is useful to control how many rows are loaded at once when iterating through a ResultSet: insted of doing a network trip every time you ask for the next row in the result set, you can ask the driver to load and buffer, let's say, 100 rows in memory. When you are at the 101th row, a new network trip is made to load and buffer the 100 next rows. – JB Nizet Mar 22 '19 at 20:07
  • 1
    If you want to process a very large number of rows, you need to make sure your driver doesn't load everything in memory (MySQL does that, and PostgreSQL does that by default, too). And thenyou need to iterator through the ResultSet, without trying to transform the whole resultset to a List. – JB Nizet Mar 22 '19 at 20:08
  • Which RDBMS/driver are you using? – Mick Mnemonic Mar 22 '19 at 20:43
  • @JB Nizet, I use Postgres – gstackoverflow Mar 22 '19 at 21:53
  • @ YCF_L, what do yout think about https://stackoverflow.com/a/49298569/2674303 ? – gstackoverflow Mar 22 '19 at 22:11
  • @caco3 I don't need all rows in List but I need process each row – gstackoverflow Mar 22 '19 at 22:49
  • @JBNizet Please provide example – gstackoverflow Mar 22 '19 at 22:51
  • https://jdbc.postgresql.org/documentation/head/query.html – JB Nizet Mar 22 '19 at 23:01
  • @JB Nizet But I use spring JdbcTemplate in the project.So can I use org.springframework.jdbc.core.JdbcTemplate#queryForRowSet(java.lang.String) and iterate over RowSet instead of ResultSet ? – gstackoverflow Mar 23 '19 at 09:37
  • You probably want to use a RowCallbackHandler: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/core/JdbcTemplate.html#query-java.lang.String-java.lang.Object:A-int:A-org.springframework.jdbc.core.RowCallbackHandler- – JB Nizet Mar 23 '19 at 09:43
  • @JB Nizet, so I have to do 2 things: 1 - execute row processing inside RowCallbackHandler 2-set fetch size on jdbc template (for example 10_000) – gstackoverflow Mar 23 '19 at 09:53
  • @JB Nizet is it truth? – gstackoverflow Mar 23 '19 at 13:56
  • Yes. But why don't you just test it? – JB Nizet Mar 23 '19 at 13:56
  • @JB Nizet thanks. I will do it. Just wanted to get your response – gstackoverflow Mar 23 '19 at 13:57
  • @JB Nizet I have a quiestion how to chosse correct fetch size. – gstackoverflow Mar 23 '19 at 14:00
  • @gstackoverflow if you don't ask it, I can't possibly answer it (and no one can) – JB Nizet Mar 23 '19 at 14:10

1 Answers1

0

You can Getting results based on a cursor

If you are using PostgreSQL 11 requiments:

  1. setFetchSize > 0
  2. make sure autocommit is off

For you example for standard configurations

spring.datasource.hikari.auto-commit=false
spring.jdbc.template.fetch-size=50

Also if you want process the result ouside of a repository class you can combine with queryForStream

Stream<MyTable> stream = jdbcTemplate.queryForStream(SQL,  MyTableMapper.INSTANCE);

PostgreSQL Documentation: Issuing a Query and Processing the Result

https://jdbc.postgresql.org/documentation/head/query.html