Optimal and memory safe way to retrieve millions of records out of mssql DB from Java?

Question

I have a table with about 6 million records in it. I need to retrieve the ID of each row to process in Java. The ID is an alphanumeric nvarchar(8). I would like to do this a memory safe and efficient manner.

Below is my approach, in it I: select the top 1000 records. Process those. Select the the next 1000 records where the ID is greater than the greatest ID in my last set of records. Continue until I retrieve a set with less than 1000 records.

I believe my strategy will retrieve all the records in a memory safe fashion, but have no idea if there is a faster/better strategy. (Also not positive that the strategy will retrieve all records, I need to test in an environment with more resources, mine is very unstable).

List<String> ids = myDao.getTopThousandIds();
processIds(ids);

while(skus.size() == 1000) {
  ids = myDao.getTopThousandIdsGreaterThan(ids.get(ids.size() -1));
  processIds(skus);
}

public interface MyDao {
  @SqlQuery("SELECT TOP 1000 [ID] FROM [tbl] WITH(NOLOCK) ORDER BY [ID] ASC")
  List<String> getTopThousandIds();

  @SqlQuery("SELECT TOP 1000 [ID] FROM [tbl] WITH(NOLOCK) WHERE [ID] > :id ORDER BY [ID] ASC")
  List<String> getTopThousandIdsGreaterThan(@Bind("id") String id);
}

If you anyways need to fetch *all* ids from the table, I don't see any benefit in using `TOP 1000`. Just `SELECT [ID] FROM [tbl] WITH(NOLOCK) ORDER BY [ID] ASC`. — Mick Mnemonic, Sep 02 '15 at 23:37
Is slicing by 1000 rows crucial for your application business logic? Or it's just a way to reduce memory usage? — Lyubomyr Shaydariv, Sep 02 '15 at 23:38
@NeilMcGuigan Any idea if jdbc cursor is supported by the skife jdbi library? I can't find anything about it. — ab11, Sep 02 '15 at 23:44
Did you experiment with setting the [fetch size](http://jdbi.org/apidocs/org/skife/jdbi/v2/Query.html#setFetchSize%28int%29) instead of explicitly paging with `TOP`? — Mick Mnemonic, Sep 02 '15 at 23:49
@ab11 if I understand your case right, you probably better not to return a ready to use `List`, but let a DAO method accept a consumer like `Consumer` that's invoked from a DAO implementation for each result set row. Say, `void getAllIds(Consumer consumer)` rather than `List getAllIds()`. I'm not sure if such an idea fits the library you use, though, but it is able to let you process very large result sets not getting all of their rows into memory, so you could collect such a "stream" into a list (if possible), or "redirect" it to any output just row-by-row. — Lyubomyr Shaydariv, Sep 02 '15 at 23:52
@MickMnemonic I didn't. I would expect it to behave the same. But I should time each, as it could be an optimization. — ab11, Sep 02 '15 at 23:55

Optimal and memory safe way to retrieve millions of records out of mssql DB from Java?

0 Answers0