I am working on a cron job which needs to query Postgres on a daily basis. The table is huge: roughly a trillion records, of which I expect to retrieve about a billion per execution. I couldn't find any documentation on using cursors or pagination with Slick 2.1.0. An easy approach I can think of is to get the count first and loop through the results using drop and take, as sketched below. Is there a better, more efficient way to do this?
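In code, the drop/take approach I'm describing would look something like this (the `Events` table, connection settings, and page size are placeholders for illustration):

```scala
import scala.slick.driver.PostgresDriver.simple._

object PagedExport extends App {
  // Placeholder schema standing in for the real trillion-row table
  class Events(tag: Tag) extends Table[(Long, String)](tag, "events") {
    def id      = column[Long]("id", O.PrimaryKey)
    def payload = column[String]("payload")
    def *       = (id, payload)
  }
  val events = TableQuery[Events]

  val db = Database.forURL("jdbc:postgresql://localhost/mydb",
                           driver = "org.postgresql.Driver")
  val pageSize = 10000

  db.withSession { implicit session =>
    val total = Query(events.length).first   // get the count first
    var offset = 0
    while (offset < total) {
      // A stable sort order is essential, or pages may overlap or skip rows
      val page = events.sortBy(_.id).drop(offset).take(pageSize).list
      // process(page) goes here
      offset += pageSize
    }
  }
}
```

One caveat with this pattern: Postgres still has to scan and discard all the rows before each OFFSET, so later pages get progressively slower; at a billion rows, keyset pagination (WHERE id > lastSeenId ORDER BY id LIMIT n) or a server-side cursor is far cheaper.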
- Are you (1) relying on a huge amount of RAM on the client, or (2) planning to use cursors to fetch less than the whole result set? I'm not sure I understand your question in the specifics. – Richard Huxton Oct 16 '14 at 08:18
- Apologies if I didn't explain it properly. I am definitely looking for an option like (2). – Shashi Oct 16 '14 at 17:30
- I think what Richard Huxton was getting at is that if you use cursors, you won't pull all of those records into memory at once. You will read your billion results into memory one at a time (or perhaps a chunk at a time, depending on the database driver). – AmigoNico Oct 21 '14 at 07:25
- Ah, that helped. You might be interested in this question: http://stackoverflow.com/questions/14369899/scala-slick-lazy-fetch – AmigoNico Oct 21 '14 at 23:29
- I haven't actually tried Slick yet, but I would have thought that it would give you an Iterator that uses a cursor underneath. Ah, this may be helpful: http://stackoverflow.com/questions/19409646/database-iterators-in-conjunction-with-paging (a sketch of the cursor approach follows these comments) – AmigoNico Oct 21 '14 at 23:32
- Thanks @AmigoNico, both those links recommend what I have implemented right now: get a total count and paginate using drop and take. – Shashi Oct 22 '14 at 15:45
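For reference, a minimal sketch of the cursor-based alternative the comments point to, reusing the placeholder `events` table and `db` from the question. The Postgres JDBC driver only streams rows through a cursor when autocommit is off (hence `withTransaction`), the statement is forward-only, and a positive fetch size is set; since Slick 2.1.0 does not appear to expose a fetch-size setting, this drops to the session's underlying JDBC connection:

```scala
import java.sql.ResultSet

db.withTransaction { implicit session =>   // withTransaction turns autocommit off
  val st = session.conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                        ResultSet.CONCUR_READ_ONLY)
  st.setFetchSize(10000)                   // > 0 tells the driver to use a cursor
  try {
    val rs = st.executeQuery("SELECT id, payload FROM events ORDER BY id")
    while (rs.next()) {
      val id = rs.getLong("id")            // only ~fetchSize rows are buffered client-side
      // process one row at a time here
    }
  } finally {
    st.close()
  }
}
```

This keeps client memory flat regardless of the result-set size, which is what option (2) in the comments above was after.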
1 Answer
Map/reduce using Akka and postgresql-async: first count, then distribute offset+limit queries to actors, then map the data as needed, then reduce the result to Elasticsearch or another store?
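Read as steps, that suggestion might look like the following rough sketch with plain Akka actors (pool size, page size, and the hard-coded total are illustrative; the per-page fetch via postgresql-async is left as a comment):

```scala
import akka.actor._
import akka.routing.RoundRobinPool

case class Page(offset: Long, limit: Int)

class PageWorker extends Actor {
  def receive = {
    case Page(offset, limit) =>
      // 1. Fetch this window, e.g. with postgresql-async:
      //      SELECT ... ORDER BY id LIMIT <limit> OFFSET <offset>
      // 2. Map each row to the target shape
      // 3. Reduce/index the mapped batch into Elasticsearch or another store
  }
}

object ExportJob extends App {
  val system  = ActorSystem("export")
  // Worker count is arbitrary; size it to the database's connection budget
  val workers = system.actorOf(RoundRobinPool(8).props(Props[PageWorker]), "workers")

  val total    = 1000000000L   // would come from the initial SELECT count(*)
  val pageSize = 50000
  (0L until total by pageSize.toLong).foreach(off => workers ! Page(off, pageSize))
}
```

The same OFFSET caveat as above applies here: handing workers keyset ranges (WHERE id BETWEEN lo AND hi) instead of offsets avoids rescanning from the start of the table for every page.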
- This answer looks more like a comment. Try to format your thens into steps and give a little more detail. – CaptainBli Oct 16 '14 at 16:34