I have to read 3 TB of production data from a Cassandra database.
I have implemented paging using the Java driver, but this technique relies on an offset value, which means that to reach a particular row I have to traverse all the preceding rows again. This process consumes a lot of heap memory, which is not good practice. I want to read the data without using a lot of heap memory.
Typically I want to fetch 10000 rows in a batch and then read the next 10000 without reading the first 10000 rows again.
I don't need low read latency; my only problem is reading the data without consuming a lot of heap memory.
Here is part of my code:
Statement select = QueryBuilder.select().all().from("demo", "emp");
And this is how I am paging:
List<Row> secondPageRows = cassandraPaging.fetchRowsWithPage(select, 100001, 25000);
printUser(secondPageRows);
Here 100001 is the position of the first row I want returned and 25000 is the page size. So I first have to read through the first 100000 rows before I can print row 100001. This is what causes the heap problem. In addition, I don't want to have to read to the end of one page just to get the first record of the next page.
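To make the difference concrete, here is a minimal, driver-free sketch of the two access patterns. It assumes the rows arrive through a plain java.util.Iterator; the class and method names (BatchPaging, fetchWithOffset, readInBatches) are hypothetical and are not part of the Cassandra driver API. The first method mimics what my offset-based paging effectively does, and the second shows the forward-only behaviour I am looking for, where only one batch is referenced at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical illustration only: plain iterators stand in for database rows.
final class BatchPaging {

    // Offset-style paging: every call walks over (start - 1) rows
    // only to discard them before the requested page begins.
    static <T> List<T> fetchWithOffset(Iterator<T> rows, int start, int pageSize) {
        for (int i = 1; i < start && rows.hasNext(); i++) {
            rows.next(); // row is read and thrown away
        }
        List<T> page = new ArrayList<>();
        while (page.size() < pageSize && rows.hasNext()) {
            page.add(rows.next());
        }
        return page;
    }

    // Forward-only batching: each row is read exactly once, and only the
    // current batch is referenced here, so earlier batches can be garbage
    // collected as long as the handler does not keep them.
    static <T> long readInBatches(Iterator<T> rows, int batchSize,
                                  Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<T> batch = new ArrayList<>();
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                total += batch.size();
                batch = new ArrayList<>(); // drop the reference to the old batch
            }
        }
        if (!batch.isEmpty()) { // last, possibly shorter, batch
            handler.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // 25 fake "rows", processed 10 at a time without revisiting any row.
        Iterator<Integer> rows = IntStream.rangeClosed(1, 25).iterator();
        long total = readInBatches(rows, 10,
                batch -> System.out.println("batch of " + batch.size() + " rows"));
        System.out.println("rows processed: " + total);
    }
}
```

In my real case the batch size would be 10000 and the handler would be something like printUser. What I am missing is the driver-side equivalent of the second method.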