
I paginate large result sets with Cassandra 2.2 using the Java driver and PagingState, as described here: https://datastax.github.io/java-driver/2.2.0-rc2/features/paging/

That works pretty well, but I cannot find any information on how Cassandra behaves when new records are inserted (or existing ones are updated) while I am paging through the results. Are such new or changed records included in the result, or is the result set immutable?

The use case is a stateless web service where a client can query large result sets.

EDIT: The same question applies to ResultSet paging in general (Cassandra does an automatic lazy fetch here).

EDIT2: To my knowledge, Cassandra does not support ACID transactions but does support AID (atomicity, isolation, durability), so I would expect some kind of isolation here when iterating through the result set.

salyh

1 Answer


There is no such isolation, as it would be too expensive to implement. The whole result set is not kept in memory, and the rows to be returned in the next page are not known when the current one is shipped to the client.
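To make that concrete, here is a minimal sketch in plain Java that does not touch the Cassandra driver at all. It only models the idea: a partition is a sorted map, and the paging state is nothing more than "the last key already returned". The class `PagingSketch`, the `rows` map, and `fetchPage` are hypothetical names for illustration, and the model assumes pages resume strictly after the last returned row, which is a simplification of what the real paging state encodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PagingSketch {
    // Hypothetical stand-in for one partition: rows sorted by clustering key.
    public static final NavigableMap<Integer, String> rows = new TreeMap<>();

    // Returns up to pageSize rows with a key strictly greater than lastKey.
    // lastKey plays the role of the paging state; null means "first page".
    // Each call reads the live data, so nothing is snapshotted between pages.
    public static List<Map.Entry<Integer, String>> fetchPage(Integer lastKey, int pageSize) {
        NavigableMap<Integer, String> remaining =
                (lastKey == null) ? rows : rows.tailMap(lastKey, false);
        List<Map.Entry<Integer, String>> page = new ArrayList<>();
        for (Map.Entry<Integer, String> e : remaining.entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            // Copy the entry so the page does not change after it is returned.
            page.add(new AbstractMap.SimpleImmutableEntry<>(e));
        }
        return page;
    }

    public static void main(String[] args) {
        rows.clear();
        for (int k = 10; k <= 60; k += 10) {
            rows.put(k, "v" + k);
        }

        List<Map.Entry<Integer, String>> page1 = fetchPage(null, 3);
        Integer state = page1.get(page1.size() - 1).getKey();

        // Writes that arrive between the two page requests.
        rows.put(15, "new");      // sorts before the cursor: never returned
        rows.put(45, "new");      // sorts after the cursor: shows up in page 2
        rows.put(50, "updated");  // not yet read: page 2 sees the new value

        List<Map.Entry<Integer, String>> page2 = fetchPage(state, 3);

        System.out.println(page1); // [10=v10, 20=v20, 30=v30]
        System.out.println(page2); // [40=v40, 45=new, 50=updated]
    }
}
```

In this model the client ends up with a result that never existed at any single point in time: it misses key 15 but sees key 45 and the updated value for key 50. A stateless web service that hands a paging state back to its caller should assume the same kind of outcome.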

One interesting consequence of this is that it breaks the BATCH update guarantee, stated in the documentation as:

All updates in a BATCH belonging to a given partition key are performed in isolation.

There's one open issue about this.

There are also some performance implications, because a lot of the work done to fetch page n has to be done again to fetch page n + 1 (such as opening and reading from index files and data files). Scylla, a drop-in replacement for Cassandra to which I contribute, is working on fixing this.

Duarte Nunes
    @salyh So what approaches did you try for pagination? If an insert happens between two consecutive page requests, the page returns incorrect data. Is there a workaround for this, or should it be optimistic paging? – vin Aug 31 '18 at 06:50