
Let's say I have this column family called People which has tens of thousands of rows, each with two columns: name and country.

Now let's say I want to query for all people living in China, and I want the results sorted alphabetically by name.

The obvious approach would be to fetch all rows whose country is China using a secondary index, and then sort the returned rows on the client side. However, if there are many people living in China, this approach won't be feasible.

Also, I want to paginate the rows. Again, if I simply sort all rows on the client side, then pagination is trivial. But what if fetching that many rows and sorting them is too expensive?
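To make the client-side approach concrete, here is a minimal sketch in Python. It assumes the rows matching country = China have already been fetched into a list of dicts; the helper name `paginate_sorted` and the sample data are hypothetical and only illustrate the idea. Every request sorts the full result set, which is exactly the cost the question is worried about.

```python
from operator import itemgetter


def paginate_sorted(rows, page, page_size):
    """Sort all rows by name on the client, then slice out one page (0-based)."""
    # Sorting touches every matching row: O(n log n) for each call.
    ordered = sorted(rows, key=itemgetter("name"))
    start = page * page_size
    return ordered[start:start + page_size]


# Hypothetical stand-in for the rows returned by the secondary-index query.
rows = [
    {"name": "Wei", "country": "China"},
    {"name": "An", "country": "China"},
    {"name": "Li", "country": "China"},
    {"name": "Bo", "country": "China"},
    {"name": "Mei", "country": "China"},
]

first_page = paginate_sorted(rows, page=0, page_size=2)
second_page = paginate_sorted(rows, page=1, page_size=2)
```

With only a few rows this is cheap, but the whole matching set has to be transferred and sorted before even the first page can be shown.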

What's the best way to do this?

Derek Chiang
  • You may find these links useful http://stackoverflow.com/questions/16013536/pagination-in-cassandra-based-web-application/16038799#16038799 and http://stackoverflow.com/questions/16660795/querying-large-datasets-in-cassandra/16662956#16662956 – Easility Aug 08 '13 at 05:12
  • @Derek - What do you mean by "column family called People which has tens of thousands of rows"? Did you mean you have a super_column 'people' whose subcolumns are {name: country}? Also, sorting almost always needs to be done on the client side unless the criterion is time. I recently implemented pagination using Cassandra and pycassa in Python. Let me know if you were able to solve it. If not, I can suggest a couple of approaches that work well, though I am not sure they are the best. – NullException Dec 18 '13 at 22:01

0 Answers