
I am writing a chat server and want to store my messages in Cassandra. I need range queries, and I expect about 100 messages/day with history kept for 6 months, so I will have roughly 18,000 messages per user at any point.

Now, since I'll do range queries, I need my data to be on the same machine. Either I have to use the ByteOrderedPartitioner, which I don't fully understand, or I can store all the messages for a user in the same row.

create table users_conversations(jid1 bigint, jid2 bigint, archiveid timeuuid, stanza text, primary key((jid1, jid2), archiveid)) with CLUSTERING ORDER BY (archiveid DESC );

So I'll have 18,000 columns in that row. Do you think I'll have performance problems using this clustering key approach?

If yes, what alternative do I have?

Thanks

adragomir

1 Answer


Do not use the ByteOrderedPartitioner. I cannot stress enough how important that point is.

"since I'll do range queries I need my data to be on the same machine."

With your PRIMARY KEY defined like this:

primary key((jid1, jid2), archiveid)

Your partition key columns (jid1 and jid2) are combined and hashed, so all messages for a specific pair of jid1 and jid2 values are stored together in the same partition. The drawback is that every query must supply both jid1 and jid2. Within the partition the rows are sorted by archiveid, so you can run range queries on archiveid, and it should perform well as long as you stay under the limit of 2 billion columns per partition. At roughly 18,000 messages per conversation you are far below that limit.
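To make the range-query behaviour concrete, here is a minimal in-memory sketch of how that primary key lays out the data and how page-by-page reads could work. It is only a model, not driver code: the class ConversationStore, its method names, and the use of plain integers in place of timeuuid values are all assumptions made for illustration. The CQL string shows the kind of statement the page() method imitates; check it against your Cassandra version before relying on it.

```python
from collections import defaultdict

# CQL that page() below imitates (table and columns are from the question;
# the statement itself is an illustrative assumption, not tested on a cluster).
PAGE_QUERY = (
    "SELECT archiveid, stanza FROM users_conversations "
    "WHERE jid1 = ? AND jid2 = ? AND archiveid < ? LIMIT ?"
)


class ConversationStore:
    """Hypothetical toy model of the users_conversations table."""

    def __init__(self):
        # One list per partition, keyed by the full partition key (jid1, jid2).
        self._partitions = defaultdict(list)

    def insert(self, jid1, jid2, archiveid, stanza):
        rows = self._partitions[(jid1, jid2)]
        rows.append((archiveid, stanza))
        # Mimic CLUSTERING ORDER BY (archiveid DESC): newest first.
        rows.sort(key=lambda row: row[0], reverse=True)

    def page(self, jid1, jid2, before=None, limit=50):
        # Both parts of the partition key are required to find the partition.
        rows = self._partitions.get((jid1, jid2), [])
        if before is not None:
            # Range restriction on the clustering column.
            rows = [row for row in rows if row[0] < before]
        return rows[:limit]


store = ConversationStore()
for ts in range(1, 121):  # integers stand in for timeuuid values
    store.insert(1001, 1002, ts, f"msg {ts}")

first = store.page(1001, 1002, limit=50)
# The next page starts strictly below the last archiveid already seen.
second = store.page(1001, 1002, before=first[-1][0], limit=50)
```

Note that in this model store.page(1002, 1001) returns nothing, because (1002, 1001) is a different partition key from (1001, 1002). With the real table you would likewise have to decide on a fixed ordering of the two JIDs when writing and reading.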

Aaron
  • Also, I intend to use pagination, so I won't let the user query the whole history, only pages of 50 or at most 100 messages. I'm a bit concerned about the very wide row anti-pattern. What was the biggest wide row you have dealt with, Bryce? Many thanks – adragomir Mar 15 '15 at 15:37
  • @user3030447 I have a few tables with about 8k-12k columns per wide row. I may repartition them in the future, but for now they're doing well. – Aaron Mar 15 '15 at 16:18