
I ran into this problem and can't figure out what to do. Say I'm storing these user actions:

  • Likes
  • Comments
  • Shares
  • Uploads

And so on; the list goes to around 20 actions. The best strategy I've come up with is to create a single column family (CF), let's call it user_actions, and then use composite row keys (I think that's what they're called).

So row keys would have the form user_id:action. Some people will ask why I don't just store all of a user's actions in a single row. Here is my biggest problem: I want the user to be able to choose which actions to see when checking what he or his friends did in the past.

So if the user wants to see what his friend liked, all I need to do is get the row with all of those likes. Simple, right?

But what if the user wants to see everything (which is the default option)? In that case I would need to make ~20 queries. I guess that would be okay with little traffic, but with 100k reads per second that would mean 100k * 20 = 2 million queries per second, which sounds horrible...

But I just can't see any other way, because if I stored everything in a single row, how would I query individual actions when Cassandra doesn't support WHERE?

By the way, I'm using PHP and the phpcassa library.

Cœur
Linas
  • You can query columns by a name range (i.e. BA... to BZ...) and get a row slice. – lstern Nov 09 '12 at 18:05
  • Also, you can get multiple keys in one single request. – lstern Nov 09 '12 at 18:07
  • @lstern But wouldn't it be the same thing? I mean, I would still need to make 20 slices, one for each action. – Linas Nov 09 '12 at 18:07
  • @lstern Oh my god, you are right, how could I possibly forget about multiple keys... But I still don't know much about them. I understand that it would be quicker, but by how much? Would it really be the right way to do it? – Linas Nov 09 '12 at 18:09
  • Keep in mind that you may need to paginate the data (20 latest likes, etc.) – lstern Nov 09 '12 at 18:14
  • I would try one key per action and column names starting with some integer that can order the events. Also, I would serialize the event data to fit in a single column. – lstern Nov 09 '12 at 18:17
  • @lstern Yes, pagination would probably create a few issues, but I think I could use column slices to select, say, the last 20 events for each action and then work out on the server what I want to display and what I don't. – Linas Nov 09 '12 at 18:23
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/19350/discussion-between-lstern-and-linas) – lstern Nov 09 '12 at 18:24

1 Answer


You will eventually need to paginate the action info.

You also want to be able to paginate data ordered by event date and to filter by the action types the user wants to see. I suggest the following:

  • One row per action type.
  • Key is userId + actionType
  • Column names are [dateinteger + EventId]
  • Column values are the event objects, each serialized to a single string
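
As a rough illustration of this layout, here is a minimal Python sketch of how the row keys, column names, and column values could be built. The function names, the `:` separator, the 20-digit zero padding, and the JSON serialization are all assumptions made for the example; they are not part of phpcassa or Cassandra.

```python
import json

def row_key(user_id, action_type):
    # Hypothetical key format: one row per user and action type.
    return f"{user_id}:{action_type}"

def column_name(timestamp_micros, event_id):
    # Zero-pad the timestamp so that plain string ordering of column
    # names matches chronological ordering of the events.
    return f"{timestamp_micros:020d}:{event_id}"

def column_value(event):
    # Serialize the whole event into one string so it fits in one column.
    return json.dumps(event, sort_keys=True)

key = row_key(42, "like")
name = column_name(1352484300000000, "e17")
value = column_value({"target": "photo:7", "by": 42})
```

The padding matters if the column comparator sorts names as text: without it, a shorter timestamp string would not sort before a longer one.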

You can query the data with a list of userId + actionType keys corresponding to the actions the user selected, and slice the column names to paginate the results or to filter by a date range.
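
To make the query pattern concrete, here is a small in-memory Python sketch. It is not phpcassa code: `store`, `slice_row`, `multiget`, and `latest_events` are hypothetical stand-ins that only model the idea of fetching several rows in one request, slicing each row newest-first, and merging the slices into one page.

```python
import heapq
from itertools import islice

# In-memory stand-in for the column family: row key -> {column name: value}.
store = {
    "42:like": {"00000000000000000100:e1": "liked photo 7",
                "00000000000000000300:e3": "liked photo 9"},
    "42:comment": {"00000000000000000200:e2": "commented on photo 7"},
    "42:share": {"00000000000000000400:e4": "shared photo 9"},
}

def slice_row(row, before="", count=20):
    # Newest-first slice: up to `count` columns whose name sorts before `before`.
    names = sorted(row, reverse=True)
    if before:
        names = [n for n in names if n < before]
    return [(n, row[n]) for n in names[:count]]

def multiget(keys, before="", count=20):
    # Models one request that returns a slice of every requested row.
    return {k: slice_row(store[k], before, count) for k in keys if k in store}

def latest_events(user_id, action_types, page_size=20, before=""):
    keys = [f"{user_id}:{a}" for a in action_types]
    rows = multiget(keys, before, page_size)
    # Each slice is already sorted newest-first, so a k-way merge is enough.
    merged = heapq.merge(*rows.values(), key=lambda col: col[0], reverse=True)
    return list(islice(merged, page_size))

page1 = latest_events(42, ["like", "comment", "share"], page_size=2)
page2 = latest_events(42, ["like", "comment", "share"], page_size=2,
                      before=page1[-1][0])
likes_only = latest_events(42, ["like"])
```

Fetching `page_size` columns from every row is deliberate: in the worst case all events on a page come from a single action type, so each row has to supply a full page before merging. The last column name of one page serves as the `before` marker for the next.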

I think this approach is better than using a single row for all user actions, because you can easily order your records by date and also select which action types to query. With a single row you would have to choose between ordering your records by action type or by date.

Also, this is better (IMO) than having a row for each action event, as you would then need to create secondary indexes to query the data properly.

lstern
  • Yes, I think this is the right approach. As I mentioned in the chat just now, it will be a little difficult to get the column slices right for pagination, but I think I can figure it out from here :) – Linas Nov 09 '12 at 18:48