This question has been asked a thousand times, so feel free to skip it, but I still thought people might like to see and comment on my approach...

I'm building a site which requires an activity feed, like FourSquare.

But my site has this feature purely for eye-candy's sake, and doesn't need events to be saved forever.

So, I write the event_type and user_id to a MySQL table. Before writing a new event, I prune the table of older, unnecessary rows: I count the total number of rows, find the event_id below which everything is redundant, and delete those rows. Then I write a new row every time an event happens. There's another user_text column, which is NULL if there is no user-generated text...
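The prune-then-insert step can be sketched like this. This is a hedged illustration, not the poster's actual code: the cap, the function name, and the in-memory `events` array are all made up, and in the real app the two steps would be a DELETE and an INSERT against the MySQL table.

```javascript
// Hypothetical cap on how many feed rows to keep (illustrative value).
const MAX_EVENTS = 50;

// Keep only the newest (MAX_EVENTS - 1) rows, then append the new event,
// so the table never exceeds MAX_EVENTS rows. In SQL this mirrors:
//   DELETE FROM live_updates WHERE event_id < cutoff;
//   INSERT INTO live_updates ...;
function pruneAndInsert(events, newEvent) {
  const excess = events.length - (MAX_EVENTS - 1);
  const kept = excess > 0 ? events.slice(excess) : events.slice();
  kept.push(newEvent);
  return kept;
}
```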

In the front-end, jQuery polls a PHP script via GET every x seconds while the user has the site open, sending the id of the last update it received. The <div> tags generated by my backend have their id attribute set to the MySQL row id. This way, I don't have to save the last_received_id in memory, though I guess there's no real performance impact from storing one variable holding a very small integer in memory...
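A minimal sketch of that polling loop follows. The names are invented, and `fetchUpdates` stands in for the jQuery $.get call (injected here so the loop itself can be exercised without a server or DOM); in the real page it would send `last_id` as a GET parameter and the ids would come from the event <div>s.

```javascript
// Build a poller that remembers the id of the newest event already
// rendered and asks the server only for events newer than that.
function makePoller(fetchUpdates, onNewEvents) {
  let lastId = 0; // id of the newest event <div> already in the feed
  return function poll() {
    // In the real app: $.get('updates.php', { last_id: lastId }, ...)
    const events = fetchUpdates(lastId);
    if (events && events.length > 0) {
      lastId = events[events.length - 1].id;
      onNewEvents(events); // append new <div>s, drop the tail end
    }
    return lastId;
  };
}
```

In the page this would be driven by something like `setInterval(poll, 5000)`.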

I have a function that generates an "update text" from the event_type and user_id passed in from jQuery, and from whether the user_text column is empty. The update text is returned to jQuery, which appends the freshly received event <div> to the feed with some effects, while simultaneously removing the "tail end" event <div> with an effect.
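Such a generator might look like the sketch below. The event types, templates, and the use of a user name instead of a raw user_id are all assumptions for illustration; the real function presumably lives in the PHP backend.

```javascript
// Turn an event into a human-readable feed line. userText models the
// nullable user_text column; when absent, the template omits it.
function updateText(eventType, userName, userText) {
  switch (eventType) {
    case 'checkin':
      return userName + ' checked in' + (userText ? ': "' + userText + '"' : '');
    case 'comment':
      return userName + ' commented: "' + (userText || '') + '"';
    default:
      return userName + ' did something';
  }
}
```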

If I (or, more importantly, the client) want to, I can have an "event archive" table in my database (or a different one) that archives those redundant rows before they are deleted. This way, event information is saved forever, without impacting the performance of the live site...

I'm using CodeIgniter, so there's no question of repeated code anywhere. All the pertinent functions go into a LiveUpdates class in the library and model respectively.

I'm rather happy with the way I'm doing it because it solves the problem at hand while sticking to the KISS ideology... but still, can anyone please point me to some resources that show a better way to do it? A Google search on this subject reveals too many articles/SO questions, and I would like to benefit from the experience of any other developer who has already trawled through them and worked out the best approach...

Aditya M P

1 Answer

If you use proper indexes there's no reason you couldn't keep all the events in one table without affecting performance.

If you craft your polling correctly to return nothing when there is nothing new, you can minimize the load each client puts on the server. If you also look into push notification (the hybrid delayed-connection-closing technique, i.e. long polling), this will further help you scale successfully.

Finally, it is completely unnecessary to worry about storing one variable in the client. That is premature optimization. The real performance issues are going to be the avalanche of connections to the web server from many users and, in the DB, tables without proper indexes.

About indexes: an index is "proper" when the most common query against a table can be performed with a seek and a minimal number of reads (like 1-5). In your case, the indexed column could be an incrementing id or a date (if it has enough precision). If you design it right, the operation to find the most recent update_id should be a single read.

Then, when your client submits its ajax request to see if there is updated content, first run a query to check whether the value submitted (id or time) is less than the current value. Only if it is, respond with the new content via a second query. Keeping the "ping" action as lightweight as possible is your goal, even if this incurs a slightly greater cost when there is new content.
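The two-step flow the answer describes could be sketched as follows. The `db` object is a stand-in for an indexed MySQL table, where `maxUpdateId` models a single-read `SELECT MAX(update_id)` against the primary key; all names are illustrative.

```javascript
// Answer a client "ping": a cheap currency check first, and the more
// expensive content query only when the client is actually behind.
function handlePing(db, clientLastId) {
  const currentId = db.maxUpdateId(); // cheap: one indexed read
  if (clientLastId >= currentId) {
    return null; // nothing new; respond with an empty body
  }
  return db.updatesAfter(clientLastId); // only now pay for content
}
```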

Using a push would be far better, though, so please explore Comet.

If you don't know how many reads your queries perform, I encourage you to explore this aspect of the database so you can measure and assess it properly.

Update: suggesting that clients get a "yes, there's new content" answer and then actually request the content was perhaps not the best idea. Please see Why the Fat Pings Win for some very interesting related material.

ErikE
  • In your opinion, does the primary key update_id in the live_updates table amount to a "proper index"? – Aditya M P Jun 26 '11 at 17:39
  • thanks for reading through all that and taking time to answer, by the way :) – Aditya M P Jun 26 '11 at 17:39
  • Haha, I had to re-read that to make sure you didn't say : Why the fat pigs win! Yes, it's only 1 read in my code to find out if there are any new updates. And only one more read to send out new updates. I read about Comet, but this approach does not work with Apache "as this would quickly tie up all worker threads and leave your server unable to respond to any other request". And my client requires Apache so I can't use it this time - but I'll definitely learn more about it.. thanks a lot for suggesting something I never knew existed :) – Aditya M P Jul 02 '11 at 18:11