
The problem is somewhat similar to Twitter's or Facebook's:

  • followers and following
  • users add items

Subsequently you see the items added by all the people you are following.

Problem A: How do we keep the query for items added by the people you are following performing well as the dataset grows?

Problem B: We are seeing geographically dispersed traffic, with large user bases in the Netherlands and Brazil. Any solution would probably need to allow for databases across multiple data centers.

We are running on a Django/Python stack and already use edge server caching. (Anonymous users get the cached version; a logged-in user's version is run through a second-level template parsing service first.)

Thierry

1 Answer


Problem A: How do we keep the query for items added by the people you are following performing well as the dataset grows?

Start with the dataset of who follows whom (who are my followers / who am I following). One could save these values as tuples and partition them across several SQL databases, though I doubt real partitioning is needed even for Twitter-sized databases. This gives the list of people who are followed. Secondly, a table mapping follower -> items, sorted by follower, could be queried easily, and could also be partitioned if the dataset becomes huge.
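As a rough illustration of the two tables described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`follows`, `items`), the column names, the index, and the `feed_for` helper are all assumptions made for this example, and it reads the second table as items keyed by their author; a production setup on Django would more likely define these as models.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per (follower, followee) tuple.
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL,
            followee_id INTEGER NOT NULL,
            PRIMARY KEY (follower_id, followee_id)
        );
        -- Items keyed by the user who added them.
        CREATE TABLE items (
            id         INTEGER PRIMARY KEY,
            author_id  INTEGER NOT NULL,
            created_at INTEGER NOT NULL,
            body       TEXT NOT NULL
        );
        -- Lets the feed query look up items per author in time order.
        CREATE INDEX items_by_author ON items (author_id, created_at);
        """
    )
    return conn


def feed_for(conn, user_id, limit=20):
    """Return the newest items added by the people user_id follows."""
    rows = conn.execute(
        """
        SELECT items.id, items.author_id, items.body
        FROM follows
        JOIN items ON items.author_id = follows.followee_id
        WHERE follows.follower_id = ?
        ORDER BY items.created_at DESC, items.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()


conn = build_demo_db()
# User 1 follows users 2 and 3; user 2 follows user 3.
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])
conn.executemany(
    "INSERT INTO items (author_id, created_at, body) VALUES (?, ?, ?)",
    [(2, 100, "from 2"), (3, 200, "from 3"), (4, 300, "from 4"), (1, 400, "from 1")],
)
feed = feed_for(conn, 1)
```

The `LIMIT` keeps each feed query bounded no matter how large `items` grows, which is the part that matters most for Problem A.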

Problem B: We are seeing geographically dispersed traffic, with large user bases in the Netherlands and Brazil. Any solution would probably need to allow for databases across multiple data centers.

One could designate a master database (cluster) and a slave database (cluster), and replicate data from the master to the slave. However, this implies that data is always written to the master database; read queries can be served locally.
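On a Django stack, the write-to-master, read-locally split could be expressed with a database router. The sketch below is only an assumption about how that might look: the class name `PrimaryReplicaRouter` and the database aliases `"default"` (the master) and `"replica"` (the local slave) are made up for this example and would have to match the entries in your `DATABASES` setting.

```python
import random


class PrimaryReplicaRouter:
    """Hypothetical router: all writes go to the master, reads stay local."""

    # Assumed aliases; they must match keys in settings.DATABASES.
    primary = "default"
    replicas = ["replica"]

    def db_for_read(self, model, **hints):
        # Serve reads from one of the local replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # Every write is sent to the master database.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Master and replicas hold the same data, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only migrate the master; replication carries the schema along.
        return db == self.primary


router = PrimaryReplicaRouter()
```

The router would be enabled by listing its dotted path in the `DATABASE_ROUTERS` setting. Keep in mind that replication lag means a user may not immediately see an item they just wrote when the read comes from the replica.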

Another option is to run the database clusters in a master-master setup, but this is generally more trouble than it is worth.

user542164