
I'm trying to design a system similar to Twitter's timeline, but I can't wrap my head around how to get updates from so many followed accounts while remaining efficient. Let's say I'm following 10,000 people on Twitter. When I go to my feed, how does it know which tweets to show me? This is what I'm thinking, but it seems extremely inefficient and unlikely:

You have 10,000 friends.
In a for loop, loop through each friend, getting their latest 
  status updates since their last update. 
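
A minimal sketch of this read-time (pull) approach, assuming plain in-memory dictionaries stand in for a real datastore. The names `tweets_by_user`, `following`, `post_tweet`, and `pull_timeline` are hypothetical and only illustrate the idea; nothing here is claimed to be how Twitter does it:

```python
import heapq
from itertools import islice

# Hypothetical in-memory stores; a real system would use a database or cache.
tweets_by_user = {}  # user_id -> list of (timestamp, text), oldest first
following = {}       # user_id -> set of user_ids that this user follows


def post_tweet(user_id, timestamp, text):
    """Append a tweet to its author's own list (assumes increasing timestamps)."""
    tweets_by_user.setdefault(user_id, []).append((timestamp, text))


def pull_timeline(user_id, limit=20):
    """Build the feed at read time by merging every followed user's tweets."""
    # Each per-user list is sorted oldest first, so reverse it to get newest first.
    streams = [
        reversed(tweets_by_user.get(friend, []))
        for friend in following.get(user_id, ())
    ]
    # heapq.merge lazily merges the already-sorted streams, newest first.
    merged = heapq.merge(*streams, reverse=True)
    return list(islice(merged, limit))
```

Posting is cheap here, but every read has to touch one list per followed account, which is the cost being questioned.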

But looping through 10,000 friends just seems ridiculous. I can't imagine how else they'd do it, though. Or would it be something like:

Someone I am following posted a tweet. That tweet is inserted in 
  an array containing the tweets of all people I am following.
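
A minimal sketch of this write-time (push) approach, again with in-memory structures. Storing only tweet IDs in each follower's array, capping each array at `TIMELINE_CAP` entries, and all function names are assumptions made for illustration:

```python
from collections import defaultdict, deque

TIMELINE_CAP = 800  # assumed cap on stored entries per user, chosen arbitrarily

# Hypothetical in-memory stores; a real system would use a database or cache.
tweets = {}                   # tweet_id -> (author_id, text), stored once
followers = defaultdict(set)  # author_id -> set of follower ids
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))  # user_id -> tweet ids


def follow(follower_id, author_id):
    """Record that follower_id follows author_id (no backfill of old tweets)."""
    followers[author_id].add(follower_id)


def post_tweet(tweet_id, author_id, text):
    """Store the tweet once, then push its id into every follower's timeline."""
    tweets[tweet_id] = (author_id, text)
    for follower_id in followers[author_id]:
        # appendleft keeps newest first; the oldest id drops off when full.
        timelines[follower_id].appendleft(tweet_id)


def read_timeline(user_id, limit=20):
    """Reading is a single lookup plus resolving ids to stored tweets."""
    return [tweets[tweet_id] for tweet_id in list(timelines[user_id])[:limit]]
```

In this sketch the write cost grows with the author's follower count, while a read no longer depends on how many accounts the reader follows.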

But that seems odd too: if I followed someone new who has 20,000 tweets, then 20,000 tweets would be inserted into my array, and if that person has a million followers, then there are a million × 20,000 copies of the same set of tweets. So that also seems unlikely.

Does anyone have any idea how they could possibly do it?

Snowman
  • I don't know how it actually works, but I can guess: first of all, work with IDs. Each tweet gets a unique ID and is saved in a database. Each follower has an array of tweet IDs (which could also be in the database). – LeeNeverGup Sep 18 '12 at 20:13
  • @LeeNeverGup but if 100,000 people are following me and I post a new tweet, I'd have to loop through 100,000 people and insert the tweet id in their array? Isn't that crazy? Or is that normal? – Snowman Sep 18 '12 at 20:14
  • You should read the article http://engineering.twitter.com/2012/07/caching-with-twemcache.html which emphasizes the use of caching to achieve performance. Even Twitter cannot serve everything from disk. That's why caching is important if you want to scale and achieve performance. – Karan Ashar Sep 18 '12 at 20:21
  • If you are having problems implementing the caching, you could also try an approach Facebook used for a while: show only the posts of the friends you interact with most. That way you limit the number of people whose tweets you see and never reach a stage where you need to show thousands of tweets, as you described. – Yarneo Sep 18 '12 at 20:27
  • According to [Wikipedia](http://en.wikipedia.org/wiki/Twitter#Implementation) I wasn't too far off: "Individual tweets are registered under unique IDs using software called snowflake and geolocation data is added using 'Rockdove'... The tweets are stored in a MySQL database ... and acknowledged to users as having been sent... The process itself is managed by FlockDB and takes an average of 350 ms." See their [source](http://www.zdnet.com/how-twitter-tweets-your-tweets-with-open-source-7000003526/) – LeeNeverGup Sep 18 '12 at 20:27
  • @LeeNeverGup nice article. Though it skips the details of the most important part :( – Snowman Sep 18 '12 at 20:38
  • The http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html article has a section titled "Cell Design For Dashboard Inbox" which (briefly) describes some pros & cons of two solutions for your problem. – Eugen Constantin Dinca Sep 24 '12 at 18:11

1 Answer


I advise you to check out the Twissandra project, which implements all the basic functionality of Twitter using Cassandra, a NoSQL database. Twitter itself is said to no longer use Cassandra for tweets.

The old implementation can be consulted here

jdcaballerov