
I have a news site that has many topics, and there may be millions of users following those topics. I maintain a sorted set for each user so they can load the news belonging to the topics they follow. When an article is added or updated, I write that article to every affected user's list. In pseudocode:

if an article is added/updated
  get all topics the article belongs to (each article may belong to many topics)
    for each topic: get all of that topic's followers
      update_user_news_list(userId, articleId)

This is the Java code, using Jedis:

static final int LIMIT_BATCH = 1000;

static void addToUserHomeFeed(int index, Jedis jd) {
    int rangeLimit = index + LIMIT_BATCH - 1;
    // Fetch one batch of this topic's followers ("Follower:Topic:Id" is a placeholder key)
    Set<String> followers = jd.zrange("Follower:Topic:Id", index, rangeLimit);
    if (followers.isEmpty()) return;
    for (String userId : followers) {
        // update_user_news_list(userId, articleId) goes here
    }
    // Recurse into the next batch of followers
    addToUserHomeFeed(rangeLimit + 1, jd);
}
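To make the whole flow concrete, here is a self-contained sketch of the complete fan-out, with hypothetical key names (Topics:Article:<id>, Feed:User:<id>) standing in for the real schema:

static void fanOutArticle(String articleId, Jedis jd) {
    // Hypothetical key: set of topic ids this article belongs to
    for (String topicId : jd.smembers("Topics:Article:" + articleId)) {
        String followerKey = "Follower:Topic:" + topicId;
        long total = jd.zcard(followerKey);
        // Walk the topic's followers in batches of LIMIT_BATCH
        for (long index = 0; index < total; index += LIMIT_BATCH) {
            Set<String> followers = jd.zrange(followerKey, index, index + LIMIT_BATCH - 1);
            for (String userId : followers) {
                // One write per follower: a topic with ~800,000 followers
                // means ~800,000 ZADDs per article
                jd.zadd("Feed:User:" + userId, System.currentTimeMillis(), articleId);
            }
        }
    }
}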

The problem is that my site currently has nearly 1 million users, some popular topics are followed by around 800,000 users, and the system sometimes produces "buffer overflow" errors. Am I doing something wrong, or are there better approaches? I am using Redis 2.4.


1 Answer


Well, I am no Redis expert, but it seems to me that storing the most recent articles, together with each article's topic, in a single time-based sorted set should do:

ZADD lasttopics -[timestamp] Topic:1-Article:1 -[timestamp] Topic:2-Article:1 

This adds article 1 to the set once per topic. Each member of the set is scored by the negative of its unix timestamp, so the most recent articles come first in the set.
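In Java with Jedis, the write path might look like this (a sketch; the "lasttopics" key and the Topic:<t>-Article:<a> member format follow the ZADD example above):

static void publishArticle(String articleId, Set<String> topicIds, Jedis jd) {
    // Negative unix timestamp as the score, so the newest entries sort first
    double score = -(System.currentTimeMillis() / 1000.0);
    for (String topicId : topicIds) {
        jd.zadd("lasttopics", score, "Topic:" + topicId + "-Article:" + articleId);
    }
}

Note that this is one write per topic, regardless of how many followers the topic has, which is the whole point of the approach.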

Now, when a user logs on, first get all the topics he follows, and then iterate through the sorted set in pages (ZRANGE lasttopics 0 99, then ZRANGE lasttopics 100 199, and so on) until you have enough articles in topics that interest the user. (Paged ZRANGE is the way to iterate here: ZSCAN returns members in arbitrary order and is only available from Redis 2.8, so it won't work on Redis 2.4.)
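A sketch of that read path (the method and variable names are made up for illustration; followedTopics is assumed to hold bare topic ids such as "1"):

static List<String> buildFeed(Set<String> followedTopics, int wanted, Jedis jd) {
    List<String> articleIds = new ArrayList<String>();
    final int PAGE_SIZE = 100;
    int page = 0;
    while (articleIds.size() < wanted) {
        Set<String> entries = jd.zrange("lasttopics",
                (long) page * PAGE_SIZE, (long) (page + 1) * PAGE_SIZE - 1);
        if (entries.isEmpty()) break; // reached the end of the history
        for (String entry : entries) {
            // entry looks like "Topic:<topicId>-Article:<articleId>"
            int dash = entry.indexOf("-Article:");
            String topicId = entry.substring("Topic:".length(), dash);
            if (followedTopics.contains(topicId)) {
                articleIds.add(entry.substring(dash + "-Article:".length()));
                if (articleIds.size() >= wanted) break;
            }
        }
        page++;
    }
    return articleIds;
}

Since the same article appears once per topic it belongs to, you may also want to de-duplicate article ids while collecting.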

This way, you perform one write per topic each time an article is inserted, and you can iterate a long way back into the article history without overloading Redis (going far back is only needed for users who follow only rarely-updated topics; most users will find interesting articles within the first few hundred entries).

For maintenance, you can use the time score to remove articles from the set as they age out.
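For example, a periodic trim might look like this under the negative-timestamp scheme (cutoffEpochSeconds is a plain unix timestamp in seconds):

static void trimOldArticles(long cutoffEpochSeconds, Jedis jd) {
    // Scores are negative timestamps: anything older than the cutoff has a
    // score between -cutoffEpochSeconds and 0, so remove that score range
    jd.zremrangeByScore("lasttopics", -(double) cutoffEpochSeconds, 0);
}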