
In my app, users can follow other users, and get updates whenever the people they follow perform an activity.

I store the follow relationships in this manner:

class User(db.Model):
  ''' User details '''
  username = db.StringProperty()

class Contacts(db.Model):
    '''Store users contacts
       parent= User (follower)
       key_name= Users username (follower)
       contacts = A list of keys of Users that a User follows '''
    contacts = db.ListProperty(db.Key)
    last_updated = db.DateTimeProperty(auto_now=True)

Getting followers, and Users that a user follows (followers & following):

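# Get the Users that my_user follows
my_user = User.all().get()  # get() returns a single entity; fetch(1) returns a list
# Contacts entities are children of their User, so the parent must be passed
contacts = Contacts.get_by_key_name(my_user.username, parent=my_user).contacts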

# Get my_user's followers - copied from an answer here on stackoverflow
follower_index = Contacts.all(keys_only=True).filter('contacts =', my_user.key())
follower_keys = [f.parent() for f in follower_index]
followers = db.get(follower_keys)

So, I want to order my_user's followers by follow date (which I don't track in the above models), but I'm not sure what the best way to do that is. Here are the options I can think of:

1) Instead of the current structure for Contacts(db.Model), use a "bridge" model:

class Contacts(db.Model):
  follower = db.ReferenceProperty(User)
  following = db.ReferenceProperty(User)
  date_created = db.DateTimeProperty(auto_now_add=True)

However, I still have to figure out how to make sure that follower->following entities are unique: the pair follower=user1, following=user2 should not repeat. I think I can do that by applying 2 filters to my query.

2) Keep the current model structure, but instead of having a list of keys in Contacts(db.Model), store a tuple: [user_key, date_created] as follows:

class Contacts(db.Model):
    '''Store users contacts
       parent= User (follower)
       key_name= Users username (follower)
       contacts = A list of Tuples: User.key(), date_created '''
    contacts = db.StringListProperty()
    last_updated = db.DateTimeProperty(auto_now=True)

However, this way I'll have to process the list of contacts: first I have to extract the User key and date_created from each string in the StringListProperty, and then I can order the list of User keys by date created.
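A rough sketch of that processing step, in plain Python. The string layout (`<key>|<timestamp>`), the date format, and the helper names are all assumptions made for illustration; any encoding works as long as the separator cannot appear inside the timestamp.

```python
from datetime import datetime

SEPARATOR = '|'                      # assumed delimiter between key and date
DATE_FORMAT = '%Y-%m-%dT%H:%M:%S'    # assumed timestamp layout


def encode_contact(user_key, date_created):
    """Pack a user key string and a follow date into one list entry."""
    return user_key + SEPARATOR + date_created.strftime(DATE_FORMAT)


def decode_contact(entry):
    """Split an entry back into (user_key, date_created)."""
    # rpartition splits on the last separator, so the key may contain '|'.
    user_key, _, stamp = entry.rpartition(SEPARATOR)
    return user_key, datetime.strptime(stamp, DATE_FORMAT)


def contacts_by_date(entries, newest_first=True):
    """Return the user keys ordered by follow date."""
    decoded = [decode_contact(entry) for entry in entries]
    decoded.sort(key=lambda pair: pair[1], reverse=newest_first)
    return [user_key for user_key, _ in decoded]
```

Every read has to decode and sort the whole list in memory, which is the main cost of this option.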

3) Last solution (clearly not efficient): keep the original db structure, and store user follow activity in a separate Model - each follow action is stored separately with a date_created field. Use this table only to be able to order the list of user followers by date. This of course means that I'll do two datastore puts - one to Contacts() and another to FollowNewsFeed() as follows:

class FollowNewsFeed(db.Model):
  ''' parent = a User follower'''
  following = db.ReferenceProperty(User)
  date_created = db.DateTimeProperty(auto_now_add=True)

Any insights on the best way to deal with this are highly appreciated :)

Thank you!

yasser

1 Answer


I would use a model to map from the user to their target rather than a list:

  1. Inserting a new instance or deleting an existing one will probably be faster than modifying a huge list and resaving it. Also, as the number of followed users grows, you can query a subset of them rather than fetching them all (see below for why).

  2. You get extra attribute space and don't have to worry as much about needing to redesign and fudge with lists down the road.

  3. You don't have to worry about index limits with lists (each item takes up a slot, up to 5000).

Unfortunately you will probably hit another limit much sooner:

A single query containing != or IN operators is limited to 30 sub-queries.

This means each element consumes a slot [e.g. IN (1, 2, 3) = 3 slots]. So even with a relatively small number of contacts (~30), you will need to make multiple trips to the database and append the results.

Assuming people don't want to go insane waiting for a page that takes hundreds of years to load and then times out, you will need some type of limit on how many people they can follow. At 100 people being followed you would need a good 4-5 trips, and you would have to sort the data within your app or on the client side via JavaScript.
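To make the multiple-trips idea concrete, here is a minimal plain-Python sketch. `fetch_updates` is a hypothetical callable standing in for one datastore query with an IN filter over a chunk of contact keys, and the update dicts with a `date_created` entry are likewise an assumption for illustration.

```python
def chunked(items, size=30):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_all_updates(contact_keys, fetch_updates, chunk_size=30):
    """Run one query per chunk, then sort the combined results in memory.

    `fetch_updates` is a hypothetical callable: it takes a list of contact
    keys (at most `chunk_size` of them) and returns a list of dicts that
    each carry a 'date_created' value.
    """
    updates = []
    for chunk in chunked(contact_keys, chunk_size):
        updates.extend(fetch_updates(chunk))
    # Per-chunk results are not globally ordered, so sort once at the end.
    updates.sort(key=lambda update: update['date_created'], reverse=True)
    return updates
```

With 100 followed users and chunks of 30, this makes four trips (30 + 30 + 30 + 10) and a single in-memory sort.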

Matt
  • Thanks! Interesting - I hadn't thought of the sub-queries limit. That's another problem that I'll have to take care of, especially since, as you mentioned, I need to sort the set of results by date, so I need to have it all in one list before displaying it to the user. – yasser Feb 11 '11 at 01:52
  • Personally I would use ajax to fetch the data. Break it up into trips of 20 contacts w/ updates from the last 7 days or 100 total or something, then order and display the results via javascript. Just make sure you don't use ajax for the person's profile/message listing so google can index it if you want that. – Matt Feb 11 '11 at 03:51
  • Which means: (1) get the list of user contacts (2) break that list into lists of 20 contacts (3) get updates from these contacts, filtering by contact name and date_created (last 7 days) (4) if the list of results has < 10 updates, get results from the next 20 contacts and append them to the updates list (5) if the list still has < 10 updates, start over and expand the date interval (get updates for the past 14 days instead). Instead of this mess, maybe I should store all updates in a user inbox to avoid sub-queries. Which means that I'll do lots of writes :-/ – yasser Feb 11 '11 at 11:05
  • The downside of that is that if someone has, say, 1000 followers, as soon as they post something you will be storing the same data 1000 times. I don't think I ever managed to get more than ~200 objects inserted before it became an almost guaranteed timeout. Also, inserting a crapton of data consumes CPU like mad. – Matt Feb 11 '11 at 20:18
  • I tried implementing this solution: looping through long lists of contacts, splitting each list into chunks of 25 items, looping through them, and, if there were not enough updates, moving the date interval to get older updates, etc. However, it's really slow. Having an inbox model doesn't require storing the same piece of data 1000 times if a user has 1000 followers. All it needs is 1 datastore entry with a ListProperty of "update receivers". Take a look at the implementation here: http://stackoverflow.com/questions/1630087/how-would-you-design-an-appengine-datastore-for-a-social-site-like-twitter – yasser Feb 18 '11 at 00:56