Let's take, for example, a simple FQL query to get all links shared by a user's friends since yesterday (strtotime('yesterday') is interpolated from PHP before the query is sent):

SELECT link_id, title, url, owner, created_time
FROM link
WHERE
    created_time > strtotime('yesterday') AND
    owner IN (
        SELECT uid2 FROM friend WHERE uid1 = me()
    )
LIMIT 100

If a user has 50 friends, this will execute perfectly. But if a user has hundreds of friends, more often than not Facebook returns an error.

Options:

  1. Limit the friend select query to 50 -- Sure, that will work, but it will show the same friends every time. Unless you want an Ivy League-only feed, this isn't very helpful.
  2. Batch Queries -- Create a batch of queries using offsets and limit each to 50. Unfortunately there's no improvement here either.
  3. Loop It -- So far this is the best I've found. Loop through the same queries you built for a batch query, but run them one at a time with multiple fql.query API calls (see the sketch after this list). Even this is hit and miss.
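
For illustration, here's a minimal sketch of that looping approach with SDK 3.1.1, assuming an authenticated $facebook instance (the chunk size of 50 and the $links accumulator are my own illustrative choices):

// Fetch the friend IDs once, then query links for 50 owners at a time.
$friends = $facebook->api(array(
    'method' => 'fql.query',
    'query'  => 'SELECT uid2 FROM friend WHERE uid1 = me()',
));
$ids = array_map(function ($row) { return $row['uid2']; }, $friends);

$links = array();
foreach (array_chunk($ids, 50) as $chunk) {
    $fql = sprintf(
        'SELECT link_id, title, url, owner, created_time FROM link ' .
        'WHERE created_time > %d AND owner IN (%s) LIMIT 100',
        strtotime('yesterday'),
        implode(',', $chunk)
    );
    $links = array_merge($links, $facebook->api(array(
        'method' => 'fql.query',
        'query'  => $fql,
    )));
}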

How can I query Facebook appropriately to ensure successful results?

Notes:

  • I'm using the latest Facebook PHP SDK, 3.1.1
  • I've also tried raising the default cURL timeout options in base_facebook.php (shown below)
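
For reference, the relevant defaults in base_facebook.php look roughly like this (exact values may vary by SDK version); CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT are the ones I raised:

// base_facebook.php -- default cURL options applied to every API call.
public static $CURL_OPTS = array(
    CURLOPT_CONNECTTIMEOUT => 10,   // raise for slow connection setup
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 60,   // raise for long-running FQL queries
    CURLOPT_USERAGENT      => 'facebook-php-3.1',
);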

Common errors associated with timeouts:

1.

Fatal error:  Uncaught Exception: 1: An unknown error occurred thrown in /..../facebook/php-sdk/src/base_facebook.php on line 708

Line 708 is where the exception is thrown:

// results are returned, errors are thrown
if (is_array($result) && isset($result['error_code'])) { 
    throw new FacebookApiException($result);
}

2.

Fatal error: Uncaught CurlException: 52: SSL read: error:00000000:lib(0):func(0):reason(0), errno 104 thrown in /..../facebook/php-sdk/src/base_facebook.php on line 814

– Ryan

3 Answers

You should loop through using limit/offset like you said, or cache the friends list up front as puffpio suggested.

You said that it still wasn't working reliably - this is because some users may have many, many links while others have very few. Note also that you may be retrieving uncached data for some users. I would recommend a single retry in your loop for failed queries - it's often the case that the first attempt will time out and the second will succeed thanks to newly cached data (see the sketch below).
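
A minimal sketch of that retry, assuming the chunked loop from the question and a hypothetical run_fql() helper wrapping the SDK call:

// Hypothetical helper: attempt a query twice before giving up; the retry
// often succeeds because the first attempt warmed Facebook's cache.
function run_fql($facebook, $fql) {
    $call = array('method' => 'fql.query', 'query' => $fql);
    try {
        return $facebook->api($call);
    } catch (Exception $e) {  // covers FacebookApiException and CurlException
        return $facebook->api($call);  // one retry, then let it throw
    }
}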

Finally, for posterity, I'm opening a task to optimize the link table to do a better job of being efficient when it's being filtered by time.

– Drew Hoskins
  • Thanks Drew. I was able to move past the errors by commenting out the exceptions in base_facebook.php (708 and 814). With the exceptions out of the way I was able to loop through links for up to 500 (shuffled) users over the past week in chunks of 100. So far it's been consistent at 500. I'll see if I can find the ceiling another time. Thanks for your help. – Ryan Aug 31 '11 at 00:30
  • I just noticed that created_time isn't indexed. That probably explains the timeout issue. I thought about sorting by link_id as well, but it turns out link_id doesn't correspond to created_time either. It seems overkill to have to request all link_ids in order to efficiently pull only the last week. But until they index this column, it's the best we have. – Ryan Sep 03 '11 at 19:03

Some DB engines do not optimize the IN keyword well, or at all; they may be executing the IN clause for every single resulting row of your query. Can you join the link and friend tables instead of using IN with a subquery?
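
As far as I know FQL has no explicit JOIN syntax, so the closest equivalent is resolving the subquery yourself and inlining the result as a flat value list - a sketch, reusing the $facebook SDK instance from the question:

// Materialize the friend subquery in PHP, then inline the IDs so the
// IN clause is a plain value list rather than a nested query.
$friends = $facebook->api(array(
    'method' => 'fql.query',
    'query'  => 'SELECT uid2 FROM friend WHERE uid1 = me()',
));
$in = implode(',', array_map(function ($r) { return $r['uid2']; }, $friends));
$fql = 'SELECT link_id, title, url, owner, created_time FROM link' .
       ' WHERE created_time > ' . strtotime('yesterday') .
       ' AND owner IN (' . $in . ') LIMIT 100';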

You may find this article interesting. (Discusses issues with IN clause performance on MySQL and Facebook runs MySQL on the back end.)

– Paul Sasik
  • Interesting. All FQL queries are only to indexed columns, so I figured this wouldn't be an issue. I did try to put a flat list of imploded IDs in there as well, but I saw no improvement. If I slice the flat ID list I get the same results as above. The only benefit of running this query first is that I can sort or shuffle the IDs so as to avoid the lower IDs always having preference. – Ryan Aug 30 '11 at 19:55

It would be better to cache the user's friend list and only refresh it occasionally. In other words, run this query:

SELECT uid2
FROM friend
WHERE uid1 = me()

Cache the list of users and run

SELECT link_id, title, url, owner, created_time
FROM link
WHERE
    created_time > strtotime('yesterday') AND
    owner IN (/*your user list here*/)
LIMIT 100

This way you are not running the inner query every time. In reality a user's friend list has a low churn rate, so you would not need to refresh it as frequently as you fetch the shared links.

Additionally, architecting it this way lets you break the second query up into multiple queries with different sets of owners and then use fql.multiquery to fetch them all simultaneously (see the sketch below).
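
A rough sketch of that, assuming a cached $ids array of friend UIDs and the $facebook SDK instance from the question (fql.multiquery takes a JSON object of named queries):

// Build one named query per chunk of 50 owners and run them in one call.
$queries = array();
foreach (array_chunk($ids, 50) as $i => $chunk) {
    $queries["links$i"] = sprintf(
        'SELECT link_id, title, url, owner, created_time FROM link ' .
        'WHERE created_time > %d AND owner IN (%s) LIMIT 100',
        strtotime('yesterday'),
        implode(',', $chunk)
    );
}
$results = $facebook->api(array(
    'method'  => 'fql.multiquery',
    'queries' => json_encode($queries),
));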

– puffpio
  • Doing it as an fql.multiquery is not going to help if he's running out of memory or timing out. – Drew Hoskins Aug 30 '11 at 20:29
  • And even when it would work, more often than not I'd get this error: "This API call could not be completed due to resource limits". Looping through shuffled chunks of user IDs seems to be the most consistent. – Ryan Aug 31 '11 at 00:24