0

I have an api endpoint that returns X amount of random posts from a table called "posts". This endpoint is used by a mobile app. It retreives random posts by doing a SELECT COUNT(*) on the posts table and returning the amount of posts in the table. It then enters a for loop in which, at the start of each loop, a random number from 0 to the COUNT(*) is generated. A random post is then obtained using the handy OFFSET. This for loop goes until X amount of random posts obtained.

pseudocode:

FUNCTION getRandPosts :

$numOfPosts = queryExecute("SELECT COUNT(*) from posts");

for (iterations < numOfRandomPostsNeeded) {
    $randomNumber = rand(0, $numOfPosts)

    $randomPost = queryExecute("SELECT * from posts LIMIT 1, OFFSET $randomNumber")


}

Now, for each call to a getRandPosts function I want them to always retrieve a unique post that wasn't retrieved before. For this current getRandPosts call AND FOR PASTS CALLS.

I've thought of several ways of going about doing this:

  1. I could store the ID's of already seen posts in an array IN the app and then send this array to the getRandPosts function. The getRandPosts function then uses a "NOT IN" clause.

    FUNCTION getRandPosts(ARRAY $idsOfPostsAlreadySeen) :
    ...
    for ...
    
    $randomPost = queryExecute("
        SELECT * from posts
        WHERE id NOT IN $idsOfPostsAlreadySeen
        LIMIT 1, OFFSET $randomNumber")
    
    $idsOfPostsAlreadySeen.addToArray($randomPost.id)
    ...
    
  2. After each random post is retrieved, save that post id to a table called "seenPosts" where there is also a column called idOfApp which is used to distinguish users.

    FUNCTION getRandPosts(STRING appInstallID) :
    ...
    for ...
    
    
    $randomPost = queryExecute("
        SELECT posts.* FROM posts
        JOIN seenPosts.postID FROM seenPosts
        ON posts.id = seenPosts.postID
        WHERE seenPosts.postID = null AND seenPosts.appInstallID = $appInstallID
        LIMIT 1, OFFSET $randomNumber")
    
    queryExecute("INSERT INTO seenPosts SET postID = $randomPost.id, appInstallID = $appInstallID")
    ...
    

I'd also like to ask about the idea of, for idea number one, creating a temporary table from the array and using JOIN with that temp table vs using the array.

Consider the idea that, at most, 600 posts will have to be excluded for any given user (so an array of 600 ids in the case of the first idea).

Adé
  • 475
  • 5
  • 15
  • Why wouldn't you just use `ORDER BY RAND() LIMIT $numOfRandomPostsNeeded`? You would get the number of random posts needed in one query without duplicates. If you queried for one random row 600 times you would run into performance issues pretty quickly. – Jonathan Kuhn May 14 '15 at 15:42
  • I read that using RAND() is very inefficient. Or maybe I'm being too paranoid and prematurely optimizing? – Adé May 14 '15 at 15:45
  • premature. But I would bet that using RAND once would be about 599 times better than 600 single queries. – Jonathan Kuhn May 14 '15 at 15:45
  • You would have a greater performance hit from running multiple queries. Each query needs to go to the database, run against the database and return the data. Constantly hitting the database for one row, while increasing the complication of said query by adding id after id would be less ideal than doing one query for data. – Jonathan Kuhn May 14 '15 at 15:48
  • Maybe I should also point out that the IDs are not consecutive. There may be deletes of posts resulting in ID 278, 279, 281...skips like that. I think I read that that can lead to RAND() running slow. – Adé May 14 '15 at 15:50

0 Answers0