I'm currently working on an app that matches users based on answered questions. I realized my algorithm in normal RoR and ActiveRecord queries but it's waaay to slow to use it. To match one user with 100 other users takes
Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms)
on my local machine. But still... I now want to realize this in raw SQL in order to gain some more performance. But I'm really having trouble getting my head around SQL queries inside of SQL queries and stuff like this plus calculations etc. My head is about to explode and I don't even know where to start.
Here's my algorithm:
def match(user)
@a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
@b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
if self.common_questions(user) == []
0.to_f
else
match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
i = 0
self.user_questions.select("question_id, importance").find_each do |n|
if user.user_questions.select(:id).find_by_question_id(n.question_id)
i += Importance.find_by_id(n.importance).value
end
end
return i
end
def actual_score(user)
i = 0
self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
@user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
unless @user_answer == nil
if n.accepted_answers.select(:answer_id).find_by_answer_id(@user_answer.answer_id)
i += Importance.find_by_id(n.importance).value
end
end
end
return i
end
So basically a user answers a questions, picks what answers he accepts and how important that question is to him. The algorithm then checks what questions 2 users have in common, if user1 gave an answer user2 accepts, if yes then the importance user2 gave for each question is added which makes up the score user1 made. Also the other way around for user2. Divided by the possible score gives the percentage and both percentages applied to the geometric mean gives me one total match percentage for both users. Fairly complicated I know. Tell if I didn't explain it good enough. I just hope I can express this in raw SQL. Performance is everything in this.
Here are my database tables:
CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (left some unimportant stuff out, it's all there in the databse dump i uploaded)
CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime);
CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer);
I guess the top of the SQL query has to look something like this?
SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
FROM
But since I'm not an SQL master and can only do the usual stuff my head is about to explode. I hope someone can help me figure this out. Or atleast improve my performance somehow! Thanks so much!
EDIT:
So based on Wizard's answer I've managed to get a nice SQL statement for "possible_score"
SELECT SUM(value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1
I've tried to get the "actual_score" with this but it didn't work. My database manager crashed when I executed this.
SELECT SUM(imp.value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
INNER JOIN accepted_answers as ON as.user_question_id = uq1.id AND as.answer_id = uq2.answer_id
WHERE uq1.user_id = 1
EDIT2
Okay I'm an idiot! I can't use "as" as an alias of course. Changed it to aa and it worked! W00T!