SQL Server : optimize query. Lots of data

Question

At the beginning I apologize for not being word-perfect in English.

I have two tables in my database, one contains questions, and second contains user answers for questions (for statistics).

TableA - questions

 ___________
| ID | Name |

TableB - Statistics

 ___________________________________
| ID | A_ID | U_ID| IsCorrect | Date|

User can answer one question several times, for example if we have question with ID = 1 and user (with id 2) which answered this question 4 times, we will add 4 rows to TableB:

 ___________________________________
| ID | A_ID | U_ID| IsCorrect | Date|
-------------------------------------
| 1  |  1   |  2  |   True    | Date|
| 2  |  1   |  2  |   False   | Date|
| 3  |  1   |  2  |   False   | Date|
| 4  |  1   |  2  |   True    | Date|

At the end, I have to query for questions (TableA) which user has not responded or answered but the smallest number of times (user is able to answer all questions).

My query (procedure) looks like:

Declare @max int

SET @max = (SELECT TOP 1 Count(A_ID) as QuestionCount  FROM [TableB]
Where User_id = 1
GROUP BY A_ID
ORDER BY QuestionCount DESC)

SELECT TOP 40 ID 
FROM [dbo].[TableA]
WHERE ID NOT IN (SELECT A_ID 
                 FROM [dbo].[TableB]
                 WHERE User_id = 1
                 GROUP BY A_ID
                 HAVING Count(A_ID) = @max)
ORDER BY NewID()

At the beggining I query for max occurence of question - If user answered some question 4 time @max will be 4.

In second query I query for question which weren't answered yet (in this occurence).

Question is: How to optimize this query (or maybe I should change my tables)? TableB for now has almost one million rows and beacause of that it isn't fast enough.

"almost one million rows" is not even close to "Big Data" – Martin Smith Jul 10 '15 at 08:35 — Martin Smith, Jul 10 '15 at 08:35
Good point, I changed title – mateusz-dot Jul 10 '15 at 08:40 — mateusz-dot, Jul 10 '15 at 08:40

score 0 · Accepted Answer · answered Jul 10 '15 at 08:37

0

With SQL-Server (>=2008) you can use the OVER clause (https://msdn.microsoft.com/en-us/library/ms189461.aspx) which gives you grouped aggregats.

EDIT: Just found your ORDER BY NewID() Why do you do this? NewID() is very bad to sort... 1 million is not so much in fact, but 1 million GUIDs without an index are a mass...

answered Jul 10 '15 at 08:37

Shnugo

66,100
9
53
114

I used `ORDER BY NewID()` to get random data. Is there any way to get random rows? I found something like `TABLESAMPLE` is it better? – mateusz-dot Jul 10 '15 at 08:45
With TABLESAMPLE you have to be aware of "The sample does not have to be a truly random sample at the level of individual rows." Don't know how important this is to you... If you need exactly 40 Rows, you could fill a table value variable with 40 random **existing** IDs and use this as INNER JOIN (use an index!) – Shnugo Jul 10 '15 at 08:52
Ok, so `TABLESAMPLE` is not an option. I understand, that your solution isn't so random? Question will be sorted like the values in helper table ? What do you thing about using `(ABS(CAST((BINARY_CHECKSUM(*) * RAND()) as int)) % 100) < 10` in `WHERE` statement? – mateusz-dot Jul 10 '15 at 09:11
If you create a million of random values "on the fly" you will get a "real" random TOP 40. But sorting a million values without an index cannot be fast... What about a IDENTITY column (running number) within your big table and an INNER JOIN to this indexed column, Assuming 40 random values within the existing range in random order this should be rather random, shouldn't it? You could create some more and still use TOP 40 to avoid missings if IDs are not existing... (due to deletes...) – Shnugo Jul 10 '15 at 09:19
Ok, you are right, now i get it. But I have to get random values from TableA (quesitons) and for now this table has about 3000 rows. TableB has more than one million. Thank you for your solution, I really like it so for sure I will implement it – mateusz-dot Jul 10 '15 at 09:24
Glad to help you! Happy coding! – Shnugo Jul 10 '15 at 09:28

SQL Server : optimize query. Lots of data

1 Answers1