I am building a query that performs some filtering on rating data.
Suppose I have a simple table called ratings
like the following, storing data from an online rating tool:
+----------------+----------------+--------+ | page_title | timestamp | rating | +----------------+----------------+--------+ | Abc | 20110417092134 | 1 | | Abc | 20110418110831 | 2 | | Def | 20110417092205 | 3 | +----------------+----------------+--------+
I need to extract pages with a high frequency of low values in the latest 10 ratings, and limit this query to pages that produced a volume of at least 20 ratings in the preceding week. This is the ridiculously long query I came up with:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM
(
SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title
AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
)
AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY a1.page_title HAVING COUNT(*) > 20
the top level query looks for pages with more than 20 ratings in the week terminating on 2011-04-24, the subquery is supposed to retrieve the number of ratings with values between [1,2] from the latest 10 ratings of each article from the top level query.
MySQL complains that a1.page_title in the WHERE clause of the subsubquery is an unknown column, I suspect this is because a1 is not defined as an alias in the second-level query, but only in the top-level query, but I am clueless how to fix this.
(edited)
I am adding as an explanation of my suspect above regarding cross-level referencing another query which works absolutely fine, note that here a1 is not defined in the subquery but it is in the immediate parent:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17' AND rating >=1
AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.aa_timestamp) >= '2011-04-11'
GROUP BY a1.page_title HAVING COUNT(*) > 20