I'm working on a sales ranking widget for our site, which will display products ordered by their current position in a "top charts" or "best seller" list (so to speak).
After some reading, it looks like a good method of implementing this would simply be a rolling sales average algorithm where the more recent the sale, the higher it is weighted.
Example:
$rolling_avg = ((4*$d1)+(3*$d2)+(2*$d3)+$d4+$d5+$d6+$d7)/13;
Where:
- $d1 = Number of sales in the last 24 hours.
- $d2 = Number of sales in the last 24-48 hours.
- $d3 = Number of sales in the last 48-72 hours.
- $d4 = Number of sales in the last 72-96 hours.
and so on...
Currently, I'm trying to run this on a products dataset of around 500k records, inserting the calculated rank back into the products table so that this can be queried against later. If possible, I'd like to be able to create a script to re-calculate rankings and run this on a cron every 12 or 24 hours.
Current Implementation:
My current implementation takes far too long to execute, and I feel a lot more of the processing needs to be done at the SQL level (with a lot fewer SELECT queries) but I'm unsure how to go about starting this.
$products = mysql_query("SELECT * FROM products ORDER BY id DESC"); // <-- Est 450-500k rows.
while($product = mysql_fetch_array($products)) {
$product_id = $product['id'];
$d1 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 24 HOUR) AND NOW())") or die(mysql_error);
$d1 = mysql_fetch_array($d1);
$d2 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 48 HOUR) AND (NOW()-INTERVAL 24 HOUR))");
$d2 = mysql_fetch_array($d2);
$d3 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 72 HOUR) AND (NOW()-INTERVAL 48 HOUR))");
$d3 = mysql_fetch_array($d3);
$d4 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 96 HOUR) AND (NOW()-INTERVAL 72 HOUR))");
$d4 = mysql_fetch_array($d4);
$d5 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 120 HOUR) AND (NOW()-INTERVAL 96 HOUR))");
$d5 = mysql_fetch_array($d5);
$d6 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 144 HOUR) AND (NOW()-INTERVAL 120 HOUR))");
$d6 = mysql_fetch_array($d6);
$d7 = mysql_query("SELECT COUNT(*) FROM orders WHERE (product_id = '$product_id') AND (sale_completed BETWEEN (NOW()-INTERVAL 168 HOUR) AND (NOW()-INTERVAL 144 HOUR))");
$d7 = mysql_fetch_array($d7);
$d1 = $d1[0];
$d2 = $d2[0];
$d3 = $d3[0];
$d4 = $d4[0];
$d5 = $d5[0];
$d6 = $d6[0];
$d7 = $d7[0];
$rolling_avg = ((4*$d1)+(3*$d2)+(2*$d3)+$d4+$d5+$d6+$d7)/13;
mysql_query("UPDATE products SET rolling_sales = '$rolling_avg' WHERE id = '$product_id'");
}
Not sure how to optimize/ progress from here. But it definitely needs a lot of work.
Before it's mentioned, I understand mysql_*
functions are depreciated, and I will move this over to PDO before it moves into a production environment.