I have a fairly large table called offers containing around 7 million rows.
The table has 30 columns, but I'm only using two of them: cap_id, a unique identifier for a vehicle, and price, the monthly cost to lease the vehicle.
I want to write a query returning the best (lowest) and second best price per cap_id, as well as a percentage saving of the best price compared to the next best.
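To make the percentage concrete, here is a small worked example. It assumes "saving" means how much cheaper the best price is relative to the next best, which the description above does not pin down. In the sample data, cap_id 18452 has a lowest price of 276 and a second-lowest price of 621:

```sql
-- Assumed definition: saving = (next_best - best) / next_best, as a percentage.
-- 276 and 621 are the two lowest prices for cap_id 18452 in the sample data.
SELECT ROUND((621 - 276) * 100 / 621, 2) AS pct_saving;  -- 55.56
```

Multiplying by 100 before dividing keeps more decimal places, since MySQL rounds the result of integer division to four decimal places by default.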
I'm using MySQL 5.7.12.
Here's the SQLFiddle
Create table query:
CREATE TABLE `offers` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`cap_id` varchar(255) default NULL,
`price` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `offers` (`cap_id`,`price`) VALUES
(18452,1007),(18452,884),(18452,276),(90019,328),(73353,539),(64854,249),(26684,257),(37452,966),(90019,980),(73353,1241),
(73353,1056),(37452,1043),(26684,829),(37452,260),(64854,358),(26684,288),(26684,678),(26684,905),(37452,1140),(94826,901),
(90019,745),(37452,1156),(37452,191),(64854,324),(73353,1110),(87725,624),(87725,973),(90019,1203),(90019,709),(18452,1133),
(18452,1019),(37452,639),(37452,1021),(87725,485),(94826,964),(37452,1066),(94826,823),(73353,1056),(18452,621),(37452,272),
(90019,223),(26684,412),(87725,310),(37452,948),(37452,826),(18452,1078),(90019,737),(18452,1166),(73353,150),(73353,1115),
(94826,957),(87725,242),(94826,715),(73353,1190),(94826,320),(94826,869),(64854,574),(94826,505),(26684,322),(90019,949),
(64854,1188),(37452,368),(90019,796),(87725,514),(37452,146),(94826,1216),(18452,625),(64854,1165),(18452,712),(37452,947),
(64854,616),(73353,1065),(26684,1167),(18452,935),(87725,1192),(26684,519),(64854,939),(90019,367),(26684,145),(64854,1076),
(26684,1016),(90019,606),(37452,1066),(73353,609),(94826,343),(94826,236),(94826,1059),(26684,681),(37452,779),(94826,259),
(87725,1080),(37452,914),(90019,826),(37452,597),(26684,879),(87725,471),(94826,680),(18452,906),(87725,860),(94826,1009);
This is what I've tried so far:
SELECT
o1.cap_id,
o2.price AS best_price,
o1.price AS next_best,
(o1.price / o2.price) * 100 AS '%_diff'
FROM
offers o1
JOIN
offers o2 ON o1.cap_id = o2.cap_id
AND o1.price > o2.price
GROUP BY o1.cap_id
HAVING COUNT(o1.price) = 2
This returns 0 rows, and runs super slowly when I run it in our DB.
This is the output of EXPLAIN:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | x | NULL | index | cap_id,idx_profile_grouping,idx_capId_monthlyPayment | idx_capId_monthlyPayment | 9 | NULL | 7220930 | 100.00 | Using index; Using temporary; Using filesort |
1 | SIMPLE | y | NULL | ref | cap_id,idx_profile_grouping,idx_capId_monthlyPayment | idx_capId_monthlyPayment | 4 | moneyshake.x.cap_id | 871 | 33.33 | Using where; Using index |
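For reference, a minimal sketch of one way the stated goal (best price, next best price, and percentage saving per cap_id) could be written on 5.7, which has no window functions. It is only an illustration under several assumptions: the aliases best and o and the column name pct_saving are made up here, "next best" is taken to mean the next distinct price above the minimum (so ties at the lowest price are skipped), and the saving is measured relative to the next best price. It would probably benefit from a composite index on (cap_id, price), which is also an assumption about the real table.

```sql
-- Sketch only: the lowest price per cap_id comes from a derived table,
-- then the lowest price strictly above it is taken as the next best.
SELECT
    best.cap_id,
    best.best_price,
    MIN(o.price) AS next_best,
    -- Assumed definition: (next_best - best) / next_best as a percentage.
    ROUND((MIN(o.price) - best.best_price) * 100 / MIN(o.price), 2) AS pct_saving
FROM (
    SELECT cap_id, MIN(price) AS best_price
    FROM offers
    GROUP BY cap_id
) AS best
-- LEFT JOIN keeps cap_ids that have only one distinct price;
-- next_best and pct_saving are NULL for those rows.
LEFT JOIN offers AS o
    ON o.cap_id = best.cap_id
   AND o.price > best.best_price
GROUP BY best.cap_id, best.best_price;
```

Unlike a self-join on o1.price > o2.price, this compares each row only against its group's minimum, so the join produces at most one row per offer rather than one row per pair of offers.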
Thanks in advance for any suggestions.