0

I have a fairly large dataset called offers containing around 7m rows.

The table has 30 columns but I'm only using two of them, cap_id - a unique identifier for a vehicle, and price - the monthly cost to lease the vehicle.

I want to write a query returning the best (lowest) and second best price per cap_id, as well as a percentage saving of the best price compared to the next best.

I'm using version 5.7.12

Here's the SQLFiddle

Create table query:

CREATE TABLE `offers` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `cap_id` varchar(255) default NULL,
  `price` mediumint default NULL,
  PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;

INSERT INTO `offers` (`cap_id`,`price`) VALUES 
(18452,1007),(18452,884),(18452,276),(90019,328),(73353,539),(64854,249),(26684,257),(37452,966),(90019,980),(73353,1241),
(73353,1056),(37452,1043),(26684,829),(37452,260),(64854,358),(26684,288),(26684,678),(26684,905),(37452,1140),(94826,901),
(90019,745),(37452,1156),(37452,191),(64854,324),(73353,1110),(87725,624),(87725,973),(90019,1203),(90019,709),(18452,1133),
(18452,1019),(37452,639),(37452,1021),(87725,485),(94826,964),(37452,1066),(94826,823),(73353,1056),(18452,621),(37452,272),
(90019,223),(26684,412),(87725,310),(37452,948),(37452,826),(18452,1078),(90019,737),(18452,1166),(73353,150),(73353,1115),
(94826,957),(87725,242),(94826,715),(73353,1190),(94826,320),(94826,869),(64854,574),(94826,505),(26684,322),(90019,949),
(64854,1188),(37452,368),(90019,796),(87725,514),(37452,146),(94826,1216),(18452,625),(64854,1165),(18452,712),(37452,947),
(64854,616),(73353,1065),(26684,1167),(18452,935),(87725,1192),(26684,519),(64854,939),(90019,367),(26684,145),(64854,1076),
(26684,1016),(90019,606),(37452,1066),(73353,609),(94826,343),(94826,236),(94826,1059),(26684,681),(37452,779),(94826,259),
(87725,1080),(37452,914),(90019,826),(37452,597),(26684,879),(87725,471),(94826,680),(18452,906),(87725,860),(94826,1009);

This is what I've tried so far:

SELECT 
 o1.cap_id,
 o2.price AS best_price,
 o1.price AS next_best,
 (o1.price / o2.price) * 100 AS '%_diff'
FROM
 offers o1
     JOIN
 offers o2 ON o1.cap_id = o2.cap_id
     AND o1.price > o2.price
GROUP BY o1.cap_id
HAVING COUNT(o1.price) = 2

This returns 0 rows, and runs super slowly when I run it in our DB.

This is the output of EXPLAIN:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra FIELD13 FIELD14
1 SIMPLE x index cap_id idx_profile_grouping idx_capId_monthlyPayment idx_capId_monthlyPayment 9 7220930 100.00 Using index; Using temporary; Using filesort
1 SIMPLE y ref cap_id idx_profile_grouping idx_capId_monthlyPayment idx_capId_monthlyPayment 4 moneyshake.x.cap_id 871 33.33 Using where; Using index

Thanks in advance for any suggestions.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • 1
    You link to version 5.6. Is that the version you're using? – Strawberry Apr 30 '21 at 11:33
  • That's easy with MySQL 8. In older versions this gets a bit awkward. So, please answer Strawberry's question. Which version are you using? – Thorsten Kettner Apr 30 '21 at 11:37
  • @Strawberry just edited - using 5.7.12, thanks – Alex Midmore Apr 30 '21 at 11:38
  • If the best price appears twice for a cap_id, is the best price then equal to the second best? Or are you looking for the next lower price then? – Thorsten Kettner Apr 30 '21 at 11:40
  • 5.7 is a quite old version. Can't you just upgrade to MySQL 8? It offers many features that MySQL was always lacking (window functions, recursive and non-recursive CTEs). – Thorsten Kettner Apr 30 '21 at 11:49
  • @ThorstenKettner if the next best price is identical to the best, the % difference should be 0 and it wouldn't be counted as a 'discounted' offer. I'll have to pester our devs to upgrade us to MySQL 8 :) – Alex Midmore Apr 30 '21 at 12:03

2 Answers2

1

This process can be optimised in newer versions of MySQL, but as your fiddle is in 5.6, so is my answer:

SELECT x.*
     , COUNT(*) running 
  FROM offers x 
  JOIN offers y 
    ON y.cap_id = x.cap_id 
   AND y.price < x.price 
 GROUP 
    BY x.id
 ORDER
    BY x.cap_id 
     , x.price;

Extending this idea:

SELECT a.*
     , b.price
     , 1-(a.price/b.price) saving
  FROM 
     ( SELECT cap_id
            , MIN(price) price
         FROM offers 
        GROUP  
           BY cap_id
     ) a -- lowest price per cap_id
  JOIN 
     ( SELECT x.cap_id
            , x.price
         FROM offers x
         JOIN offers y
           ON y.cap_id = x.cap_id 
          AND y.price < x.price 
        GROUP 
           BY x.id
       HAVING COUNT(*)= 2
    ) b -- 2nd lowest price per cap_id (other methods are available)
   ON b.cap_id = a.cap_id
ORDER 
   BY a.cap_id;
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • To which 'final piece of the puzzle' you are referring I am unsure - I tried running your code and this: `SELECT x.cap_id, y.price as best_price, x.price as next_best FROM offers x JOIN offers y ON y.cap_id = x.cap_id AND y.price < x.price GROUP BY x.cap_id ORDER BY x.cap_id , x.price;` neither of which worked - they just run for ever. – Alex Midmore Apr 30 '21 at 11:52
  • My code seems to work just fine: http://sqlfiddle.com/#!9/ddd9eb/13 – Strawberry Apr 30 '21 at 11:56
  • I suppose performance is the problem here. I'm also still unsure of how to return the percentage 'saving' of the best price compared to next best – Alex Midmore Apr 30 '21 at 12:02
1

In MySQL 8.x you can do:

with
p as (
  select
    id, cap_id, price,
    row_number() over(partition by cap_id order by price) as rn
  from offers
)
select 
  a.id as lowest_id, a.cap_id as lowest_cap_id, a.price as lowest_price,
  b.id as second_id, b.cap_id as second_cap_id, b.price as second_price,
  case when b.price is not null then 
    (b.price - a.price) / b.price
  end as percentage_saving
from p a
left join p b on a.cap_id = b.cap_id and b.rn = 2
where a.rn = 1
The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • In MySQL workbench it says 'target mysql version' is 8.XX, but our RDS instance is still running 5.7. Time to look into upgrading I guess. – Alex Midmore Apr 30 '21 at 11:56