I have a table in postgresql containing some cars +1000000 records:
+----+--------+------+---------+-----------+-------------+------------+------------+
| id | price | year | mileage | fuel_type | body_type | brand | model |
+----+--------+------+---------+-----------+-------------+------------+------------+
| 1 | 4894 | 2011 | 121842 | "Benzin" | "Sedan" | "Toyota" | "Yaris" |
| 2 | 4989 | 2012 | 33901 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 3 | 4990 | 2013 | 55105 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 3 | 5290 | 2013 | 20967 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 5 | 5594 | 2008 | 121281 | "Benzin" | "Hatchback" | "Mercedes" | "A170" |
| 6 | 4690 | 2012 | 71303 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 7 | 5290 | 2013 | 58300 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 8 | 5890 | 2013 | 35732 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 9 | 5990 | 2013 | 38777 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 10 | 6180 | 2013 | 69491 | "Benzin" | "Hatchback" | "VW" | "up!" |
| 11 | 6490 | 2012 | 72900 | "Benzin" | "Sedan" | "Renault" | "Clio III" |
| 12 | 6790 | 2012 | 49541 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 13 | 6790 | 2012 | 46377 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 14 | 6790 | 2012 | 45200 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 15 | 6894 | 2007 | 108840 | "Benzin" | "Sedan" | "VW" | "Golf V" |
| 16 | 6990 | 2009 | 54200 | "Benzin" | "Sedan" | "Renault" | "Mégane" |
| 17 | 6990 | 2012 | 40652 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 18 | 6990 | 2012 | 38080 | "Benzin" | "Sedan" | "Renault" | "Clio III" |
| 19 | 7290 | 2012 | 28600 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 20 | 7290 | 2013 | 52800 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
+----+--------+------+---------+-----------+-------------+------------+------------+
I would like to create a recommendation engine, that can return the 20 most "similar" matches based on some varying criteria e.g. When a user does a search for: brand = 'Renault' AND price < 60000 AND year > 2010
, I want to present, outside the search result some other, more loose results with other cars, that is similar, but doesn't necessarily match all the search criteria exact.
I have tried making some rule based code in ruby, that does something like:
- If you search by a 'Renault Clio' we then ´Renault Twingo´ is a close match too
- If you have max price of 8000 then order by those closest to that
- etc. etc.
Based on this code, I generate an SQL query with where and order by clauses.
The problem however is, that things get's huge, as I have like 20 different columns I would like to optionally take into consideration, base on the initial criteria. Also I want the recommendation to be backward compatible in the sense that I don't want to just do a simple filtering (WHERE
) query, that in cases might end up returning zero matches. Instead I want it do something similar to when you use text similarity algorithms, where you can compare one phrase to all and get a comparison score for all of them which you can then sort by.
I'm super puzzled about how I could implement this, in an approach this is NOT defining 1000 rules and if/then statements to generate an SQL query. Is there some other technique I could use, or maybe another technology than postgresql?