1

I will break this into two parts:

  1. I have a user with a list of product 'features' that he desires. The user is asked to rank these features from 1-10. I also have a table full of products. Each of these products has various features. Some products will have all features 1-10, but some will only have a few of them. I'd like to determine which product matches the user's ranking the best, and provide an ordered result set to the user, with the best match first and the worst match last.

  2. The aforementioned query is actually just one part of what the user is doing. The user is also providing other search criteria (e.g. product category, price, etc) along with the feature ranking. So, I'd like the feature matching to only apply to products that pass through this set of filters. How would I combine these?

I have found some similar questions like this and this, but they aren't close enough for me to figure out how to apply those solutions to my situation.

The tables involved look something like this:

products (id,product)

features (id, feature)

product_features (id, product_id, feature_id)

and the user will be submitting his search through an array like this (where the keys are actually feature ids, and the rank is the value):

[users_features] => Array
    (
        [1] => Array
            (
                [rank] => 9
            )

        [2] => Array
            (
                [rank] => 1
            )

        [3] => Array
            (
                [rank] => 3
            )

(this will be a 10 item array right now, but could change in the future)

    )
Community
  • 1
  • 1
HipN
  • 35
  • 2
  • 3
  • One problem I see here is that the products that have all the features will always rank at the top of the list - unless thats OK. – Eric Petroelje Dec 12 '12 at 22:00
  • You need a way to apply the rankings to the features. For instance, if one product only matches rank = 1 but another matches rank = 2 and 3, which is chosen? – Gordon Linoff Dec 12 '12 at 22:32
  • @EricPetroelje it's ok if all the products having all the features rank at the top. – HipN Dec 13 '12 at 00:53

2 Answers2

0

I can't quite get this to work using SQL only but a simple PHP solution would be similar to the following (untested):

//Array of $productId => list of feature ids
$products;
$userFeatures;
$scores = array();
//For every product work out a score based on features and user ranking.
foreach($products as $productId => $prodFeatures){
    $score = 0;
    foreach($prodFeatures as $feature){
        //Could also perhaps penalise products lacking features.
        $score += $userFeatures[$feature]["rank"];
    }
    $scores[$productId] = $score;
}
arsort($scores);
echo "Best match is ".$scores[0];

Obviously this is a bit rough and ready but hopefully it helps.

Edit: This assumes that a ranking of 10 is the best.

Jim
  • 22,354
  • 6
  • 52
  • 80
  • your scoring concept and using 10 as the best rank makes a lot of sense. i've run a few spreadsheet scenarios, and it seems to perform fairly well. would you recommend i run the 'filter' queries first, then build an array of products from that result set and jump into this scoring function? – HipN Dec 13 '12 at 01:38
  • I wound up going with your approach. I ran my filter queries, then used an array of product ids from that result to begin this scoring routine. Thanks! – HipN Dec 14 '12 at 01:05
0

A solution is to divide 1 by the user rank. For example 1/R. Then sum all the possibilities and sort the list in descendant order. 1/R is also the chance to not pick this product from the list. When you need to normalize the value or make the sum of the values equal to 1 you need to multiply the value with the sum of all the reziprokes of the values. For example r1=30 and r2=15. 1/(1/30 + 1/15) = 10, so P1 = 10 * 1/30 = 1/3 and P2 = 10 * 1/15 = 2/3. When you normalize the value it also means that a product with less attributes is also less likely.

Micromega
  • 12,486
  • 7
  • 35
  • 72
  • This seems interesting, but for the life of me, I can't quite understand how to apply it. Are r1 and r2 in your example the ranks of features 1 and 2? Are P1 and P2 products 1 and 2? – HipN Dec 13 '12 at 00:58
  • 1) yes 2) no. P1 and p2 is the also the ranks of the features but normalized so that the sum of all is 1. – Micromega Dec 13 '12 at 01:17
  • ah! i get it now...makes sense as a way to score each product based on its features, and still use a rank of 1 as the highest. do you have any recommendation on how to structure the query? – HipN Dec 13 '12 at 01:56