I am working on a ranking problem involving two variables: popularity and location.

My goal is to find the best trade-off between popularity and distance for the items in my set. That is, given a set of items, each with a popularity and a geolocation, and my own location (and thus the distance to each item), I want to find the most important one.

The following solution was mentioned in a previous question, which did not get much attention:

Given a place p you can calculate the importance of the place I(p) by using the popularity P(p) and the distance D(p). You should decide or find the best values for the weights a and b.

I(p) = a * P(p) - b * D(p)

Now, how do I best determine the weights a and b?

I have a set of "solutions" I can use. Each solution includes a subset of items with their popularity and distance, as well as which ONE item in that subset was deemed most relevant/important.

Tim Petri

2 Answers

You do not need both weights. Since you do not want an absolute importance value (you only want to tell which items are more important than others), you can divide the score by b (assuming b > 0) without changing the ordering. This reduces it to one parameter:

I(p) = a * P(p) + D(p),

where P(p) is the importance term based on the item quality (or whatever it is) and D(p) is the importance term based on the distance. Here, you probably want a decreasing function of distance.

As far as I understand, finding the weight is an offline process that is performed only once. Therefore, a very simple sampling approach would be sufficient.

The easiest way to do this is the following: Sample some domain of a (e.g. assume a reasonable lower and upper bound, then iterate over this interval with a given step width). For each sampled a, evaluate every subset in your solutions and find the item with the highest importance. Count how many of the subsets picked the correct relevant item. Finally, the value of a that produced the highest correct count is considered the best choice.
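
The sampling loop described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the names `distance_term`, `best_index`, `count_correct` and `sample_weight` are made up for this example, and the distance term is assumed to be simply the negated distance (any other decreasing function could be swapped in).

```python
from typing import List, Tuple

Item = Tuple[float, float]         # (popularity, distance)
Solution = Tuple[List[Item], int]  # (items, index of the item deemed most relevant)


def distance_term(distance: float) -> float:
    # Assumption: importance drops linearly with distance.
    return -distance


def best_index(items: List[Item], a: float) -> int:
    # Index of the item with the highest importance a * P(p) + D(p).
    scores = [a * p + distance_term(d) for p, d in items]
    return max(range(len(items)), key=scores.__getitem__)


def count_correct(solutions: List[Solution], a: float) -> int:
    # Number of subsets in which the top-scored item is the labeled one.
    return sum(best_index(items, a) == target for items, target in solutions)


def sample_weight(solutions: List[Solution], lower: float, upper: float,
                  step: float) -> Tuple[float, int]:
    # Walk the interval [lower, upper] in fixed steps and keep the first
    # value of a that yields the highest correct count.
    best_a, best_count = lower, -1
    n_steps = int(round((upper - lower) / step))
    for i in range(n_steps + 1):
        a = lower + i * step
        correct = count_correct(solutions, a)
        if correct > best_count:
            best_a, best_count = a, correct
    return best_a, best_count
```

Calling `sample_weight(solutions, 0.0, 20.0, 0.5)` returns the chosen weight together with the number of subsets it got right. Several values of a will usually tie, so it may be worth looking at the whole range of tied values rather than only the first one.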

Nico Schertler
  • I see how the absolute importance is of no relevance here. The final implementation would rank these items on-the-go anyway, and the distance varies every time. Now, since distance should penalize the importance/rank, is your way of simply multiplying D(p) correct? – Tim Petri Aug 08 '16 at 22:49
  • Actually, changing a will not make a difference. The ranking/importance of items within a set would always be the same for any a. – Tim Petri Aug 08 '16 at 22:57
  • Oh, sorry. That's a typo. Of course, it should be a plus. – Nico Schertler Aug 08 '16 at 23:19
  • You could basically adjust the two terms as you need. I.e. if the distance should have a negative impact, I would probably use something like `1 / D(p)` or `exp(-D(p)^2)`. This gives additional importance to close items and does not change the importance (noticeably) for far items. – Nico Schertler Aug 08 '16 at 23:26
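
The two distance terms suggested in the last comment could look like this in code. It is a small sketch only: the function names are invented here, and the `eps` guard and `scale` parameter are assumptions added for the example (the comment itself uses plain `1 / D(p)` and `exp(-D(p)^2)`).

```python
import math


def inverse_decay(distance: float, eps: float = 1e-9) -> float:
    # 1 / D(p); eps is an assumed guard against division by zero.
    return 1.0 / (distance + eps)


def gaussian_decay(distance: float, scale: float = 1.0) -> float:
    # exp(-D(p)^2); scale is an assumed knob that sets the distance
    # unit, and scale=1.0 reproduces the expression from the comment.
    return math.exp(-((distance / scale) ** 2))
```

Both functions give a large bonus to nearby items and flatten out towards zero for distant ones, so far-away items end up ranked almost purely by popularity.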

Do you have a real dataset? That is, a real ranking together with the distance D(p) and popularity P(p) of all the locations?

If you have that, you can first train your formula, that is

I(p) = a * P(p) - b * D(p)

with all the pairs of values for (a,b) in the following set ->

{(1,1),(1,2), ... , (1,10)}
{(2,1),(2,2), ... , (2,10)}
...........................
...........................
{(10,1),(10,2), ... , (10,10)}

For each of these 100 pairs, you can create a temporary_ranklist and check which pair gives the temporary_ranklist that is closest to the real ranklist.

That pair of (a,b) is what you are looking for. I hope it helps :)
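
A rough sketch of this grid search is shown below. The answer does not say how to measure how close two ranklists are, so this example assumes the number of item pairs that appear in opposite order (a Kendall-tau-style count). The function names and the dictionary layout of `items` are hypothetical as well.

```python
from itertools import combinations, product
from typing import Dict, Iterable, List, Tuple


def rank_items(items: Dict[str, Tuple[float, float]], a: float, b: float) -> List[str]:
    # items maps a name to (popularity, distance); rank by
    # I(p) = a * P(p) - b * D(p), highest importance first.
    return sorted(items,
                  key=lambda name: a * items[name][0] - b * items[name][1],
                  reverse=True)


def pairwise_disagreements(ranking: List[str], reference: List[str]) -> int:
    # Assumed closeness measure: how many item pairs are ordered
    # differently in the two ranklists.
    pos = {name: i for i, name in enumerate(ranking)}
    ref = {name: i for i, name in enumerate(reference)}
    return sum((pos[x] - pos[y]) * (ref[x] - ref[y]) < 0
               for x, y in combinations(reference, 2))


def grid_search(items: Dict[str, Tuple[float, float]], real_ranklist: List[str],
                values: Iterable[int] = range(1, 11)) -> Tuple[int, int]:
    # Try all (a, b) pairs from the grid and return the first pair whose
    # temporary ranklist has the fewest disagreements with the real one.
    candidates = list(values)
    return min(product(candidates, candidates),
               key=lambda ab: pairwise_disagreements(
                   rank_items(items, ab[0], ab[1]), real_ranklist))
```

With the default `values`, this tries exactly the 100 pairs from (1,1) to (10,10). Note that only the ratio a/b affects the ordering, so many of those pairs produce identical ranklists.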

jbsu32