
I have the following input for Myrrix:

11, 101, 1
11, 102, 1
11, 103, 1
11, 104, 1000
11, 105, 1000
11, 106, 1000

12, 101, 1
12, 102, 1
12, 103, 1
12, 222, 1

13, 104, 1000
13, 105, 1000
13, 106, 1000
13, 333, 1000

I am looking for items to recommend to user 11. The expectation is that item 333 will be recommended first (because of the higher weights for user 13 and items 104, 105, 106).

Here are the recommendation results from Myrrix:

11, 222, 0.04709
11, 333, 0.0334058

Notice that item 222 is recommended with strength 0.047, while item 333 only gets a strength of 0.033, which is the opposite of the expected order.

I also would have expected the difference in strength to be larger (since 1000 and 1 are so different), but obviously that's moot when the order isn't even what I expected.

How can I interpret these results and how should I think about the weight parameter? We are working with a large client under a tight deadline and would appreciate any pointers.

Daniel Brockman

1 Answer


It's hard to judge based on a small, synthetic data set. I think the biggest factor here will be the parameters: what is the number of features? What is lambda? I would expect features = 2 here. If it's higher, I think you quickly over-fit this data, and the results are mostly the noise left over after the model perfectly explains that user 11 doesn't interact with 222 and 333.

The values are quite low, suggesting that neither of these is a likely result, so their order may be more noise than anything. Do you see different results if the model is rebuilt from another random starting point?
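
For context, these are set as JVM system properties when launching Myrrix, e.g. `java -Dmodel.features=2 -Dmodel.lambda=0.1 -jar myrrix...` (the property names and values here are the ones discussed in the comments below; the jar name is elided, as it is in the comment that quotes the full command).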

Sean Owen
  • OK. I was using the default parameters. I’ve run the same example a few times with `-Dmodel.features=2` and I’m getting different values for 333 and 222, fluctuating quite wildly now: (.0800111, .0614549), (.195253, .0825095), (.170418, .0889612), (.117045, .0754171), (.00336292, .00375809). For the most part 333 is getting a higher score (except in the last example), so it is definitely an improvement. Do you think I could get the results to stabilize by tweaking the lambda parameter? Can Myrrix be used with small data sets like this at all (for demo purposes)? – Daniel Brockman Sep 16 '13 at 08:07
  • Yes you might try increasing lambda to 0.1 or even 1. On a tiny synthetic data set I think it's going to be quite sensitive to initial conditions until overwhelmed by the regularization parameter. You can use it for tiny data sets, but it is certainly intended for millions. – Sean Owen Sep 16 '13 at 08:15
  • I was able to get fairly consistent results by increasing lambda. Taken to an extreme, you can run many more iterations and increase lambda a lot to make even the scores steady to within 5% or so: `java -Dmodel.lambda=10 -Dmodel.als.iterations.convergenceThreshold=0.000001 -Dmodel.iterations.max=1000 -Dmodel.features=2 -jar myrrix...` – Sean Owen Sep 16 '13 at 11:52
  • Thank you, I can indeed reproduce those results with the parameters you provided. The difference in score (~.48 vs ~.41) is not as significant as we would have expected, but that’s obviously just a matter of how you interpret the numbers, weights, scores. It’s good to know in any case that the presence/absence of a connection is far more significant than the relative weights between connections. Would it be fair to say for example that a connection with weight 0.01 is much more like a connection with weight 1 than no connection at all? – Daniel Brockman Sep 16 '13 at 13:05
  • I suppose the intuition here is that the model is torn between explaining why 13:333 exists but 11:333 does not. You have "strong" evidence that it does, and does not, go with 104/105/106. The same goes for the other item. The tug of war is not so much between a strong and a weak force here, but between the side effects of a strong tug of war and a weak tug of war. You can interpret the weight as a penalty for not scoring an existing interaction as "1". The penalty is the square of the difference between 1 and the prediction, times the weight. – Sean Owen Sep 16 '13 at 13:10
  • Thank you, Sean, that’s helpful. I can appreciate what you’re saying about the output being a function of a tug-of-war between tug-of-war effects, and that the “strong” and “weak” forces (“1000” vs. “1”) are not directly pitted against one another. I’m struggling to understand your last two sentences, though. I guess you’re saying that by putting in large weights, I’m essentially scaling up the (normally small) negative contributions that come from the nonexistent connections? Is using large weights the same as adding the nonexistent connections with large negative weights? – Daniel Brockman Sep 16 '13 at 13:47
  • It will make more sense if you're familiar with the squared-error loss function, which turns up all over the place in stats. You try to build a model that would give you back your input exactly. It can't do so exactly, so you trade off reproducing some inputs well while making bigger errors on others. You minimize the total squared error over all inputs. Weights come into the picture as weights on the errors: errors on some of the inputs above are 1000x more 'costly' than others. – Sean Owen Sep 16 '13 at 15:47
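
To make the weighted squared-error idea in the last two comments concrete, here is a minimal, self-contained sketch. It is not Myrrix's actual code: the class and method names are invented for illustration, and the real ALS objective also includes terms for the unobserved (user, item) pairs (the "tug of war" above) plus regularization summed over all factor vectors. It only shows why missing a weight-1000 interaction is so much more costly than missing a weight-1 interaction.

```java
// Illustrative sketch only, not Myrrix code; names are made up.
public class WeightedLossSketch {

    // Loss contribution of one observed interaction: the target is 1.0, and
    // "weight" scales how costly it is to miss that target.
    static double observedTerm(double weight, double prediction) {
        double err = 1.0 - prediction;
        return weight * err * err;
    }

    // L2 regularization on a single factor vector, scaled by lambda; a larger
    // lambda pulls the factors (and hence the predictions) toward smaller,
    // more stable values, which is why raising it steadied the scores above.
    static double regTerm(double lambda, double[] factors) {
        double sumSq = 0.0;
        for (double f : factors) {
            sumSq += f * f;
        }
        return lambda * sumSq;
    }

    public static void main(String[] args) {
        // Missing a weight-1000 interaction by 0.1 costs 1000 * 0.01 = 10,
        // while the same miss on a weight-1 interaction costs only 0.01.
        System.out.println(observedTerm(1000.0, 0.9)); // ~10
        System.out.println(observedTerm(1.0, 0.9));    // ~0.01
        System.out.println(regTerm(0.1, new double[] {0.3, -0.2})); // ~0.013
    }
}
```

Under this framing, a connection with weight 0.01 and one with weight 1 both pull the prediction toward 1, just with different force, while a missing connection pulls it toward 0. That is consistent with the observation in the comments that the presence or absence of an interaction matters more than its relative weight.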