
In the itemsimilarity method there is a parameter like:

--maxPrefs (-mppu) maxPrefs - max number of preferences to consider per user or item, users or items with more preferences will be sampled down (default: 500)

How does it work exactly? If I have 5 million users and 5000 items and I run itemsimilarity with the default maxPrefs, does it consider only 500 preferences from those 5 million, or what? Is it sampling? What can I do to force the calculation to use all of the input data?

What does "or" mean in the definition: "max number of preferences to consider per user or item"?

herder

1 Answer


This was answered on the mailing list here: http://article.gmane.org/gmane.comp.apache.mahout.user/20827/match=

Basically, several forms of downsampling are applied to preserve result quality while keeping the runtime at O(n).

To include all of the data, set --maxPrefs (-mppu) to 4000 or to the highest integer value.
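
As a rough illustration of what a per-user cap does, here is a minimal Python sketch. It is not Mahout's actual code: the function name `cap_preferences`, the uniform random sampling, and the toy data are all assumptions made for the example. Per the parameter description, the same kind of cap applies to items as well as users.

```python
import random
from collections import defaultdict


def cap_preferences(prefs, max_prefs, seed=0):
    """Keep at most max_prefs (user, item) pairs per user.

    Hypothetical helper for illustration only; Mahout's real
    sampling strategy may differ.
    """
    rng = random.Random(seed)

    # Group the items by user.
    by_user = defaultdict(list)
    for user, item in prefs:
        by_user[user].append(item)

    capped = []
    for user, items in by_user.items():
        # Users above the cap are sampled down; others are kept whole.
        if len(items) > max_prefs:
            items = rng.sample(items, max_prefs)
        capped.extend((user, item) for item in items)
    return capped


# Toy data: u1 has 10 preferences, u2 has 3.
prefs = [("u1", i) for i in range(10)] + [("u2", i) for i in range(3)]

sampled = cap_preferences(prefs, max_prefs=5)        # u1 is cut down to 5
full = cap_preferences(prefs, max_prefs=2**31 - 1)   # nothing is dropped
```

With a cap of 5 only u1 loses preferences. With the cap set to the largest 32-bit integer, no user can exceed it, so every preference is kept.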

pferrel