I'm trying to compute item similarities with Mahout's itemsimilarity job. The problem is that I get very few similarities in the output.
Here are my input data characteristics:
- 15,910,847 preferences in total
- 4,047,745 distinct users
- 773,015 distinct items

I've built the distribution of preferences per user. The first column is the number of distinct users and the second column is the number of preferences per user. For example, 2,221,760 users have only one preference.

2221760 1
688258 2
322497 3
192003 4
122446 5
87033 6
63733 7
49556 8
39090 9
31637 10
25634 11
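For reference, a distribution like the one above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the preferences are available as (user, item, value) tuples, and the function name and sample data are made up for illustration.

```python
from collections import Counter


def preference_distribution(preferences):
    """Return a mapping: preferences per user -> number of users with that count.

    `preferences` is assumed to be an iterable of (user, item, value) tuples.
    """
    # Count how many preferences each user has.
    prefs_per_user = Counter(user for user, _item, _value in preferences)
    # Count how many users share each preference count.
    return Counter(prefs_per_user.values())


# Hypothetical sample data, not taken from the real input.
sample = [
    ("u1", "i1", 4.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 5.0),
    ("u3", "i3", 2.0),
]

dist = preference_distribution(sample)
for n_prefs, n_users in sorted(dist.items()):
    # Same column order as above: users first, preferences per user second.
    print(n_users, n_prefs)
```

On the real data the tuples would come from parsing the input file line by line instead of a hard-coded list.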
Here are my input settings:
similarityClassname=SIMILARITY_PEARSON_CORRELATION
maxSimilaritiesPerItem=100000
minPrefsPerUser=0
booleanData=false
threshold=0.75
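
To make the effect of these settings easier to reason about, here is a minimal sketch of the textbook Pearson correlation between two items, computed over the users who have a preference for both, followed by a threshold check. This is plain Python and an assumption about the general technique, not Mahout's actual implementation, which may compute the measure differently; the function names and ratings are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Textbook Pearson correlation over users who rated both items.

    Each argument maps user -> preference value for one item.
    Returns None when the correlation is undefined.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        # Fewer than two co-rating users: no correlation can be computed.
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # All co-ratings identical for one item: correlation undefined.
        return None
    return cov / math.sqrt(var_x * var_y)


def passes_threshold(similarity, threshold=0.75):
    """Keep a pair only if its similarity is defined and at least the threshold."""
    return similarity is not None and similarity >= threshold


# Hypothetical ratings, for illustration only.
item_a = {"u1": 5.0, "u2": 3.0, "u3": 1.0}
item_b = {"u1": 4.0, "u2": 3.0, "u3": 2.0, "u4": 5.0}
item_c = {"u1": 4.0}

sim_ab = pearson(item_a, item_b)
sim_ac = pearson(item_a, item_c)
print(sim_ab, passes_threshold(sim_ab))
print(sim_ac, passes_threshold(sim_ac))
```

Under this definition, a user with a single preference can never be a co-rating user for any item pair, so such users contribute nothing to the similarities.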