
I am getting the following error when running mahout spark-itemsimilarity from the terminal with the input path set to a directory.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:119)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:214)
    at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
    at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
    at scala.Option.map(Option.scala:145)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
    at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)

Thanks in advance.

KlwntSingh

2 Answers


Use Mahout 0.10.1-SNAPSHOT from the 0.10.x branch on GitHub, since it does not need the -D:spark... option.

Using a directory as input requires a pattern to match files. The default pattern matches HDFS "part-xxxxx" files. Use the following command:

$ mahout spark-itemsimilarity -i /home/kulwant/data/ -fp ".*csv" -o /home/kulwant/output/ --master spark://kulwant-VirtualBox:7077 -id "," --itemIDColumn 0 --rowIDColumn 1
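To see why the pattern matters, here is a minimal Python sketch of regex-based filename filtering. It is not Mahout's code: the `select_files` helper, the sample filenames, and the `part-.*` stand-in for the default pattern are assumptions made for illustration, and it assumes the pattern has to match the whole filename.

```python
import re


def select_files(filenames, pattern):
    """Return the filenames that the regex matches in full (hypothetical helper)."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


# Hypothetical directory listing; not taken from the question.
listing = ["ratings.csv", "notes.txt", "part-00000"]

# Assumed stand-in for a default that only accepts HDFS-style part files.
print(select_files(listing, r"part-.*"))  # ['part-00000']

# The pattern passed with -fp in the command above.
print(select_files(listing, r".*csv"))  # ['ratings.csv']
```

Under these assumptions, a directory that holds only CSV files yields an empty selection with the part-file pattern, so nothing would be read until a pattern such as ".*csv" is supplied.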

RowID = user ID, so given your data I think you have the item and row columns reversed. The item ID seems to be in column 0 and the row/user ID in column 1 (I've fixed this in the command above).
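The effect of swapping the two column options can be sketched in a few lines of Python. This is only an illustration, not how Mahout parses its input: the `parse_interaction` helper and the sample line are hypothetical.

```python
def parse_interaction(line, delimiter, item_col, row_col):
    """Split one delimited line and return (row/user ID, item ID). Hypothetical helper."""
    fields = [field.strip() for field in line.split(delimiter)]
    return fields[row_col], fields[item_col]


# Hypothetical line with the item ID first and the user ID second.
line = "item42,user7"

# Columns as in the corrected command: --itemIDColumn 0 --rowIDColumn 1
print(parse_interaction(line, ",", item_col=0, row_col=1))  # ('user7', 'item42')

# Columns reversed: users and items trade places.
print(parse_interaction(line, ",", item_col=1, row_col=0))  # ('item42', 'user7')
```

With the columns reversed, every item would be treated as a user and every user as an item, so the job would compute similarities for the wrong entity.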

pferrel

@eliasah

./mahout spark-itemsimilarity -D:spark.executor.extraClassPath=/home/kulwant/mahout/spark/target/mahout-spark_2.10-0.11.0-SNAPSHOT-dependency-reduced.jar --input /home/kulwant/data/ --output /home/kulwant/output --master spark://kulwant-VirtualBox:7077 --inDelim , --itemIDColumn 1 --rowIDColumn 0

This is the command that I run from the terminal.

KlwntSingh