0

I am using :

hadoop-1.2.1 and mahout-distribution-0.8

When I try to run HASHMIN method with following command:

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.minhash.MinHashDriver -i tce-data/cv.vec -o tce-data/out/cv/minHashDriver/ -ow

I get this error:

tce@osy-Inspiron-N5110:~$ $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.minhash.MinHashDriver  -i  tce-data/cv.vec  -o tce-data/out/cv/minHashDriver/ -ow
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/tce/app/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/tce/app/mahout-distribution-0.8/mahout-examples-0.8-job.jar
Warning: $HADOOP_HOME is deprecated.

13/09/10 18:17:46 WARN driver.MahoutDriver: No org.apache.mahout.clustering.minhash.MinHashDriver.props found on classpath, will use command-line arguments only
13/09/10 18:17:46 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --hashType=[MURMUR], --input=[tce-data/cv.vec], --keyGroups=[2], --minClusterSize=[10], --minVectorSize=[5], --numHashFunctions=[10], --numReducers=[2], --output=[tce-data/out/cv/minHashDriver/], --overwrite=null, --startPhase=[0], --tempDir=[temp], --vectorDimensionToHash=[value]}
13/09/10 18:17:48 INFO input.FileInputFormat: Total input paths to process : 1
13/09/10 18:17:50 INFO mapred.JobClient: Running job: job_201309101645_0031
13/09/10 18:17:51 INFO mapred.JobClient:  map 0% reduce 0%
13/09/10 18:18:27 INFO mapred.JobClient: Task Id : attempt_201309101645_0031_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.mahout.clustering.minhash.MinHashMapper.map(MinHashMapper.java:30)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I appreciate any idea

Osy
  • 1,613
  • 5
  • 21
  • 35
  • http://stackoverflow.com/questions/14922087/hadoop-longwritable-cannot-be-cast-to-org-apache-hadoop-io-intwritable http://stackoverflow.com/questions/11784729/hadoop-java-lang-classcastexception-org-apache-hadoop-io-longwritable-cannot – Sandip Oct 22 '13 at 10:45
  • Thank you @Sandip but by now I do not have custom programming, I am only following examples, maybe can be a problem with versions. – Osy Oct 22 '13 at 17:41

1 Answers1

0

Then cross check few things, Your job.setOutputKeyClass, job.setOutputValueClass, job.setMapOutputKeyClass and job.setMapOutputValueClass should match with reducer key, reducer value, mapper key and mapper value class respectively.

Your stacktrace says there is mismatch in Mapper. Your MinHashMapper should extend Mapper<[A, B, C, D >] where C and D be same as job.setMapOutputKeyClass(C) and job.setMapOutputValueClass(D)

JoeBilly
  • 3,057
  • 29
  • 35
Sandip
  • 317
  • 1
  • 3
  • 8