
I'm using the cassandra-all 2.0.7 API with Hadoop 2.2.0.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>zazzercode</groupId>
    <artifactId>doctorhere-engine-writer</artifactId>
    <version>1.0</version>
    <packaging>jar</packaging>

    <name>DoctorhereEngineWriter</name>

    <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      <cassandra.version>2.0.7</cassandra.version>
      <hector.version>1.0-2</hector.version>
      <guava.version>15.0</guava.version>
      <hadoop.version>2.2.0</hadoop.version>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>zazzercode.DiseaseCountJob</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>me.prettyprint</groupId>
            <artifactId>hector-core</artifactId>
            <version>${hector.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>libthrift</artifactId>
                    <groupId>org.apache.thrift</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-all</artifactId>
            <version>${cassandra.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>libthrift</artifactId>
                    <groupId>org.apache.thrift</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-thrift</artifactId>
            <version>${cassandra.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>libthrift</artifactId>
                    <groupId>org.apache.thrift</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
            <version>0.7.0</version>
        </dependency>
      <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>${guava.version}</version>
      </dependency>

      <dependency>
        <groupId>com.googlecode.concurrentlinkedhashmap</groupId>
        <artifactId>concurrentlinkedhashmap-lru</artifactId>
        <version>1.3</version>
      </dependency>


    </dependencies>
</project>
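
As an aside, the competing Guava versions above could be pinned to a single version with a dependencyManagement block; a sketch of what that would look like (this is not in my POM yet):

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <version>${guava.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>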

When I start the jar (created with mvn assembly:assembly as the normal user prayagupd) from hduser as below,

hduser@prayagupd$ hadoop jar target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar /user/hduser/shakespeare

I get the following Guava collection error from the Cassandra API,

14/11/23 17:51:04 WARN mapred.LocalJobRunner: job_local800673408_0001
java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
    at org.apache.cassandra.config.Config.<init>(Config.java:53)
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:105)
    at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:105)
    at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:90)
    at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:69)
    at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:29)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:558)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:632)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)
14/11/23 17:51:04 INFO mapreduce.Job:  map 100% reduce 0%

Line 53 of the Cassandra API's Config.java has this code (Sets.newConcurrentHashSet() was only added in Guava 15.0, so it does not exist in 11.0.2),

public Set<String> hinted_handoff_enabled_by_dc = Sets.newConcurrentHashSet();

However, I do find the Sets class in the jar itself,

  hduser@prayagupd$ jar tvf target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar | grep com/google/common/collect/Sets
  2358 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$1.class
  2019 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$2.class
  1705 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$3.class
  1327 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet$1.class
  4224 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet.class
  5677 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$DescendingSet.class
  4187 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredNavigableSet.class
  1567 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSet.class
  2614 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSortedSet.class
  1174 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$ImprovedAbstractSet.class
  1361 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet$1.class
  3727 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet.class
  1398 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SetView.class
  1950 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet$1.class
  2058 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet.class
  4159 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$UnmodifiableNavigableSet.class
 17349 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets.class

The method also exists when I inspect the jar as below,

hduser@prayagupd$ javap -classpath target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar com.google.common.collect.Sets | grep newConcurrentHashSet
  public static <E extends java/lang/Object> java.util.Set<E> newConcurrentHashSet();
  public static <E extends java/lang/Object> java.util.Set<E> newConcurrentHashSet(java.lang.Iterable<? extends E>);

I also see com.google.guava under META-INF/maven when I browse the jar file.
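
To double-check which Guava jar the JVM actually resolves at runtime, a small probe along these lines could be run on the same classpath (GuavaProbe is a hypothetical class, not part of my project):

    import com.google.common.collect.Sets;

    public class GuavaProbe {
        public static void main(String[] args) {
            // Print the location of the jar that Sets was actually loaded from
            System.out.println(Sets.class.getProtectionDomain()
                    .getCodeSource().getLocation());
            // Fails with NoSuchMethodError here if a pre-15.0 Guava won the race
            System.out.println(Sets.newConcurrentHashSet());
        }
    }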

I have the following Guava artifacts in ~/.m2, as the normal user (not hduser),

$ ll ~/.m2/repository/com/google/guava/guava
total 20
drwxrwxr-x 5 prayagupd prayagupd 4096 Nov 23 20:05 ./
drwxrwxr-x 4 prayagupd prayagupd 4096 Nov 23 20:05 ../
drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 11.0.2/
drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:06 15.0/
drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 r09/

And the hadoop classpath is

$ hadoop classpath
/usr/local/hadoop-2.2.0/etc/hadoop:
/usr/local/hadoop-2.2.0/share/hadoop/common/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/common/*:
/usr/local/hadoop-2.2.0/share/hadoop/hdfs:
/usr/local/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/hdfs/*:
/usr/local/hadoop-2.2.0/share/hadoop/yarn/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/yarn/*:
/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/*:
/usr/local/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
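
To see which Guava version Hadoop itself ships, the same kind of find I run against Cassandra below works against the Hadoop install; per the dependency tree it should turn up guava-11.0.2.jar (under share/hadoop/common/lib in a stock layout):

    $ find /usr/local/hadoop-2.2.0/ -name "guava*"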

The dependency tree is below: com.google.guava:guava:jar:r09:compile is pulled in by me.prettyprint:hector-core:jar:1.0-2:compile, guava-11.0.2.jar is used by hadoop-2.2.0 (and hadoop-2.6.0), and cassandra-2.0.6 uses guava-15.0.jar,

$ find /usr/local/apache-cassandra-2.0.6/ -name "guava*"
/usr/local/apache-cassandra-2.0.6/lib/guava-15.0.jar
/usr/local/apache-cassandra-2.0.6/lib/licenses/guava-15.0.txt
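
The tree can also be narrowed to just Guava with the dependency plugin's includes filter:

    $ mvn dependency:tree -Dincludes=com.google.guava:guava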


$ mvn dependency:tree
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building DoctorhereEngineWriter 1.0
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ doctorhere-engine-writer ---
[INFO] zazzercode:doctorhere-engine-writer:jar:1.0
[INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
[INFO] +- me.prettyprint:hector-core:jar:1.0-2:compile
[INFO] |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  +- commons-pool:commons-pool:jar:1.5.3:compile
[INFO] |  +- com.google.guava:guava:jar:r09:compile
[INFO] |  +- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] |  +- com.github.stephenc.eaio-uuid:uuid:jar:3.2.0:compile
[INFO] |  \- com.ecyrd.speed4j:speed4j:jar:0.9:compile
[INFO] +- org.apache.cassandra:cassandra-all:jar:2.0.7:compile
[INFO] |  +- org.xerial.snappy:snappy-java:jar:1.0.5:compile
[INFO] |  +- net.jpountz.lz4:lz4:jar:1.2.0:compile
[INFO] |  +- com.ning:compress-lzf:jar:0.8.4:compile
[INFO] |  +- commons-cli:commons-cli:jar:1.1:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.2:compile
[INFO] |  +- org.apache.commons:commons-lang3:jar:3.1:compile
[INFO] |  +- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.3:compile
[INFO] |  +- org.antlr:antlr:jar:3.2:compile
[INFO] |  |  \- org.antlr:antlr-runtime:jar:3.2:compile
[INFO] |  |     \- org.antlr:stringtemplate:jar:3.2:compile
[INFO] |  |        \- antlr:antlr:jar:2.7.7:compile
[INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.2:compile
[INFO] |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.2:compile
[INFO] |  +- jline:jline:jar:1.0:compile
[INFO] |  +- com.googlecode.json-simple:json-simple:jar:1.1:compile
[INFO] |  +- com.github.stephenc.high-scale-lib:high-scale-lib:jar:1.1.2:compile
[INFO] |  +- org.yaml:snakeyaml:jar:1.11:compile
[INFO] |  +- edu.stanford.ppl:snaptree:jar:0.1:compile
[INFO] |  +- org.mindrot:jbcrypt:jar:0.3m:compile
[INFO] |  +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
[INFO] |  +- com.addthis.metrics:reporter-config:jar:2.1.0:compile
[INFO] |  |  \- org.hibernate:hibernate-validator:jar:4.3.0.Final:compile
[INFO] |  |     +- javax.validation:validation-api:jar:1.0.0.GA:compile
[INFO] |  |     \- org.jboss.logging:jboss-logging:jar:3.1.0.CR2:compile
[INFO] |  +- com.thinkaurelius.thrift:thrift-server:jar:0.3.3:compile
[INFO] |  |  \- com.lmax:disruptor:jar:3.0.1:compile
[INFO] |  +- net.sf.supercsv:super-csv:jar:2.1.0:compile
[INFO] |  +- log4j:log4j:jar:1.2.16:compile
[INFO] |  +- com.github.stephenc:jamm:jar:0.2.5:compile
[INFO] |  \- io.netty:netty:jar:3.6.6.Final:compile
[INFO] +- org.apache.cassandra:cassandra-thrift:jar:2.0.7:compile
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
[INFO] |  |  +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  |  +- commons-io:commons-io:jar:2.1:compile
[INFO] |  |  +- commons-net:commons-net:jar:3.1:compile
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |  +- org.apache.avro:avro:jar:1.7.4:compile
[INFO] |  |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
[INFO] |  |  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  |     \- org.tukaani:xz:jar:1.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
[INFO] |  |  \- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
[INFO] |  |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
[INFO] |  |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
[INFO] |  \- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
[INFO] \- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO]    +- javax.servlet:servlet-api:jar:2.5:compile
[INFO]    \- org.apache.httpcomponents:httpclient:jar:4.0.1:compile
[INFO]       \- org.apache.httpcomponents:httpcore:jar:4.0.1:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 27.124s
[INFO] Finished at: Wed Mar 18 01:39:42 CDT 2015
[INFO] Final Memory: 15M/982M
[INFO] ------------------------------------------------------------------------

Here's the hadoop script for Hadoop 2.2.0,

$ cat /usr/local/hadoop-2.2.0/bin/hadoop
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This script runs the hadoop core commands. 

bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

export HADOOP_USER_CLASSPATH_FIRST=true

function print_usage(){
  echo "Usage: hadoop [--config confdir] COMMAND"
  echo "       where COMMAND is one of:"
  echo "  fs                   run a generic filesystem user client"
  echo "  version              print the version"
  echo "  jar <jar>            run a jar file"
  echo "  checknative [-a|-h]  check native hadoop and compression libraries availability"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
  echo "  classpath            prints the class path needed to get the"
  echo "                       Hadoop jar and the required libraries"
  echo "  daemonlog            get/set the log level for each daemon"
  echo " or"
  echo "  CLASSNAME            run the class named CLASSNAME"
  echo ""
  echo "Most commands print help when invoked w/o parameters."
}

if [ $# = 0 ]; then
  print_usage
  exit
fi

COMMAND=$1
case $COMMAND in
  # usage flags
  --help|-help|-h)
    print_usage
    exit
    ;;

  #hdfs commands
  namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
    echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
    echo "Instead use the hdfs command for it." 1>&2
    echo "" 1>&2
    #try to locate hdfs and if present, delegate to it.  
    shift
    if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
      exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups}  "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
      exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    else
      echo "HADOOP_HDFS_HOME not found!"
      exit 1
    fi
    ;;

  #mapred commands for backwards compatibility
  pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker)
    echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
    echo "Instead use the mapred command for it." 1>&2
    echo "" 1>&2
    #try to locate mapred and if present, delegate to it.
    shift
    if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
      exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
      exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    else
      echo "HADOOP_MAPRED_HOME not found!"
      exit 1
    fi
    ;;

  classpath)
    echo $CLASSPATH
    exit
    ;;

  #core commands  
  *)
    # the core commands
    if [ "$COMMAND" = "fs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
    elif [ "$COMMAND" = "version" ] ; then
      CLASS=org.apache.hadoop.util.VersionInfo
    elif [ "$COMMAND" = "jar" ] ; then
      CLASS=org.apache.hadoop.util.RunJar
    elif [ "$COMMAND" = "checknative" ] ; then
      CLASS=org.apache.hadoop.util.NativeLibraryChecker
    elif [ "$COMMAND" = "distcp" ] ; then
      CLASS=org.apache.hadoop.tools.DistCp
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "daemonlog" ] ; then
      CLASS=org.apache.hadoop.log.LogLevel
    elif [ "$COMMAND" = "archive" ] ; then
      CLASS=org.apache.hadoop.tools.HadoopArchives
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [[ "$COMMAND" = -*  ]] ; then
        # class and package names cannot begin with a -
        echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
        exit 1
    else
      CLASS=$COMMAND
    fi
    shift

    # Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
    HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

    #make sure security appender is turned off
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

    export CLASSPATH=$CLASSPATH
    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    ;;

esac
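
Note that, as far as I can tell from the stock Hadoop 2.2.0 hadoop-config.sh (this is paraphrased from memory, not verbatim), HADOOP_USER_CLASSPATH_FIRST only takes effect when HADOOP_CLASSPATH is also non-empty:

    if [ "$HADOOP_CLASSPATH" != "" ]; then
      if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ]; then
        CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH}
      else
        CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
      fi
    fi

so exporting the flag alone, without pointing HADOOP_CLASSPATH at a newer Guava, would change nothing.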

How can this Google Guava collection issue be fixed?

The actual code is here:

git clone --branch doctor-engine-writer https://github.com/prayagupd/doctorhere
cd doctorhere/doctorhere-engine-writer

References

Hadoop library conflict at mapreduce time

  • is it possible you have an older version of guava on your classpath somehow? – MeBigFatGuy Nov 23 '14 at 14:15
  • @MeBigFatGuy After build (`mvn assembly:assembly`), there are guava versions `r09`, `11.0.2`, and `15.0` at `~/.m2`. `Cassandra 2.0.7` should have been using `15.0`. – prayagupa Nov 23 '14 at 14:43
  • Could you run a dependency tree and post the output? – soitof Jan 23 '15 at 15:36
  • @soitof I've posted the output of `mvn dependency:tree`. – prayagupa Mar 18 '15 at 06:49
  • @PrayagUpd, have you resolved this issue? Please see my question, which is the same version conflict: http://stackoverflow.com/questions/31603890/nosuchmethoderror-when-running-on-hadoop-but-not-when-run-locally – tom Jul 27 '15 at 07:50

1 Answer


You are basically running into a version conflict. The problem goes like this:

  • Both the Hadoop native libraries and Cassandra use Google Guava.
  • But your Hadoop version ships an older version of Guava (11.0.2), while your Cassandra is newer and uses Guava 15.0. It is not very common for enterprise-scale Hadoop setups to update their environment with every new release.
  • Cassandra's config loader uses the newConcurrentHashSet() method, which does not exist in the older version.
  • Jars used by Hadoop are loaded before any third-party jars. Hence, even though the correct version of Guava is present in your "with dependencies" jar, an older Guava jar is loaded from the Hadoop classpath and distributed to your mappers/reducers.

Solution:

  • Set the configuration parameter "mapreduce.job.user.classpath.first" to true in the run method of your Job:

    job.getConfiguration().set("mapreduce.job.user.classpath.first", "true");
    
  • Now, in your bin/hadoop, add the statement

    export HADOOP_USER_CLASSPATH_FIRST=true
    
    which will tell Hadoop to load user-defined libraries first. 
    
  • Make sure the latest version of your library comes before the older one on your Hadoop classpath; see the sketch below.
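
A minimal sketch tying both settings together; the driver body is illustrative, and only the class name DiseaseCountJob comes from the POM above:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class DiseaseCountJob extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            Job job = Job.getInstance(getConf(), "disease-count");
            // Prefer the jars bundled in the user's job jar (Guava 15.0)
            // over the ones on Hadoop's own classpath (Guava 11.0.2).
            job.getConfiguration().set("mapreduce.job.user.classpath.first", "true");
            // ... mapper/reducer/input/output setup elided ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new DiseaseCountJob(), args));
        }
    }

Then launch with the environment variables set (the Guava path here is illustrative):

    export HADOOP_USER_CLASSPATH_FIRST=true
    export HADOOP_CLASSPATH=/path/to/guava-15.0.jar
    hadoop jar target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar /user/hduser/shakespeare
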
Vishnu Prathish
  • Hey Vishnu Prathish, I followed your instructions but still get the same thing. When I switch to `hduser` and `echo $HADOOP_CLASSPATH`, it's blank. – prayagupa Mar 12 '15 at 07:15
  • How are you running your job? Paste the syntax here. Also, as a quick check, type 'hadoop classpath'. – Vishnu Prathish Mar 12 '15 at 13:58
  • First try updating your Hadoop version. If that doesn't work and you are in a test environment, try this: type 'hadoop classpath' in your CLI. You should get a list of paths where the libraries are picked from. Navigate to each directory in there and replace your Guava jar with version 15 or higher. This is highly dangerous and not recommended, so make sure to back up your old jars as well; but you can try it if you are in a test environment. This makes sure there are no old versions of Guava in your environment. – Vishnu Prathish Mar 12 '15 at 14:06
  • Thank you Vishnu for the help. I actually create the application `jar` with `maven` as the normal user, then switch to `hduser` and run the jar in the target folder using the command `hadoop jar target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar /usr/hduser/shakespeare`. You can see the result of `hadoop classpath` above in the question. – prayagupa Mar 13 '15 at 03:05
  • I see that Hadoop's Guava dependency hasn't been updated, at least in the Maven repo. Let's figure this out: 1) Are you running a test cluster? 2) Can you paste the output of "cat bin/hadoop" here? 3) Can you verify that the environment variable is being set for the job you are running? – Vishnu Prathish Mar 18 '15 at 15:25
  • (1) Yes, it is just a test cluster on my local machine. (2) I have the `bin/hadoop` script pasted in the question. (3) `echo $HADOOP_USER_CLASSPATH_FIRST` is still blank. As you suggested, I tried replacing `guava-11.0.2.jar` in `hadoop-2.2.0` with `guava-15.0.jar`, but got `java.lang.NoClassDefFoundError: org/apache/thrift/scheme/StandardScheme`. – prayagupa Mar 19 '15 at 00:14
  • Why would you get a NoClassDefFoundError for a Thrift class while replacing Guava? Anyhow, let's not do that; it's not a recommended method anyway. Why are you using two really old versions of Guava? Add a Maven dependencyManagement section in your pom.xml which resolves the Guava version to 16 for compile scope. AFAIK newConcurrentHashSet was removed from a very early version due to a bug and then later reintroduced. Keep everything else, and make sure you set both mapreduce.job.user.classpath.first in code and HADOOP_USER_CLASSPATH_FIRST in config. Let me know how it goes. – Vishnu Prathish Mar 19 '15 at 14:24
  • Yes, I have `("mapreduce.job.user.classpath.first", "true")` in code and `export HADOOP_USER_CLASSPATH_FIRST=true` in `bin/hadoop`. Though I find `echo $HADOOP_USER_CLASSPATH_FIRST` is blank. – prayagupa Mar 19 '15 at 20:29
  • `export HADOOP_USER_CLASSPATH_FIRST=true` in the hadoop script is reflected nowhere, so when I set `export HADOOP_USER_CLASSPATH_FIRST=true` explicitly from the terminal, I can't access the HDFS files from the Hadoop application. For example, when I try to list the files as hduser, I get the error `'ls: /user/hduser/shakespeare/': No such file or directory`. If I unset the variable, things are normal again, but the same Guava issue returns. – prayagupa Mar 24 '15 at 04:01
  • " 'ls: /user/hduser/shakespeare/': No such file or directory " - is this a different error ? Did you add a latest guava version as a managed dependency in eclipse ? – Vishnu Prathish Mar 27 '15 at 20:06
  • Yes, this occurs when I set `export HADOOP_USER_CLASSPATH_FIRST=true` from the command line in Ubuntu. I'm using Maven, and I also tried excluding `guava r09` from the `hector-core` artifact, so the application now has only one Guava dependency, `guava:jar:15.0`, used by `cassandra-all:jar`. I'm not using Eclipse btw; to build the application I use `mvn assembly:assembly`, as you can see in the question. – prayagupa Mar 28 '15 at 02:30