I am new to Hadoop/Giraph and Java. As part of a task, I downloaded the Cloudera Quickstart VM and installed Giraph on top of it. I am working through the book "Practical Graph Analytics with Apache Giraph" (authors: Roman Shaposhnik, Claudio Martella, Dionysios Logothetis), and I tried to run its first example, on page 111 (Twitter Followership Graph).

Edit: Apparently the examples in the book (published in 2015) depend on a significantly older version of Hadoop than the one provided by current (2017) versions of Cloudera Quickstart VM. How do I go about getting the examples to run?

Original Post:

I ran the GiraphHelloWorld.java program:

import org.apache.giraph.edge.Edge;
import org.apache.giraph.GiraphRunner;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.util.ToolRunner;

// Giraph applications are custom classes that typically use
// BasicComputation class for all their defaults... except for
// the compute method that has to be defined

public class GiraphHelloWorld extends
        BasicComputation<IntWritable, IntWritable, NullWritable, NullWritable> {
    @Override
    public void compute(Vertex<IntWritable, IntWritable, NullWritable> vertex, Iterable<NullWritable> messages) {
        System.out.print("Hello world from the: " + vertex.getId().toString() + " who is following:");

        // iterating over vertex's neighbors
        for (Edge<IntWritable, NullWritable> e : vertex.getEdges()) {
            System.out.print(" " + e.getTargetVertexId());
        }
        System.out.println("");

        // signaling the end of the current BSP computation for the current vertex
        vertex.voteToHalt();
    }
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new GiraphRunner(), args));
    }
}

I ran the following commands in the terminal to execute the program:

export HADOOP_HOME=/usr/lib/hadoop
export GIRAPH_HOME=/usr/local/giraph
export HADOOP_CONF_DIR=$GIRAPH_HOME/conf
PATH=$HADOOP_HOME/bin:$GIRAPH_HOME/bin:$PATH

giraph target/book-examples-1.0.0-jar-with-dependencies.jar GiraphHelloWorld -vip /home/cloudera/src/main/resources/1 -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat -w 1 -ca giraph.SplitMasterWorker=false,giraph.logLevel=error

This resulted in the following error:

No lib directory, assuming dev environment
HADOOP_CONF_DIR=/usr/local/giraph/conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/first/target/book-examples-1.0.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-12-08 16:46:24,917 INFO  [main] utils.ConfigurationUtils (ConfigurationUtils.java:populateGiraphConfiguration(336)) - No edge input format specified. Ensure your InputFormat does not require one.
2017-12-08 16:46:24,926 INFO  [main] utils.ConfigurationUtils (ConfigurationUtils.java:populateGiraphConfiguration(346)) - No vertex output format specified. Ensure your OutputFormat does not require one.
2017-12-08 16:46:24,926 INFO  [main] utils.ConfigurationUtils (ConfigurationUtils.java:populateGiraphConfiguration(361)) - No edge output format specified. Ensure your OutputFormat does not require one.
2017-12-08 16:46:24,957 INFO  [main] utils.ConfigurationUtils (ConfigurationUtils.java:populateGiraphConfiguration(402)) - Setting custom argument [giraph.SplitMasterWorker] to [false] in GiraphConfiguration
2017-12-08 16:46:24,957 INFO  [main] utils.ConfigurationUtils (ConfigurationUtils.java:populateGiraphConfiguration(402)) - Setting custom argument [giraph.logLevel] to [error] in GiraphConfiguration
2017-12-08 16:46:25,329 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2017-12-08 16:46:25,330 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapred.job.map.memory.mb is deprecated. Instead, use mapreduce.map.memory.mb
2017-12-08 16:46:25,330 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapred.job.reduce.memory.mb is deprecated. Instead, use mapreduce.reduce.memory.mb
2017-12-08 16:46:25,330 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2017-12-08 16:46:25,332 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapreduce.user.classpath.first is deprecated. Instead, use mapreduce.job.user.classpath.first
2017-12-08 16:46:25,332 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2017-12-08 16:46:25,336 INFO  [main] job.GiraphJob (GiraphJob.java:run(226)) - run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
2017-12-08 16:46:25,339 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-08 16:46:25,401 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1175)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-12-08 16:46:25,405 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.giraph.bsp.BspOutputFormat.checkOutputSpecs(BspOutputFormat.java:43)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
    at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:259)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Maven pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>giraph</groupId>
    <artifactId>book-examples</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.giraph</groupId>
            <artifactId>giraph-core</artifactId>
            <version>1.1.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.9.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>create-jar-bundle</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

</project>

Please let me know if anything else is needed. I appreciate your help; thanks in advance!

  • Initial guess: there is a major version incompatibility between the versions of Hadoop and Giraph you installed and the versions the code in the book is based on. I'd double-check the version they used for the code in the book (it should be in the intro somewhere). – cowbert Dec 09 '17 at 01:19
  • You are absolutely right! The version used in the book is Hadoop 1.2.1, and the Hadoop version on the cluster is later than 2.0. –  Dec 09 '17 at 02:09
  • @cowbert Any idea why it would behave this way? –  Dec 11 '17 at 14:58

1 Answer

The version issue was resolved when I wrote my own pom file with the dependencies needed for the Giraph project, using the Hadoop 2 build of Giraph (1.2.0-hadoop2) together with the CDH Hadoop artifacts:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com</groupId>
  <artifactId>R4.giraphshortestpath</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>R4.giraphshortestpath</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

    <repositories>
        <repository>
            <id>cloudera</id>
            <name>cloudera repository</name>
            <url>https://repository.cloudera.com/content/repositories/releases/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.giraph</groupId>
            <artifactId>giraph-parent</artifactId>
            <version>1.2.0-hadoop2</version>
            <type>pom</type>
        </dependency>

        <dependency>
            <groupId>org.apache.giraph</groupId>
            <artifactId>giraph-core</artifactId>
            <version>1.2.0-hadoop2</version>
        </dependency>


        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0-cdh5.12.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0-mr1-cdh5.12.0</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>create-jar-bundle</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

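As for why the original build failed: `IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected` is what you typically get when code compiled against Hadoop 1.x, where `JobContext` is a class, runs on Hadoop 2.x, where it is an interface. Below is a minimal sketch of a check for which flavor is on the classpath. `HadoopApiCheck` and its method names are hypothetical helpers made up for illustration, not part of Giraph or Hadoop, and the lookup goes through reflection so the file compiles even without Hadoop available.

```java
// Hypothetical helper (not part of Giraph or Hadoop): reports whether a type
// on the current classpath is a class or an interface.
final class HadoopApiCheck {

    // Describes an already loaded type as "interface" or "class".
    static String kindOf(Class<?> type) {
        return type.isInterface() ? "interface" : "class";
    }

    // Looks the type up by name without initializing it, so this file
    // compiles and runs even when Hadoop is not on the classpath.
    static String kindOfNamed(String className) {
        try {
            ClassLoader loader = HadoopApiCheck.class.getClassLoader();
            return kindOf(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // JobContext is a class in Hadoop 1.x and an interface in Hadoop 2.x.
        String name = "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " is: " + kindOfNamed(name));
    }
}
```

Assuming the `hadoop` command is on the PATH, running it with something like `java -cp "$(hadoop classpath):." HadoopApiCheck` should report `interface` on a Hadoop 2.x install. If it does, any Giraph jar built against Hadoop 1.x will hit the error above.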