
I am trying to submit a DistCp job from a Spring Boot application on a REST API call.

Spring version: 1.5.13.RELEASE, Hadoop version: 2.7.3

Below is the code I am using to run DistCp:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

List<Path> srcPathList = new ArrayList<Path>();
srcPathList.add(new Path("hdfs://<cluster>/tmp/<user>/source"));

Path targetPath = new Path("hdfs://<cluster>/tmp/<user>/destination");

DistCpOptions distCpOptions = new DistCpOptions(srcPathList, targetPath);
DistCp distCp = new DistCp(configuration, distCpOptions);
Job job = distCp.execute();
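
For context, the configuration object passed to DistCp above is a plain Hadoop Configuration with the cluster's client config files added as resources, roughly like the sketch below (the file paths are placeholders for wherever the site files live on the application host):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch: load the cluster's client configs so DistCp can reach HDFS and YARN.
// The paths below are placeholders.
Configuration configuration = new Configuration();
configuration.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
configuration.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
configuration.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
configuration.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));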

The job is submitted to the cluster successfully; however, it fails with a ClassNotFoundException on the cluster. Below is the exception:

INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; 
cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:  
java.lang.RuntimeException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.tools.mapred.CopyOutputFormat not found

Why does this happen? Any pointers would be very helpful. Thanks!

  • How are you getting Hadoop dependencies into your app? What version of Hadoop is running in YARN? – OneCricketeer Sep 27 '18 at 12:29
  • Thank you for the response! YARN is also running 2.7.3; the project is built with Maven, so I have Maven dependencies. Am I missing something obvious? – sRey Sep 27 '18 at 16:56
  • Well, that class is definitely part of the Apache 2.7.3 release. https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyOutputFormat.java Perhaps the classpath of your YARN containers is not being set correctly – OneCricketeer Sep 27 '18 at 21:16
  • Hmm... how could I check the setup on YARN? – sRey Sep 27 '18 at 22:43
  • You can find a `yarn.application.classpath` property in the yarn-site XML file. There is a default that should be pulling in `HADOOP_HDFS_HOME` environment variable Java classes, though, and if that property isn't set, then the problem starts with `hadoop-env` file – OneCricketeer Sep 27 '18 at 23:06
  • Generally, though, unless you have a lot of data, you should just use the `FileSystem` class and the copy command rather than having DistCp run a MapReduce job (see the sketch after this thread) – OneCricketeer Sep 27 '18 at 23:07
  • I have added the yarn-site in the config before starting DistCp. Below is the property set: `yarn.application.classpath` = $HADOOP_CONF_DIR, /usr/hdp/2.6.5.0-292/hadoop/*, /usr/hdp/2.6.5.0-292/hadoop/lib/*, /usr/hdp/current/hadoop-hdfs-client/*, /usr/hdp/current/hadoop-hdfs-client/lib/*, /usr/hdp/current/hadoop-yarn-client/*, /usr/hdp/current/hadoop-yarn-client/lib/*. However, I have to use DistCp as the files to be copied across clusters are very large! – sRey Sep 28 '18 at 04:52
  • Generally, I believe `/usr/hdp/current/hadoop-hdfs-client/lib` contains that class in the error – OneCricketeer Sep 28 '18 at 14:42
  • Oh OK... which means I could check if that class is in the lib directory on the node the job is running on? – sRey Sep 28 '18 at 16:52
  • Yeah, that's what I would do. – OneCricketeer Sep 28 '18 at 19:19
  • sure..thanks so much!! – sRey Sep 29 '18 at 05:13
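
Following up on the FileSystem suggestion above, here is a minimal sketch of a client-side copy that avoids MapReduce entirely; FileUtil.copy is standard Hadoop API, and the paths and cluster names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Sketch: copy a directory between clusters through the client, with no MapReduce job
Configuration conf = new Configuration();
Path src = new Path("hdfs://<source-cluster>/tmp/<user>/source");
Path dst = new Path("hdfs://<target-cluster>/tmp/<user>/destination");

FileSystem srcFs = src.getFileSystem(conf);
FileSystem dstFs = dst.getFileSystem(conf);

// deleteSource = false, overwrite = true
FileUtil.copy(srcFs, src, dstFs, dst, false, true, conf);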

1 Answer


I found the reason by viewing the job.jar on the NodeManager machine. The jar had the Spring Boot executable-jar structure:

BOOT-INF/classes/xxx

That layout is the problem: the application classes and dependency jars are nested under BOOT-INF/ instead of sitting at the jar root, so the class loader on the cluster cannot find org.apache.hadoop.tools.mapred.CopyOutputFormat.

I replaced the jar packaging with war packaging, and it works:

<packaging>war</packaging>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <!-- exclude the embedded Tomcat -->
    <exclusions>
        <exclusion>
            <artifactId>spring-boot-starter-tomcat</artifactId>
            <groupId>org.springframework.boot</groupId>
        </exclusion>
    </exclusions>
</dependency>
<!-- servlet API is provided by the external Tomcat -->
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-servlet-api</artifactId>
    <version>7.0.47</version>
    <scope>provided</scope>
</dependency>
...

and then add a servlet initializer class so the external container can start the application:

import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.boot.web.support.SpringBootServletInitializer;

public class SpringBootStartApplication extends SpringBootServletInitializer {

    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
        // Point the builder at the main Spring Boot application class
        return builder.sources(xxxPortalApplication.class);
    }
}