I am having trouble trying out the Kafka-HDFS data ingestion example.

I have tried both the 0.10.0 and 0.14.0 versions. For 0.10.0 I use the ready-made distribution, and for 0.14.0 I made a build myself following the instructions in the README file (one problem I encountered there was that the scala-library downloaded by Gradle was not the one listed in the dependencies, so I had to manually download scala-library-2.11.8 and put it under the lib/ directory).

I have used Hadoop server versions 2.3.0 (with a pseudo-distributed single-node setup) and 3.2.1. One problem I have with 2.3.0, which I use as a cluster, is that the release downloaded from the Apache Hadoop page is built against Java < 1.8 in 32-bit mode, whereas the Gobblin libraries require Java >= 1.8, and I then get some weird errors regarding stack guards etc.
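To compare the JVM that the scripts actually pick up with the 32-bit build described above, a quick check of the Java version and its 32- vs 64-bit data model can help. This is a minimal sketch; the class name JvmInfo is hypothetical, and sun.arch.data.model is a HotSpot-specific system property that other JVMs may not set. It should be run with the same java binary that JAVA_HOME points to.

```java
public class JvmInfo {
    // "sun.arch.data.model" is set to "32" or "64" by HotSpot-based JVMs (OpenJDK, Oracle);
    // other JVMs may not define it, in which case we report "unknown".
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        System.out.println("sun.arch.data.model = " + dataModel());
    }
}
```

If this reports a 64-bit Java 8 JVM, any 32-bit native library that Hadoop loads will not match it.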

Nevertheless, I prefer to go with Hadoop 3.2.1 for my pseudo-distributed single-node setup, so I tried that. Using the ready-made distribution for 0.10.0 (gobblin-distribution-0.10.0.tar.gz), I followed the instructions here: https://github.com/apache/incubator-gobblin/blob/gobblin_0.10.0/gobblin-docs/case-studies/Kafka-HDFS-Ingestion.md

Then I execute:

bin/gobblin-mapreduce.sh --conf ~/Gobblin/apps/KafkaHDFSIngestionMapReduce/job_conf_dir/job.pull

I got this error in the log file gobblin-current.log:

2020-02-18 10:42:18 UTC ERROR [main] gobblin.runtime.AbstractJobLauncher  442 - Failed to launch and run job job_GobblinKafkaQuickStart_1582022486716: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.URL.fromPath(Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/yarn/api/records/URL;
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.URL.fromPath(Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/yarn/api/records/URL;
    at org.apache.hadoop.mapred.YARNRunner.setupLocalResources(YARNRunner.java:393)
    at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:573)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:325)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:230)
    at gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:570)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:417)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

One thing I tried was to change bin/gobblin-mapreduce.sh and add all the jars under the lib/ folder to LIBJARS, which is passed to hadoop jar as -libjars, so that the jobs use the 2.3.0 versions provided there (i.e. hadoop-yarn-api-2.3.0 for the URL class in the error above). After that change I still get the same error.
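The stack trace shows the failure happening inside Job.submit, i.e. in the submitting JVM before any task runs, so jars passed with -libjars (which are mainly shipped to the tasks) may not change which classes that JVM sees. It may also help to confirm which Hadoop versions are actually present in lib/. The sketch below, a hypothetical helper named HadoopJarVersions, groups hadoop-*.jar files by the version in their file name and warns when more than one version is found. It assumes Maven-style names such as hadoop-yarn-api-2.3.0.jar and only inspects file names, not jar contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HadoopJarVersions {
    // "hadoop-yarn-api-2.3.0.jar" -> artifact "hadoop-yarn-api", version "2.3.0".
    // Assumes the usual <artifact>-<version>[-classifier].jar naming.
    private static final Pattern HADOOP_JAR =
            Pattern.compile("(hadoop-[a-z0-9-]+?)-(\\d+(?:\\.\\d+)+).*\\.jar");

    // Groups Hadoop artifacts by version; more than one key means mixed versions.
    static Map<String, List<String>> byVersion(List<String> jarNames) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = HADOOP_JAR.matcher(name);
            if (m.matches()) {
                result.computeIfAbsent(m.group(2), v -> new ArrayList<>()).add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to "lib" (the Gobblin distribution's lib/ folder).
        Path dir = Paths.get(args.length > 0 ? args[0] : "lib");
        List<String> names;
        try (Stream<Path> files = Files.list(dir)) {
            names = files.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
        Map<String, List<String>> versions = byVersion(names);
        versions.forEach((version, artifacts) -> System.out.println(version + ": " + artifacts));
        if (versions.size() > 1) {
            System.out.println("WARNING: more than one Hadoop version found in " + dir);
        }
    }
}
```

The same check can be pointed at the share/hadoop subdirectories of the Hadoop 3.2.1 installation to see what the hadoop command itself brings along.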

Note: HADOOP_HOME and HADOOP_BIN_DIR are set only right before calling bin/gobblin-mapreduce.sh, and they point to my Hadoop 3.2.1 installation.

Do you have any suggestions for tackling this problem? If using Hadoop 3.2.1 for the cluster setup is not possible, which 2.x version could I use (one that would not require preparing a full development environment just to make a Java 1.8 64-bit build :) )?

Alternatively, has any of you successfully tried the example on that page? If so, could you please list the versions you used?

Thanks for your time and assistance!

webprogrammer

1 Answer

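It seems like you have mixed versions of the Hadoop dependencies on the classpath. The org.apache.hadoop.yarn.api.records.URL class is being loaded from a 2.x jar, but YARNRunner comes from 3.x. That YARNRunner calls URL.fromPath(Path), a method the older URL class does not have, hence the NoSuchMethodError.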
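One way to check this diagnosis is to ask the JVM directly where each class comes from. The following is a minimal sketch (the class name WhichJar is hypothetical) that prints the jar each of the two classes from the stack trace was loaded from, and whether URL.fromPath(Path) is present. It only gives a meaningful answer when run with the same classpath that bin/gobblin-mapreduce.sh ends up using; the output of hadoop classpath is just an approximation of that.

```java
import java.security.CodeSource;

// Usage (assumption): javac WhichJar.java && java -cp ".:$(hadoop classpath)" WhichJar
public class WhichJar {
    // Returns the location (usually a jar URL) a class is loaded from.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Core JDK classes have no CodeSource.
            return src == null ? "(JDK / bootstrap)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        } catch (LinkageError e) {
            // e.g. a superclass or interface of the class is missing from the classpath
            return "NOT LOADABLE (" + e + ")";
        }
    }

    // True if className has a public method (declared or inherited) with the given parameter types.
    static boolean hasPublicMethod(String className, String methodName, String... paramTypeNames) {
        ClassLoader cl = WhichJar.class.getClassLoader();
        try {
            Class<?> c = Class.forName(className, false, cl);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramTypeNames[i], false, cl);
            }
            c.getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "org.apache.hadoop.yarn.api.records.URL";
        String runner = "org.apache.hadoop.mapred.YARNRunner";
        System.out.println(url + " <- " + locate(url));
        System.out.println(runner + " <- " + locate(runner));
        // YARNRunner in the stack trace calls URL.fromPath(Path); "false" here means the
        // URL class on this classpath is older than the YARNRunner that calls it.
        System.out.println("URL.fromPath(Path) present: "
                + hasPublicMethod(url, "fromPath", "org.apache.hadoop.fs.Path"));
    }
}
```

If URL resolves to a 2.x jar (for example hadoop-yarn-api-2.3.0) while YARNRunner resolves to a 3.2.1 jar, the classpath is mixed in the way described above.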

What is the error when you use 0.14.0?