We are running a Spark Streaming job that reads from Kafka, converts the records to CSV, and then writes them to HBase. I am using the Phoenix CsvBulkLoadTool API to run a bulk-load job. The Spark job starts fine and produces the CSV, but csvBulkLoadTool.run() launches a new MapReduce job that fails with this exception:
Error: java.lang.ClassNotFoundException: org.apache.commons.csv.CSVFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.<init>(CsvToKeyValueMapper.java:96)
at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.setup(CsvToKeyValueMapper.java:69)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
This is my spark-submit command:
spark-submit \
  --jars $(echo lib/*.jar | tr ' ' ',') \
  --class "com.lam.app.Driver" \
  --name "MyBulkLoader" \
  --master yarn \
  --deploy-mode cluster \
  --driver-class-path $(echo lib/*.jar | tr ' ' ',') \
  --driver-memory 4g \
  --executor-memory 1g \
  --num-executors 2 \
  --executor-cores 2 \
  --files conf.properties \
  PipeLine-1.0.jar
Earlier I was getting a different missing-class error (java.lang.NoClassDefFoundError: com/yammer/metrics/core/MetricsRegistry), which went away after I added --driver-class-path $(echo lib/*.jar | tr ' ' ','). Now I am getting this error about the missing CSVFormat class instead.
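As far as I understand, --driver-class-path only puts the jars on the Spark driver's classpath; the mapper containers that CsvBulkLoadTool launches on YARN never see it. One thing I am considering (an untested sketch; `JarShipping` is just an illustrative name, and I am assuming Hadoop's `tmpjars` property is what ships extra jars to MR task classpaths) is setting `tmpjars` on the configuration before calling run():

```java
import java.io.File;
import java.util.Arrays;
import java.util.stream.Collectors;

public class JarShipping {

    /**
     * Join every jar in libDir into the comma-separated list that the
     * Hadoop "tmpjars" property expects.
     */
    public static String tmpJarsValue(File libDir) {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null || jars.length == 0) {
            return "";
        }
        return Arrays.stream(jars)
                     .map(File::getAbsolutePath)
                     .sorted()
                     .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Before csvBulkLoadTool.run(...), something like:
        //   conf.set("tmpjars", tmpJarsValue(new File("lib")));
        // (untested -- the paths may need to be fully qualified URIs)
        System.out.println(tmpJarsValue(new File("lib")));
    }
}
```

I have not verified whether CsvBulkLoadTool honors `tmpjars` this way, so corrections are welcome.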
My lib directory contains all the required jars, including the commons-csv jar, but I still get this error. I build the project with Maven. Below is the code snippet that runs the bulk load:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

CsvBulkLoadTool csvBulkLoadTool = new CsvBulkLoadTool();

// Point the tool at our HBase/ZooKeeper cluster
final Configuration conf = HBaseConfiguration.create();
System.out.println("ZKQuorum " + this.ZKQUORUM);
conf.set(HConstants.ZOOKEEPER_QUORUM, this.ZKQUORUM);
conf.set("hbase.zookeeper.property.clientPort", String.valueOf(2181));
conf.set("zookeeper.znode.parent", "/hbase-unsecure");
csvBulkLoadTool.setConf(conf);

// Launches a MapReduce job that bulk-loads the CSV into the Phoenix table
int exitCode = csvBulkLoadTool.run(new String[] {
        "--input", "\"" + this.hdfsInputFile + "\"",
        "--table", this.TABLENAME,
        "--zookeeper", this.ZKQUORUM + ":2181:/hbase-unsecure" });
System.out.println("Return code of WDL bulk load execution " + exitCode);
The exitCode is always -1. Please let me know if I am missing something in my classpath.
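To narrow down where the class is actually missing, I can drop a small check like this into the driver (a sketch; `ClasspathCheck` is just an illustrative name):

```java
public class ClasspathCheck {

    /** Return true if the named class is loadable on the current classpath. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: we only care about visibility, not static init
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // If this prints true in the driver while the mapper still throws
        // ClassNotFoundException, the jar is not reaching the MR task classpath.
        System.out.println("CSVFormat loadable: "
                + isLoadable("org.apache.commons.csv.CSVFormat"));
    }
}
```

This would at least confirm whether the driver sees commons-csv while the YARN containers do not.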