0

We can't figure out the following issue: we are trying to use Apache Hudi to save data to the storage. The problem is when we upload a fat jar which includes the org.json package in dependencies, the df.save() application is failing on

java.lang.NoClassDefFoundError: org/json/JSONException
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
    at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
    at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)

Even if I go to the cluster libraries and explicitly add this dependency it still fails on save. On the other hand, when I just create new JSONException("hello") in my notebook everything seem to work fine. What could cause this behaviour? Thanks

eugen-fried
  • 2,111
  • 3
  • 27
  • 48

3 Answers3

0

This is probably because the jar is not making it's way to the executor nodes, try addJar (https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar-java.lang.String-)

dan_z
  • 13
  • 3
0

What version of Hudi are you using? There is a problem with JSON in version 0.6.0 and there is an opened issue. I suggest you to use version 0.5.2 by now.

0

Turns out that the problem was with different classpath between metastore service and spark process, because they run in separated JVM's. The problem was fixed in an init script that downloads the jar to the classpath folder.

eugen-fried
  • 2,111
  • 3
  • 27
  • 48