
When I try reading data from an Avro table using Spark SQL, I get this error:

Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:121)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83)
        at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56)

This is my build.sbt file:

val sparkVersion = "2.4.2"

libraryDependencies ++=  Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion
)

libraryDependencies += "com.databricks" %% "spark-avro" % "4.0.0"

Do I need to add any other dependencies? The code works fine in Hive, but Spark is having issues.

Srinivas
  • Please share the code – dassum Nov 11 '19 at 14:17
  • Please try with libraryDependencies += "org.apache.spark" %% "spark-avro" % "2.4.0" – dassum Nov 11 '19 at 14:20
  • Hello @dassum, I tried this and it seems fine. But now I am getting this error: Caused by: MetaException(message:java.lang.ClassNotFoundException Class not found) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:442) at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:250) – Srinivas Nov 12 '19 at 15:41
  • You need to always use dependency versions that match your Spark version. Change it to 2.4.2 or reuse your variable – OneCricketeer Nov 29 '19 at 15:19
  • @cricket_007, I tried with pyspark --driver-memory 10g --jars /tmp/spark-avro_2.11-2.4.2.jar still getting the same error. It is happening when I am trying to join a table's data with the avro schema. Trying to see if a simple filter does it too. – Srinivas Nov 29 '19 at 17:34
  • You shouldn't need Spark avro if you're reading avro data from Hive anyway since Hive server handles that translation. Have you tried using any other tool to query the same table? – OneCricketeer Nov 29 '19 at 20:06
  • Yes querying the table with hive works fine. – Srinivas Dec 01 '19 at 05:53
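For reference, the version alignment the commenters suggest would look like this in build.sbt. This is a sketch, not a confirmed fix: it assumes the built-in Apache spark-avro module (which ships alongside Spark 2.4+) is the right replacement for the older com.databricks:spark-avro artifact, and that its version should match the Spark version in use:

```scala
// build.sbt — sketch: keep spark-avro on the same version as Spark itself
val sparkVersion = "2.4.2"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  // Apache's spark-avro module, versioned with Spark — not the
  // pre-2.4 com.databricks:spark-avro artifact from the question
  "org.apache.spark" %% "spark-avro" % sparkVersion
)
```

Reusing the `sparkVersion` variable (as OneCricketeer suggests) avoids the mismatch silently reappearing on the next Spark upgrade. Note that this only aligns the Spark-side dependency; if the table is read through the Hive metastore, the Hive serde jars on the classpath would also need to be compatible.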

0 Answers