
I am trying to read a large Avro file (2 GB) with spark-shell, but I am getting a StackOverflowError.

val newDataDF = spark.read.format("com.databricks.spark.avro").load("abc.avro")
java.lang.StackOverflowError
  at com.databricks.spark.avro.SchemaConverters$.toSqlType(SchemaConverters.scala:71)
  at com.databricks.spark.avro.SchemaConverters$.toSqlType(SchemaConverters.scala:81)

I tried increasing the driver memory and executor memory, but I still get the same error.

./bin/spark-shell --packages com.databricks:spark-avro_2.11:3.1.0 --driver-memory 8G --executor-memory 8G
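Since the stack trace shows recursive calls inside SchemaConverters.toSqlType, the overflow happens on the JVM thread stack rather than the heap, so heap flags like --driver-memory may not be the right knob. One thing that might be worth trying (a sketch only; the -Xss4m value is an assumption to be tuned, not a verified fix for this file) is raising the thread stack size:

```shell
# Sketch: raise the JVM thread stack size (-Xss) instead of the heap.
# -Xss4m is an assumed value; deeper/more nested schemas may need more.
./bin/spark-shell \
  --packages com.databricks:spark-avro_2.11:3.1.0 \
  --driver-memory 8G \
  --executor-memory 8G \
  --driver-java-options "-Xss4m" \
  --conf "spark.executor.extraJavaOptions=-Xss4m"
```

--driver-java-options is used for the driver here because in client mode (spark-shell) the driver JVM is already running by the time spark.driver.extraJavaOptions would be read.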

How can I read this file? Is there a way to partition it?

evan.oman
PrinceChamp
  • That exception is related to schema/datatype conversion, so increasing memory won't help. Can you add the full exception? – mrsrinivas Dec 29 '16 at 01:34

0 Answers