I'm using AvroKeyInputFormat to read Avro files:
val records = sc.newAPIHadoopFile[AvroKey[T], NullWritable, AvroKeyInputFormat[T]](path)
.map(_._1.datum())
Because I need to reflect over the schema in my job, I get the Avro schema like this:
val schema = records.first.getSchema
Unfortunately, this fails when the Avro files in path contain no records: each file still carries the writer schema in its header, but `records.first` throws on an empty RDD.
Is there an easy way to only load the avro schema with Spark even if there are no records?
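One approach I've been considering (a sketch, not a confirmed solution): since every Avro container file stores the writer schema in its header, the schema can be read directly with Avro's DataFileReader on the driver, without materializing any records through Spark. The helper name `readWriterSchema` and the assumption that `path` is a directory of `.avro` files are mine:

```scala
import org.apache.avro.Schema
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.mapred.FsInput
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: open the first .avro file under `path` and read
// only its header, which contains the writer schema even when the file
// holds zero records.
def readWriterSchema(path: String, conf: Configuration): Schema = {
  val fs = FileSystem.get(conf)
  val firstFile = fs
    .listStatus(new Path(path))
    .map(_.getPath)
    .find(_.getName.endsWith(".avro"))
    .getOrElse(sys.error(s"no .avro files found under $path"))
  val input = new FsInput(firstFile, conf)
  try {
    val reader =
      DataFileReader.openReader(input, new GenericDatumReader[GenericRecord]())
    try reader.getSchema
    finally reader.close()
  } finally input.close()
}
```

This would be called once on the driver, e.g. `readWriterSchema(path, sc.hadoopConfiguration)`, and it should work identically for empty and non-empty files since it never touches the data blocks.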