When I query an .avro file in Apache Drill, the Body column values are displayed correctly, as shown in the first screenshot below. When I run the same query in Spark SQL, the Body column values come back in binary format. Is there a way to read the data correctly in Spark SQL? I have attached both screenshots: one where Apache Drill reads the Body column without any issues, and one where Apache Spark reads the Body column as binary. Any help would be much appreciated.

[Image: Apache Drill is able to read the Body column values correctly]

[Image: Spark SQL is reading the Body column values in a binary format]

avroDF.printSchema
root
 |-- SequenceNumber: long (nullable = true)
 |-- Offset: string (nullable = true)
 |-- EnqueuedTimeUtc: string (nullable = true)
 |-- SystemProperties: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- member0: long (nullable = true)
 |    |    |-- member1: double (nullable = true)
 |    |    |-- member2: string (nullable = true)
 |    |    |-- member3: binary (nullable = true)
 |-- Properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- member0: long (nullable = true)
 |    |    |-- member1: double (nullable = true)
 |    |    |-- member2: string (nullable = true)
 |    |    |-- member3: binary (nullable = true)
 |-- Body: binary (nullable = true)
Anil Kumar
  • It would be nice if you shared your code. How are you loading the file in Spark? – maogautam Aug 30 '19 at 21:12
  • @Prateek Below is the code I was running: val avroDF = spark.read.format("com.databricks.spark.avro").option("header","true").option("inferSchema","true").load("/qmctdl/46.avro") avroDF: org.apache.spark.sql.DataFrame = [SequenceNumber: bigint, Offset: string ... 4 more fields] – Anil Kumar Aug 31 '19 at 02:05
  • Also, I added the schema to the main question. I need to concentrate on the Body column values. Is there a way I can read the Body column values the same way Apache Drill currently reads them? – Anil Kumar Aug 31 '19 at 02:11
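
One possible approach, offered as a minimal sketch rather than a confirmed fix: if the Body bytes are UTF-8 encoded text (for example JSON), which the question does not confirm, casting the binary column to string should make Spark show readable values much like Drill does. The object name BodyAsString, the helper decodeBody, and the in-memory sample rows are hypothetical stand-ins for avroDF; only the Body column name comes from the schema above.

```scala
import java.nio.charset.StandardCharsets

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object BodyAsString {
  // Hypothetical helper: reinterpret the binary Body column as a UTF-8 string.
  // Assumption: the payload bytes are UTF-8 text; other encodings need other handling.
  def decodeBody(df: DataFrame): DataFrame =
    df.withColumn("Body", col("Body").cast("string"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("body-as-string")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for avroDF: in-memory rows with a binary Body column.
    val avroDF = Seq(
      (1L, """{"id": 1}""".getBytes(StandardCharsets.UTF_8)),
      (2L, """{"id": 2}""".getBytes(StandardCharsets.UTF_8))
    ).toDF("SequenceNumber", "Body")

    val decoded = decodeBody(avroDF)
    decoded.printSchema() // Body is now reported as string instead of binary
    decoded.show(false)   // print full column values without truncation

    spark.stop()
  }
}
```

With the real file, the same cast would be applied to the DataFrame returned by the load call shown in the comment above. As a side note, header and inferSchema are CSV reader options; an Avro file carries its own schema, so they are not expected to change how it is read.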

0 Answers