Spark 3.0 enables reading binary data using a new data source:
val df = spark.read.format("binaryFile").load("/path/to/data")
With previous Spark versions you could load the data using:
val rdd = sc.binaryFiles("/path/to/data")
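To make the comparison concrete, here is a minimal sketch of what each call returns. It assumes Spark 3.0+ on the classpath, a local SparkSession, and the placeholder path from above; the object name and the choice to print only a few rows are my own, for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, only here to make the sketch self-contained.
object BinaryFileComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("binary-file-comparison")
      .master("local[*]") // assumption: local run
      .getOrCreate()
    val sc = spark.sparkContext

    // Spark 3.0+: a DataFrame with one row per file and the columns
    // path, modificationTime, length and content.
    val df = spark.read.format("binaryFile").load("/path/to/data")
    df.printSchema()
    df.select("path", "length").show(5, truncate = false)

    // Earlier versions: an RDD of (path, PortableDataStream) pairs;
    // the bytes are only read when the stream is materialized.
    val rdd = sc.binaryFiles("/path/to/data")
    rdd.map { case (path, stream) => (path, stream.toArray().length) }
      .take(5)
      .foreach(println)

    spark.stop()
  }
}
```

As far as I can tell, the DataFrame version exposes the file metadata as ordinary columns, while the RDD version only gives the path and a stream.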
Beyond being able to access binary data through the high-level API (Dataset), are there any additional benefits or features that Spark 3.0 introduces with this data source?