
I am trying to load an Avro file into a Spark DataFrame so I can convert it to a pandas DataFrame and eventually a dictionary. The method I want to use:

df = spark.read.format("avro").load(avro_file_in_memory)

(Note: the Avro file data I'm trying to load into the DataFrame is already in memory, as the body of a response from Python's requests library.)

However, this function relies on an Avro reader native to the Databricks environment, which I am not working in (I looked through PySpark for a similar function/code but could not find anything myself).

Is there any similar function that I can use outside of Databricks to produce the same results?

Fastas

1 Answer


That Databricks library is open source and was actually added to core Spark in 2.4 (though it is still packaged as an external module).

In any case, there is a native avro Python library, as well as fastavro, so I'm not sure you want to start up a JVM (which using Spark entails) just to load Avro data into a dictionary. Besides that, an Avro file consists of multiple records, so the result would at the very least be a list of dictionaries.

Basically, I think you're better off using the approach from your previous question, but starting by writing the Avro data to disk, since that seems to be your current issue.
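A sketch of that first step: dump the in-memory bytes to a temporary `.avro` file that any path-based reader can then load. Here `avro_bytes` is a hypothetical stand-in for `response.content`:

```python
import tempfile

# Stand-in for the raw bytes of the HTTP response (response.content)
avro_bytes = b"hypothetical avro payload"

# Write the payload to a temporary file so path-based readers can load it
with tempfile.NamedTemporaryFile(suffix=".avro", delete=False) as f:
    f.write(avro_bytes)
    avro_path = f.name

# Spark could now read it by path, assuming spark-avro is on the classpath:
# df = spark.read.format("avro").load(avro_path)
```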

Otherwise, a little more searching for what you're actually trying to accomplish might solve this XY problem you're having:

https://github.com/ynqa/pandavro

OneCricketeer