
I have a 100 GB Avro dataset and the same dataset in ORC format, which is 10 GB. If I read the ORC data in Spark, does it consume less memory than reading the Avro dataset?

My thinking is that since all the data gets loaded into memory and deserialized, there may be no difference when doing larger transformations on the dataset, but I wanted to check whether I'm thinking about this correctly.
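
To make the intuition concrete, here is a toy sketch in plain Python. It is not Spark and does not use the real Avro or ORC encodings: the records, the JSON-lines "row file", and the zlib-compressed "columnar file" are all hypothetical stand-ins. It only illustrates that two files of very different on-disk size can decode to identical in-memory rows.

```python
import json
import zlib

# Hypothetical toy records; real Avro/ORC encodings are far more involved.
rows = [{"id": i, "country": "US", "score": i % 10} for i in range(10_000)]

# "Row-oriented" file: one JSON record per line, uncompressed.
row_file = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

# "Columnar" file: values grouped by column, then compressed with zlib.
columns = {key: [r[key] for r in rows] for key in rows[0]}
col_file = zlib.compress(json.dumps(columns).encode("utf-8"))

# Decode both files back into the same in-memory row representation.
rows_from_row_file = [
    json.loads(line) for line in row_file.decode("utf-8").splitlines()
]
decoded_columns = json.loads(zlib.decompress(col_file).decode("utf-8"))
rows_from_col_file = [
    dict(zip(decoded_columns, values))
    for values in zip(*decoded_columns.values())
]

print("row file bytes:     ", len(row_file))
print("columnar file bytes:", len(col_file))
print("decoded rows equal: ", rows_from_row_file == rows_from_col_file)
```

This only covers the case where every column of every row is fully decoded. It says nothing about column pruning or predicate pushdown, which may let Spark read less data from a columnar format like ORC in the first place.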

Ryan

0 Answers