Suppose I have a POJO class named User, and this is how I write a list of User objects to a Parquet file:
List<User> users = ...;
Dataset<Row> df = spark.createDataFrame(users, User.class);
df.write().parquet("...");
I can then read it back with:
Dataset<Row> df = spark.read().parquet("...");
But what if I want Spark to read it back as POJO objects, something like an RDD<User>? Is that possible?
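For what it's worth, this is roughly what I was hoping would work (a sketch, assuming Encoders.bean can map the Parquet columns back onto the User fields by name):

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// Read the Parquet file back as a typed Dataset of User beans
Dataset<User> users = spark.read()
        .parquet("...")
        .as(Encoders.bean(User.class));

// If an RDD view is needed, drop down via the JavaRDD bridge
JavaRDD<User> userRdd = users.javaRDD();
```

But I'm not sure whether this is the right approach or whether it works without Avro.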
Most of the examples I found on Google use Avro support for Parquet, but I'm not using Avro here.
Any suggestions? Many thanks.
(I'm using Spark 2.4)