
Suppose I have a POJO class named User, and here is what I do to write a list of User objects into a parquet file:

List<User> users = ...;
Dataset<Row> df = spark.createDataFrame(users, User.class);
df.write().parquet("...");

I can then simply read it back via:

Dataset<Row> df = spark.read().parquet("...");

But what if I want Spark to read it back as POJO objects, something like an RDD<User>? Is that possible?

Most of the examples I found on Google use Avro support for parquet, but I don't use Avro here.

Any suggestions? Many thanks.

(I'm using Spark 2.4)

gfytd
  • [This](https://stackoverflow.com/questions/34654145/how-to-convert-dataframe-to-dataset-in-apache-spark-in-java) might help – blackbishop Jan 24 '22 at 13:10
  • You can just read a Dataset using `spark.read.parquet(...).as[User]`. https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html – mazaneicha Jan 24 '22 at 15:50
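A minimal Java sketch of the `.as(...)` approach suggested in the comments above. It assumes `User` is a standard Java bean (public no-arg constructor plus getters/setters); the `User` class, field names, and file path below are hypothetical stand-ins, not from the question. This should work on Spark 2.4, since `Encoders.bean` and `Dataset.as` are available there.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadAsPojo {

    // Hypothetical stand-in for the question's User POJO: a plain bean
    // with a public no-arg constructor and getters/setters, which is
    // what Encoders.bean(...) requires.
    public static class User implements Serializable {
        private String name;
        private int age;
        public User() {}
        public User(String name, int age) { this.name = name; this.age = age; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Writes the users to parquet, then reads them back as a typed
    // Dataset<User> and returns the collected POJOs.
    public static List<User> roundTrip(SparkSession spark, List<User> users, String path) {
        Dataset<Row> df = spark.createDataFrame(users, User.class);
        df.write().mode("overwrite").parquet(path);

        // The key step: .as(Encoders.bean(...)) turns the untyped
        // Dataset<Row> into a Dataset<User>, matching parquet columns
        // to bean properties by name. .javaRDD() then yields a
        // JavaRDD<User> if an RDD is really needed.
        Dataset<User> typed = spark.read().parquet(path).as(Encoders.bean(User.class));
        return typed.javaRDD().collect();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("read-as-pojo").master("local[*]").getOrCreate();
        List<User> back = roundTrip(spark,
                Arrays.asList(new User("alice", 30), new User("bob", 25)),
                "/tmp/users.parquet"); // hypothetical path
        back.forEach(u -> System.out.println(u.getName() + ":" + u.getAge()));
        spark.stop();
    }
}
```

No Avro is involved: the bean encoder alone handles the Row-to-POJO mapping, so this stays within plain Spark SQL.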

0 Answers