Yes, you can have a Dataset of your own class. Its type would be Dataset<MyOwnClass>.
The snippet below reads the contents of a file and loads them into a Dataset of a custom class.
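import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;

public class FileDataset {

    // Encoders.bean maps JavaBean properties, so the class needs getters and setters.
    // The fields are long because Spark infers JSON integers as bigint.
    public static class Employee implements Serializable {
        private long key;
        private long value;

        public long getKey() { return key; }
        public void setKey(long key) { this.key = key; }
        public long getValue() { return value; }
        public void setValue(long value) { this.value = value; }
    }

    public static void main(String[] args) {
        // Configure Spark to run locally with two threads
        SparkSession spark = SparkSession
                .builder()
                .appName("Reading JSON File into DataSet")
                .master("local[2]")
                .getOrCreate();

        final Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
        final String jsonPath = "/Users/ajaychoudhary/Documents/student.txt";

        // Read the JSON file (one object per line) into a typed Dataset
        Dataset<Employee> ds = spark.read()
                .json(jsonPath)
                .as(employeeEncoder);
        ds.show();

        spark.stop();
    }
}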
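As far as I know, Encoders.bean derives its columns from JavaBean properties (getter/setter pairs), not from public fields. If you are unsure whether your class qualifies, a quick check with the JDK's own java.beans.Introspector can help. This is only a sketch: BeanPropertyCheck, FieldsOnly and WithAccessors are hypothetical names made up for illustration, and it mirrors the JavaBean convention rather than Spark's actual internals.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertyCheck {

    // Hypothetical class with public fields only: no getters or setters.
    public static class FieldsOnly {
        public long key;
        public long value;
    }

    // Hypothetical class that follows the JavaBean conventions.
    public static class WithAccessors {
        private long key;
        private long value;

        public long getKey() { return key; }
        public void setKey(long key) { this.key = key; }
        public long getValue() { return value; }
        public void setValue(long value) { this.value = value; }
    }

    // Returns the sorted names of properties that have both a getter and a setter.
    public static List<String> readWriteProperties(Class<?> type) {
        try {
            // Stop at Object so the inherited "class" property is not reported.
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + type, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWriteProperties(FieldsOnly.class));    // prints []
        System.out.println(readWriteProperties(WithAccessors.class)); // prints [key, value]
    }
}
```

A class that reports no properties here will most likely give you an encoder with no columns, so adding getters and setters is the safer choice.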
The content of my student.txt file is:
{ "key": 1, "value": 2 }
{ "key": 3, "value": 4 }
{ "key": 5, "value": 6 }
It produces the following output on the console:
+---+-----+
|key|value|
+---+-----+
|  1|    2|
|  3|    4|
|  5|    6|
+---+-----+
I hope this gives you an initial idea of how to create a Dataset of your own custom class.