9

In Java, I use RowFactory.create() to create a Row:

Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));

where "record" is a record from a database, but I cannot know the length of "record" in advance, so I want to use a List or an Array to create the "row". In Scala, I can use Row.fromSeq() to create a Row from a List or an Array, but how can I achieve that in Java?

user2736706

4 Answers

17

Real-world applications often need to create Datasets or DataFrames. Here is an example of how to create Rows and a Dataset in a Java application:

// Assumes static imports of DataTypes.createStructField, DataTypes.StringType
// and DataTypes.IntegerType; ImmutableList comes from Guava.
// Initialize the SQLContext first
SQLContext sqlContext = ...
StructType schemata = DataTypes.createStructType(
        new StructField[]{
                createStructField("NAME", StringType, false),
                createStructField("STRING_VALUE", StringType, false),
                createStructField("NUM_VALUE", IntegerType, false),
        });
// Each Row must list its values in the same order as the schema fields
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);
data.show();
+-----+------------+---------+
| NAME|STRING_VALUE|NUM_VALUE|
+-----+------------+---------+
|name1|      value1|        1|
|name2|      value2|        2|
+-----+------------+---------+
Paul Roub
Andrushenko Alexander
  • Thank you. In Scala we can just do sc.parallelize(List((x,y),(a,b))).toDF("col1","col2"), which is so simple. Why all these Row, JavaRDD, etc.? Is there a simple way like that? – BdEngineer May 31 '19 at 10:11
  • You say you need to create Datasets in real-world applications, yet you hard-code all the values. That does not make sense. In the real world everything has to be parameterizable, and you do not know the values beforehand. – Borja Oct 11 '19 at 08:41
12

I am not sure I understand your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java.

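List<MyData> mlist = new ArrayList<MyData>();
mlist.add(d1);
mlist.add(d2);

// toArray() returns an Object[], which the varargs create(Object...) spreads into one field per element
Row row = RowFactory.create(mlist.toArray());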
abaghel
  • Hi, when I use your method, I find that Spark treats mlist as a single object: `Row row = RowFactory.create(mlist);` `System.out.println("row number:" + row.length());` `System.out.println("mlist number:" + mlist.size());` I got: row number:1 mlist number:2 – user2736706 Sep 26 '16 at 08:25
  • Yes, but the Row will have both records. You can try printing System.out.println("row number:" + row.toSeq()); – abaghel Sep 26 '16 at 08:38
  • Hi, thanks so much! And you can try this: Object[] rowArray = {obj1, obj2, ....} Row row = RowFactory.create(rowArray); System.out.println("row number:" + row.length()); You will get - row number:6 – user2736706 Sep 26 '16 at 11:20
  • Thanks. I updated my answer. I checked the source code of the RowFactory and GenericRow classes: "An internal row implementation that uses an array of objects as the underlying storage." – abaghel Sep 26 '16 at 11:54
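The difference seen in the comments above (row length 1 when the List is passed, one field per element when an Object[] is passed) comes from how Java treats varargs parameters. Here is a minimal sketch that reproduces it with the standard library only. `VarargsRowDemo` and its `create` method are hypothetical stand-ins for a varargs factory such as `RowFactory.create(Object... values)`; this is not Spark code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a varargs factory like RowFactory.create(Object... values).
class VarargsRowDemo {

    // Returns the received values as a list of "fields", one per vararg element.
    static List<Object> create(Object... values) {
        return Arrays.asList(values);
    }

    public static void main(String[] args) {
        List<Object> record = new ArrayList<>();
        record.add(1L);
        record.add("name1");

        // A List is not an Object[], so Java wraps it: the whole list becomes ONE field.
        System.out.println(create(record).size());           // prints 1

        // An Object[] is passed through as the varargs array: one field per element.
        System.out.println(create(record.toArray()).size()); // prints 2
    }
}
```

So when the number of columns is not known in advance, collecting the values into a List and calling toArray() before passing them to the factory should give one field per value.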
0

// Create a list of DTOs

List<MyDTO> dtoList = Arrays.asList(.....);

// Create a Dataset of DTOs

Dataset<MyDTO> dtoSet = sparkSession.createDataset(dtoList,
                Encoders.bean(MyDTO.class));

// If you need a Dataset of Row

Dataset<Row> rowSet = dtoSet.select("col1", "col2", "col3");
Sanjay Singh
-1

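For a list of simple, single-column values, you can take the schema from an Encoder:

// currentTime is assumed to be a long holding epoch milliseconds, defined elsewhere
List<Row> rows = ImmutableList.of(RowFactory.create(new Timestamp(currentTime)));
Dataset<Row> input = sparkSession.createDataFrame(rows, Encoders.TIMESTAMP().schema());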
Alex Stanovsky