
I have a Dataset<Row> like the one below:

+-----+--------------+
|val  |  history     |
+-----+--------------+
|500  |[a=456, a=500]|
|800  |[a=456, a=500]|
|784  |[a=456, a=500]|
+-----+--------------+

Here val is a String and history is an array of strings. I'm trying to append the content of the val column to the history column (as a new "c=<val>" entry), so that my dataset looks like this:

+-----+---------------------+
|val  |  history            |
+-----+---------------------+
|500  |[a=456, b=500, c=500]|
|800  |[a=456, b=500, c=800]|
|784  |[a=456, b=500, c=784]|
+-----+---------------------+

A similar question is discussed at https://stackoverflow.com/a/49685271/2316771, but I don't know Scala and couldn't write an equivalent Java solution.

Please help me achieve this in Java.

Prasad Khode
DxG

2 Answers


In Spark 2.4 (not earlier), you can use the concat function to concatenate two arrays. In your case, you could do something like:

    import static org.apache.spark.sql.functions.*;
    df.withColumn("val2", concat(lit("c="), col("val")))
      .select(col("val"), concat(col("history"), array(col("val2"))).as("history"));

NB: the first concat concatenates strings; the second concatenates arrays. array(col("val2")) creates a one-element array so that it can be concatenated with history.
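To make the two concat steps concrete, here is a plain-Java sketch (no Spark dependency) that mimics what the expression computes for a single row. ConcatSketch and its methods are hypothetical helpers written for illustration only; they are not part of Spark, and the sketch ignores Spark's null handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java model (not Spark API) of the per-row result of
//   concat(col("history"), array(concat(lit("c="), col("val"))))
class ConcatSketch {

    // Like concat(lit("c="), col("val")): string concatenation.
    static String prefixed(String val) {
        return "c=" + val;
    }

    // Like array(col("val2")): wrap a single value in a one-element array.
    static List<String> singleton(String value) {
        return Collections.singletonList(value);
    }

    // Like concat(col("history"), array(...)): builds a new list holding
    // the elements of left followed by those of right; inputs are unchanged.
    static List<String> concatArrays(List<String> left, List<String> right) {
        List<String> result = new ArrayList<>(left);
        result.addAll(right);
        return result;
    }

    // Both steps combined for one row.
    static List<String> appendVal(List<String> history, String val) {
        return concatArrays(history, singleton(prefixed(val)));
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList("a=456", "a=500");
        System.out.println(appendVal(history, "500")); // [a=456, a=500, c=500]
        System.out.println(appendVal(history, "800")); // [a=456, a=500, c=800]
    }
}
```

Because the array concatenation returns a new array rather than modifying history in place, the expression can be used directly inside select or withColumn.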

Oli

I coded a solution, but I'm not sure whether it can be optimized further:

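    import java.util.*;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.*;
    import org.apache.spark.sql.catalyst.encoders.RowEncoder; // Spark 2.x encoder API

    Dataset<Row> result = dataset.map((MapFunction<Row, Row>) row -> {
        // copy history into a mutable java.util.List and append "c=<val>"
        List<String> list = new ArrayList<>(row.<String>getList(row.fieldIndex("history")));
        list.add("c=" + row.getAs("val"));
        return RowFactory.create(row.getAs("val"), list.toArray(new String[0]));
    }, RowEncoder.apply(dataset.schema())); // output schema is the same as the input's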
DxG