The linked question, "add header and column to dataframe spark", has a solution for adding headers to a DataFrame in Scala. I want to add headers to a Dataset in Java.

I am reading a file that does not contain headers:

Dataset<Row> ds= spark.read().format("csv").option("header", "false").load(filepath);

and ds.show() prints this:

+----------+----------------+----------+----+----+---------+----+
|       _c0|             _c1|       _c2| _c3| _c4|      _c5| _c6|
+----------+----------------+----------+----+----+---------+----+
|04/13/2019|             US1|04/13/2019|null|null|      abc|null|
|04/13/2019|             US1|04/13/2019|null|null|    qwert|null|
|04/13/2019|             US1|04/13/2019|null|null|     xyzz|null|
+----------+----------------+----------+----+----+---------+----+

The desired output is the same data with my own headers:

+----------+----------------+----------+----+----+---------+----+
| orderDate|          symbol|  sellDate| prc|  id|  product| cod|
+----------+----------------+----------+----+----+---------+----+
|04/13/2019|             US1|04/13/2019|null|null|      abc|null|
|04/13/2019|             US1|04/13/2019|null|null|    qwert|null|
|04/13/2019|             US1|04/13/2019|null|null|     xyzz|null|
+----------+----------------+----------+----+----+---------+----+

Can anyone please help in this regard?

user0204

1 Answer

I have found the answer to my question.

toDF() can be used to add headers. The number of names passed must match the number of columns in the Dataset:

Dataset<Row> ds = spark.read().format("csv").option("header", "false").load(filepath).toDF("orderDate", "symbol", "sellDate", "prc", "id", "product", "cod");

It can also be used to rename headers that already exist in the file, like this:

Dataset<Row> ds = spark.read().format("csv").option("header", "true").load(filepath).toDF("orderDate", "symbol", "sellDate", "prc", "id", "product", "cod");
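
As an alternative to renaming after the load, the column names can be supplied up front as a schema. The sketch below is not part of the original answer: it only builds a DDL schema string from the header names in the question, and the class name HeaderSchema and the method toDdl are hypothetical helpers. It assumes every column should be read as a string, which is what the CSV reader does by default when no schema is inferred.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark: turns header names into a DDL schema string.
class HeaderSchema {
    // Column names taken from the desired output in the question.
    static final String[] HEADERS = {
        "orderDate", "symbol", "sellDate", "prc", "id", "product", "cod"
    };

    // Builds a string such as "orderDate STRING, symbol STRING, ...".
    // Assumption: all columns are treated as STRING.
    static String toDdl(String[] names) {
        StringJoiner ddl = new StringJoiner(", ");
        for (String name : names) {
            ddl.add(name + " STRING");
        }
        return ddl.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDdl(HEADERS));
    }
}
```

Assuming Spark 2.3 or later, where DataFrameReader.schema accepts a DDL string, the result could be passed as spark.read().format("csv").option("header", "false").schema(HeaderSchema.toDdl(HeaderSchema.HEADERS)).load(filepath). toDF remains the shorter option when only the names need to change.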
user0204