I'm trying to save a DataFrame to CSV, partitioned by id, using Spark 1.6 and Scala. The partitionBy("id") call doesn't give me the expected result.
Here is my code:
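// validDf is the DataFrame shown below; "path_hdfs_csv" is the target output directory on HDFS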
validDf.write
.partitionBy("id")
.format("com.databricks.spark.csv")
.option("header", "true")
.option("delimiter", ";")
.mode("overwrite")
.save("path_hdfs_csv")
My DataFrame looks like this:
----------------------
| ID | NAME | STATUS |
----------------------
|  1 | N1   | S1     |
|  2 | N2   | S2     |
|  3 | N3   | S1     |
|  4 | N4   | S3     |
|  5 | N5   | S2     |
----------------------
This code creates 3 default CSV part files (part_0, part_1, part_2) that are not based on the ID column.
What I expect is a subdirectory (partition) for each id, as sketched below. Any help?
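For reference, if partitionBy worked the way I expect (following Spark's usual column=value partition layout), the output would look roughly like this (file names are only an illustration):

path_hdfs_csv/id=1/part-00000
path_hdfs_csv/id=2/part-00000
path_hdfs_csv/id=3/part-00000
path_hdfs_csv/id=4/part-00000
path_hdfs_csv/id=5/part-00000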