I know I can use `UUID.randomUUID.toString` to attach an id to each row in my Dataset, but I need this id to be a `Long` since I want to use GraphX. How do I do that in Spark? I know Spark has `monotonically_increasing_id()`, but that is only for the DataFrame API. What about Datasets?

pathikrit
You should still be able to use `monotonically_increasing_id()`. Sure, you will get a DataFrame back, but does that matter? DataFrames and Datasets can usually be used interchangeably (in Spark 2.x a `DataFrame` is just `Dataset[Row]`). If it does matter, can you give some more information about this specific case? – Shaido Oct 19 '17 at 02:14
1 Answer
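We can do this by dropping into the DataFrame API and converting back to a typed Dataset with `.as[Row]`:
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.monotonically_increasing_id
import spark.implicits._ // `spark` is your SparkSession; provides the Encoder used by .as[Row]
case class Row(id: Long, name: String /* ..... */) // define at top level so Spark can derive an Encoder
val ds: Dataset[Row] = ??? // your existing Dataset
val ds2 = ds.withColumn("id", monotonically_increasing_id()).as[Row] // withColumn returns a DataFrame; .as[Row] gives back a Dataset[Row]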
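Keep in mind that `monotonically_increasing_id()` guarantees ids that are unique and increasing, not consecutive. The Spark API docs describe the current implementation as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The plain-Scala sketch below only mimics that documented layout to show what the ids look like; it is not Spark's code, and the names `MonotonicIdSketch`, `sketchId` and `assignIds` are hypothetical, as is the assumption that rows arrive already split into partitions.

```scala
object MonotonicIdSketch {
  // Mimics the documented layout of Spark's monotonically_increasing_id():
  // partition index in the upper 31 bits, per-partition record number in the lower 33 bits.
  // Illustration only, not Spark's implementation.
  def sketchId(partitionIndex: Int, recordNumber: Long): Long =
    (partitionIndex.toLong << 33) + recordNumber

  // Assigns an id to every row, given rows already grouped by partition (hypothetical input shape).
  def assignIds[A](partitions: Seq[Seq[A]]): Seq[(Long, A)] =
    partitions.zipWithIndex.flatMap { case (rows, p) =>
      rows.zipWithIndex.map { case (row, i) => (sketchId(p, i.toLong), row) }
    }

  def main(args: Array[String]): Unit = {
    // Two partitions with three rows each
    val ids = assignIds(Seq(Seq("a", "b", "c"), Seq("d", "e", "f"))).map(_._1)
    println(ids.mkString(", ")) // 0, 1, 2, 8589934592, 8589934593, 8589934594
  }
}
```

The jump from 2 to 8589934592 (`1L << 33`) is the second partition starting its own range. That is fine for GraphX, whose `VertexId` is just a `Long` and only needs to be unique, but these ids should not be read as row positions.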

pathikrit