I know I can use `UUID.randomUUID.toString` to attach an ID to each row in my Dataset, but I need the ID to be a `Long` because I want to use GraphX. How do I do that in Spark? I know Spark has `monotonically_increasing_id()`, but that is only available in the DataFrame API. What about Datasets?

pathikrit
You should still be able to use `monotonically_increasing_id()`. Sure, you will get a DataFrame back, but does that matter? DataFrames and Datasets can usually be used interchangeably. If it does matter, can you give some more information about this specific case? – Shaido Oct 19 '17 at 02:14

1 Answer

We can do this by dropping into the DataFrame API and converting the result back to a typed Dataset with `as`:

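import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.monotonically_increasing_id
import spark.implicits._ // encoder for .as[Row]; assumes a SparkSession named spark

case class Row(id: Long, name: String .....)

val ds: Dataset[Row] = ....

// withColumn returns a DataFrame; .as[Row] restores the type. Ids are unique and increasing, not consecutive.
val ds2 = ds.withColumn("id", monotonically_increasing_id()).as[Row]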
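If consecutive ids are preferred, or you would rather not go through a DataFrame at all, one alternative (not part of the approach above) is to go through the underlying RDD and use `zipWithIndex`. The following is only a minimal sketch: the `Person` and `PersonWithId` case classes and the `SequentialIds` object are hypothetical names made up for illustration, and the local `SparkSession` settings are assumptions.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types for illustration; not from the original post.
case class Person(name: String)
case class PersonWithId(id: Long, name: String)

object SequentialIds {
  // Attaches a Long id to every element via RDD.zipWithIndex.
  // Indices run from 0 to n-1, ordered by partition and then by position
  // within each partition. zipWithIndex triggers a Spark job when the RDD
  // has more than one partition.
  def addIds(spark: SparkSession, ds: Dataset[Person]): Dataset[PersonWithId] = {
    import spark.implicits._
    ds.rdd
      .zipWithIndex()
      .map { case (p, idx) => PersonWithId(idx, p.name) }
      .toDS()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only.
    val spark = SparkSession.builder().master("local[*]").appName("sequential-ids").getOrCreate()
    import spark.implicits._

    val people = Seq(Person("a"), Person("b"), Person("c")).toDS()
    addIds(spark, people).show()

    spark.stop()
  }
}
```

`RDD.zipWithUniqueId` is a related option that avoids the extra job but, like `monotonically_increasing_id()`, produces unique ids that are not consecutive. Either way the ids are `Long`, which is what GraphX's `VertexId` expects.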
pathikrit