How to generate a GUID id column in Spark that is of integer type

Question

I know I can use `UUID.randomUUID.toString` to attach an id to each row in my Dataset, but I need this id to be a Long because I want to use GraphX. How do I do that in Spark? I know Spark has `monotonically_increasing_id()`, but that is only for the DataFrame API. What about Datasets?

You should still be able to use `monotonically_increasing_id()`. Sure, you will get a DataFrame back, but does that matter? DataFrames and Datasets can usually be used interchangeably. If it does matter, can you give some more information about this specific case? - Shaido, Oct 19 '17 at 02:14

score 0 · Answer 1 · answered Oct 19 '17 at 13:57

We can do this by dropping into the DataFrame API and converting back to a Dataset:

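import org.apache.spark.sql.functions.monotonically_increasing_id
import spark.implicits._ // assumes a SparkSession named `spark`; supplies the encoder for .as[Row]

case class Row(id: Long, name: String .....)

val ds: Dataset[Row] = ....

// withColumn returns a DataFrame; .as[Row] converts it back to a Dataset[Row]
val ds2 = ds.withColumn("id", monotonically_increasing_id()).as[Row]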
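
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The `Person` case class, the `LongIdExample` object, and the local `SparkSession` settings are hypothetical names chosen for illustration, and the sketch assumes Spark 2.x or later is on the classpath. The generated ids are unique Longs, but they are not consecutive, because Spark encodes the partition ID into the upper bits of each value.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id

// Hypothetical record type for illustration; defined at top level so Spark can derive an encoder.
case class Person(id: Long, name: String)

object LongIdExample {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only (assumption, not part of the original answer).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("long-id-example")
      .getOrCreate()
    import spark.implicits._

    // Placeholder ids of 0; they are overwritten by withColumn below.
    val people: Dataset[Person] =
      Seq(Person(0L, "a"), Person(0L, "b"), Person(0L, "c")).toDS()

    // withColumn yields a DataFrame; .as[Person] restores the typed Dataset.
    val withIds: Dataset[Person] =
      people.withColumn("id", monotonically_increasing_id()).as[Person]

    // The ids are unique Longs, though not consecutive across partitions.
    val ids = withIds.map(_.id).collect()
    assert(ids.distinct.length == ids.length)

    spark.stop()
  }
}
```

Since the ids depend on how the data is partitioned when the column is evaluated, it is probably worth caching or writing out the result before using the ids as GraphX vertex ids in more than one place.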

pathikrit
