You could use array_repeat with explode (Spark 2.4+).

To duplicate each row:

from pyspark.sql import functions as F
df.withColumn("Name", F.explode(F.array_repeat("Name", 2)))
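With the sample df shown under UPDATE below, that gives every row twice:

#+---+-------+-------+
#| ID|   Name|Support|
#+---+-------+-------+
#|  1|   John|      2|
#|  1|   John|      2|
#|  2|  Maria|      4|
#|  2|  Maria|      4|
#|  3|Charles|      6|
#|  3|Charles|      6|
#+---+-------+-------+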
To triplicate:

df.withColumn("Name", F.explode(F.array_repeat("Name", 3)))
For Spark < 2.4:

# duplicate
df.withColumn("Name", F.explode(F.array(*['Name'] * 2)))
# triplicate
df.withColumn("Name", F.explode(F.array(*['Name'] * 3)))
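The list-unpacking there just builds an array holding the same column multiple times; written out explicitly, the duplicate case is:

df.withColumn("Name", F.explode(F.array("Name", "Name")))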
UPDATE:

To replicate each row a variable number of times, driven by another column (Support here), you could use this (Spark 2.4+):
df.show()
#+---+-------+-------+
#| ID|   Name|Support|
#+---+-------+-------+
#|  1|   John|      2|
#|  2|  Maria|      4|
#|  3|Charles|      6|
#+---+-------+-------+
from pyspark.sql import functions as F
df.withColumn("Name", F.explode(F.expr("""array_repeat(Name,int(Support))"""))).show()
#+---+-------+-------+
#| ID|   Name|Support|
#+---+-------+-------+
#|  1|   John|      2|
#|  1|   John|      2|
#|  2|  Maria|      4|
#|  2|  Maria|      4|
#|  2|  Maria|      4|
#|  2|  Maria|      4|
#|  3|Charles|      6|
#|  3|Charles|      6|
#|  3|Charles|      6|
#|  3|Charles|      6|
#|  3|Charles|      6|
#|  3|Charles|      6|
#+---+-------+-------+
Note the int(Support) cast above: array_repeat expects an integer count, so a Support column stored as bigint or string needs casting first.

For Spark 1.5+, use repeat, concat, substring, split & explode:
from pyspark.sql import functions as F
df.withColumn("Name", F.expr("""repeat(concat(Name,','),Support)"""))\
  .withColumn("Name", F.explode(F.expr("""split(substring(Name,1,length(Name)-1),',')"""))).show()
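If you want to see what each step produces, you can materialize the intermediates (a sketch; rep, trimmed and arr are just illustrative column names):

df.withColumn("rep", F.expr("repeat(concat(Name,','),Support)"))\
  .withColumn("trimmed", F.expr("substring(rep,1,length(rep)-1)"))\
  .withColumn("arr", F.expr("split(trimmed,',')"))\
  .show(truncate=False)
#For John (Support=2): rep = "John,John,", trimmed = "John,John", arr = ["John", "John"]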