I have two DataFrames. When the column CONCERN of Dataframe2 contains 'all', I want the new column "EFFECTIVITY" (in the same DataFrame) to hold a list of all the serial numbers from the column "SN" of Dataframe1.

Here df1 is Dataframe1 and df2 is Dataframe2.

all_data = df1.select(collect_list("SN")).show()

df = df.withColumn("EFFECTIVITY", F.when(df2.CONCERN.contains('ALL'), all_data).otherwise(''))

SISI

1 Answer

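Check the approach below; it may solve your problem. Note that .show() only prints the DataFrame and returns None, so all_data in the question never holds the list: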

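from pyspark.sql.functions import array, collect_list, lit, when

# Collect all the SN values from df1 into a Python list on the driver
all_data = df1.select(collect_list("SN")).first()[0]

# when()/otherwise() expect a literal or a Column, not a Python list,
# so build an array column from the collected values
all_sn = array(*[lit(sn) for sn in all_data])

df2 = df2.withColumn("EFFECTIVITY", when(df2.CONCERN.contains('ALL'), all_sn).otherwise(array()))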
Mikey