
I have the DataFrame below, and I am trying to create a new column by concatenating the columns named in a list.

df=

----------------------------------
| name| department|  state| id| hash
------+-----------+-------+---+----
James|  Sales1   |null   |101|4df2
Maria|  Finance  |       |102|5rfg
Jen  |           |NY2    |103|234

key_list=['name','state','id']

df.withColumn('prim_key', concat(*key_list))
df.show()

But the above returns the same result:

----------------------------------
| name| department|  state| id| hash
------+-----------+-------+---+----
James|  Sales1   |null   |101|4df2
Maria|  Finance  |       |102|5rfg
Jen  |           |NY2    |103|234


I suspected it might be due to spaces in the column names of the DataFrame, so I used trim to remove all spaces from the column names, but no luck. It returns the same result.

Any solution to this?

Adhi cloud
  • Could be a typo, but did you assign the result of `withColumn` back to `df` (`df = df.withColumn()`)? – Emma Jun 03 '22 at 19:21
  • Hi @Emma, yes, I assigned it as you mentioned. I am wondering what the reason could be. I checked locally with another example df without any spaces in the column names, and that works fine. But in this case, the dfs are derived from a database. – Adhi cloud Jun 03 '22 at 20:14
  • Did df.show() now display the expected result? – Emma Jun 03 '22 at 20:16
  • No, as I said, it is not creating the new column 'prim_key' with the concatenation of those columns. – Adhi cloud Jun 03 '22 at 20:17
  • Could you try `df = df.withColumn('prim_key', lit(1))` and `df.show()`? Do you see the `prim_key` column? Or do you see any errors? – Emma Jun 03 '22 at 20:22

1 Answer


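I found it. The issue was that the result of withColumn was not being assigned to a new or existing df. withColumn returns a new DataFrame and does not modify df in place, so the result has to be assigned:

df = df.withColumn('prim_key', concat(*key_list))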
Adhi cloud