
I have this PySpark DataFrame:

import pandas as pd

data = [['tom', 1], ['nick', 1], ['juli', 2]]

df = pd.DataFrame(data, columns=['Name', 'stat'])

df = spark.createDataFrame(df)

I need to apply this transformation: if `stat == 1`, then `Name = "toto"`.

That is, I want to end up with this DataFrame after the transformation:

import pandas as pd

data = [['toto', 1], ['toto', 1], ['juli', 2]]

df = pd.DataFrame(data, columns=['Name', 'stat'])

df = spark.createDataFrame(df)

Thanks in advance.

DONMEA
    Does this answer your question? [Apache spark dealing with case statements](https://stackoverflow.com/questions/39982135/apache-spark-dealing-with-case-statements) – samkart Sep 13 '22 at 05:29

1 Answer


You just need `withColumn` and the `when` function:

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [
        ('tom', '1'),
        ('nick', '1'),
        ('juli', '2'),
    ],
    ['Name', 'stat']
)

df.withColumn('Name', F.when(F.col('stat') == '1',
                             F.lit('toto')
                            ).otherwise(F.col('Name'))).show()

# +----+----+
# |Name|stat|
# +----+----+
# |toto|   1|
# |toto|   1|
# |juli|   2|
# +----+----+
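Since the question builds the DataFrame from pandas before calling `spark.createDataFrame`, the same conditional replacement can also be done on the pandas side with boolean indexing via `.loc` — a minimal sketch, assuming you're fine transforming before handing the data to Spark:

```python
import pandas as pd

data = [['tom', 1], ['nick', 1], ['juli', 2]]
df = pd.DataFrame(data, columns=['Name', 'stat'])

# Conditional assignment: rows where stat == 1 get Name = "toto"
df.loc[df['stat'] == 1, 'Name'] = 'toto'

print(df['Name'].tolist())  # ['toto', 'toto', 'juli']
```

Note that this compares `stat` as an integer (matching the question's data), whereas the Spark example above stores `stat` as a string and compares against `'1'` — pick the comparison that matches your column's actual type.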
Luiz Viola