I am new to PySpark DataFrames.

I have a PySpark DataFrame with a column whose values look like this:

Col1
a+
b+
a-
b-

I want to create another boolean column (Col2). Its value should be True if the value in Col1 contains a +, and False otherwise.

I tried the code below after some research on Google, but it gave an "unexpected EOF while parsing" error:

DF = DF.withColumn("col2", F.when(DF.filter(DF.col1.like('+')), True).otherwise(False)

I also tried the code below, but it fails with the error "Condition should be a column":

df = DF.withColumn("col2", F.when(DF.filter("col1 like '%-%'")=="-", True).otherwise(False))

Could you please help me with this?

Mayank

2 Answers

Since the + is always at the end in your sample, I assume that's a pattern we can rely on, so the easier (and probably faster) solution is to use endswith:

from pyspark.sql import functions as F

df.withColumn('col2', F.col('col1').endswith('+')).show()

# +----+-----+
# |col1| col2|
# +----+-----+
# |  a+| true|
# |  b+| true|
# |  a-|false|
# |  b-|false|
# +----+-----+
pltc

You don't need to use filter to scan each row of col1. You can use the column itself as the condition inside when and match it against the pattern %+, which matches any string that ends with a + character.

DF.withColumn("col2", F.when(F.col("col1").like("%+"), True).otherwise(False))

This will result in the following DataFrame:

+----+-----+
|col1| col2|
+----+-----+
|  a+| true|
|  b+| true|
|  a-|false|
|  b-|false|
+----+-----+
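
To see why the pattern matters, here is a rough pure-Python sketch of how a LIKE pattern is matched against the whole string. The helpers like_to_regex and sql_like are hypothetical and only mimic the % and _ wildcards (no escape-character handling); they are not part of PySpark.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regex (hypothetical helper, no escape support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any sequence of characters, including none
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal, including '+'
    return re.compile("".join(parts), re.DOTALL)

def sql_like(value: str, pattern: str) -> bool:
    # LIKE has to match the entire value, hence fullmatch rather than search.
    return like_to_regex(pattern).fullmatch(value) is not None

values = ["a+", "b+", "a-", "b-"]
ends_with_plus = [sql_like(v, "%+") for v in values]   # '+' at the very end
bare_plus = [sql_like(v, "+") for v in values]         # value must be exactly '+'
contains_plus = [sql_like(v, "%+%") for v in values]   # '+' anywhere
```

Under these assumptions, the bare '+' pattern from the question's first attempt matches none of the sample values, because without wildcards the whole value would have to be exactly +.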

You can read more about the when/otherwise functionality in the PySpark documentation.

Coursal
  • You're welcome. Please tick to accept the answer for other people to know that the issue is solved. – Coursal May 09 '21 at 11:36
  • Can you please tell me how to check whether a column name is present in the DataFrame? It is the same query, except that I need to check whether the column exists: if it does, that column should be added, otherwise null. Can you suggest a solution? – Only developer Mar 17 '22 at 17:07
  • Hi @Vikram, it seems like you want to apply the above functionality conditionally. The answer given here should give you what you need: https://stackoverflow.com/a/56291179/5644037 – Coursal Mar 17 '22 at 17:53
  • Hi @Coursal, the answer you suggested deals with column values. My requirement is to check whether the column name is present. For example, in the solution you shared, if the column 'value' is present in the DataFrame, it should be returned as 'value_desc'; if not, null should be returned. – Only developer Mar 18 '22 at 12:42