
I have the below code in Python (pandas), but I need to convert it to PySpark:

# c1 is True where id is a substring of question
qm1['c1'] = [x[0] in x[1] for x in zip(qm1['id'], qm1['question'])]
qm1['c1'] = qm1['c1'].astype(str)
# keep only the matching rows
qm1a = qm1[(qm1.c1 == 'True')]

The output of this Python code is:

question  key    id    c1
   Women    0  omen  True
 machine    0   mac  True

Could someone please help me out with this, as I am a beginner in Python?

user3318064
  • Not sure if this helps: https://stackoverflow.com/questions/48448473/pyspark-convert-a-standard-list-to-data-frame – Larry the Llama Nov 24 '21 at 06:46
  • I tried already but it did not work – user3318064 Nov 24 '21 at 07:03
  • @user3318064 That's not the proper question to ask. Please show your input, show expected output based on that, and we will be able to create a code to do that. Pandas and PySpark do not work the same, you cannot just convert the code from one to the other as-is. – Steven Nov 24 '21 at 08:48

1 Answer


Here is my test data (as your question does not contain any):

df.show()
+--------+---+----+
|question|key|  id|
+--------+---+----+
|   Women|  0|omen|
| machine|  2| mac|
|     foo|  1| bar|
+--------+---+----+
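
In case you need it, a minimal sketch of how this test DataFrame could be built (the SparkSession setup here is an assumption, not part of the original answer):

from pyspark.sql import SparkSession

# Assumed setup; reuse your own SparkSession if you already have one
spark = SparkSession.builder.getOrCreate()

# Test rows matching the schema shown above: (question, key, id)
df = spark.createDataFrame(
    [("Women", 0, "omen"), ("machine", 2, "mac"), ("foo", 1, "bar")],
    ["question", "key", "id"],
)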

And my code to create the expected output:

from pyspark.sql import functions as F

# c1 is True when id appears as a substring of question --
# the same check as `x[0] in x[1]` in the pandas version
df = df.withColumn("c1", F.col("question").contains(F.col("id")))
df.show()
+--------+---+----+-----+
|question|key|  id|   c1|
+--------+---+----+-----+
|   Women|  0|omen| true|
| machine|  2| mac| true|
|     foo|  1| bar|false|
+--------+---+----+-----+

Then you can simply filter on c1:

df.where("c1").show()
+--------+---+----+----+
|question|key|  id|  c1|
+--------+---+----+----+
|   Women|  0|omen|true|
| machine|  2| mac|true|
+--------+---+----+----+
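
If you do not actually need the c1 column afterwards, you can also filter in a single step. A minimal sketch, with the same contains logic inlined into the filter:

from pyspark.sql import functions as F

# Keep only the rows where id is a substring of question,
# without materializing an intermediate c1 column
df.where(F.col("question").contains(F.col("id"))).show()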
Steven