
I am trying to convert the Python code below to PySpark. Please let me know what's wrong in the PySpark version of the code:

Original Python version:

for i in range(0, km_data.count()):
    if i == 0:
        km_data['risk'].iloc[i] = not_lapsed + lapsed
    else:
        km_data['risk'].iloc[i] = km_data['risk'].iloc[i-1] - (km_data['lapsed'].iloc[i-1]) - (km_data['censored'].iloc[i])

PySpark version used:

for i in range(0, km_data.count()):
    if i == 0:
        km_data.collect()[i]['risk'] = not_lapsed + lapsed
    else:
        km_data.collect()[i]['risk'] = km_data.collect()[i-1]['risk'] - (km_data.collect()[i-1]['lapsed']) - (km_data.collect()[i-1]['censored'])

Basically, I am looking for an equivalent of iloc in PySpark that can help me get the results. Please ignore indentation issues, as I typed this code on a mobile device.

  • Do you want to share the sample input and output in table format for better visibility? – dsk Jul 08 '20 at 05:39
  • @dsk, actually, using collect() I am getting an error that the Row object does not support assignment. – Shashank Paliwal Jul 08 '20 at 05:47
  • That is because you are assigning a value here: km_data.collect()[i]['risk'] = ... Try using a when() and otherwise() combination, which is nothing but if/else in Python (see the sketch after these comments). – dsk Jul 08 '20 at 05:54
  • @dsk could you please show me an example of using when and otherwise with a loop? – Shashank Paliwal Jul 08 '20 at 05:58
  • Please follow this - https://stackoverflow.com/questions/39982135/apache-spark-dealing-with-case-statements – dsk Jul 08 '20 at 06:00
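
For reference, a minimal generic sketch of the when()/otherwise() pattern suggested in the comments (df, age, and flag are illustrative names only, not columns from the question):

from pyspark.sql import functions as F

# if/else on a column: when() supplies the conditional branch, otherwise() the fallback
df = df.withColumn("flag", F.when(F.col("age") < 18, "minor").otherwise("adult"))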

1 Answer


collect() can be used here, and I can see you did the same:

X = df.collect()[0]['age']
# or by position:
X = df.collect()[0][1]  # row 0, col 1

Is there anything else you are looking for?
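
Note, however, that the Row objects returned by collect() are immutable, which is why the assignment in the question's loop fails. One way to get the same result without assigning to rows is to unroll the recurrence into cumulative sums over a window. This is only a minimal sketch: it assumes km_data has a column that defines the row order iloc relied on (called t here, a hypothetical name) and that not_lapsed and lapsed are scalar totals computed beforehand, as in the original pandas code.

from pyspark.sql import functions as F, Window

# Hypothetical ordering column "t"; replace with whatever defines the row order.
w_prev = Window.orderBy("t").rowsBetween(Window.unboundedPreceding, -1)
w_cur = Window.orderBy("t").rowsBetween(Window.unboundedPreceding, Window.currentRow)

km_data = (
    km_data
    # lapsed summed over all strictly previous rows (null for the first row, hence coalesce)
    .withColumn("cum_lapsed_prev", F.coalesce(F.sum("lapsed").over(w_prev), F.lit(0)))
    # censored summed up to the current row, minus the first row's value,
    # so the first row contributes nothing and risk[0] = not_lapsed + lapsed
    .withColumn("cum_censored", F.sum("censored").over(w_cur) - F.first("censored").over(w_cur))
    # unrolled recurrence: risk[i] = risk[0] - sum(lapsed[0..i-1]) - sum(censored[1..i])
    .withColumn("risk", F.lit(not_lapsed + lapsed) - F.col("cum_lapsed_prev") - F.col("cum_censored"))
    .drop("cum_lapsed_prev", "cum_censored")
)

Since the window has no partitionBy, Spark will pull all rows into a single partition; for a small lifetable-style DataFrame like this that is usually acceptable.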

dsk