I am trying to convert the Python code below to PySpark. Please let me know what is wrong in my PySpark version.
Original Python version:
# len(df), not df.count(): pandas count() returns per-column counts, not an int.
# .loc writes in place; chained assignment like km_data['risk'].iloc[i] = ... may not stick.
for i in range(len(km_data)):
    if i == 0:
        km_data.loc[km_data.index[i], 'risk'] = not_lapsed + lapsed
    else:
        km_data.loc[km_data.index[i], 'risk'] = (km_data['risk'].iloc[i - 1]
                                                 - km_data['lapsed'].iloc[i - 1]
                                                 - km_data['censored'].iloc[i])
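For reference, the loop only builds running totals. Unrolling the recurrence gives a vectorized pandas form (a sketch of the same logic, assuming not_lapsed and lapsed are scalar totals computed earlier), which is the calculation I want to reproduce in PySpark:

# risk[i] = (not_lapsed + lapsed) - sum(lapsed[0..i-1]) - sum(censored[1..i])
km_data['risk'] = (
    (not_lapsed + lapsed)                              # risk[0]
    - km_data['lapsed'].cumsum().shift(fill_value=0)   # lapsed up to row i-1
    - (km_data['censored'].cumsum()
       - km_data['censored'].iloc[0])                  # censored over rows 1..i
)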
PySpark version used:
for i in range(0, km_data.count()):
    if i == 0:
        km_data.collect()[i]['risk'] = not_lapsed + lapsed
    else:
        km_data.collect()[i]['risk'] = (km_data.collect()[i - 1]['risk']
                                        - km_data.collect()[i - 1]['lapsed']
                                        - km_data.collect()[i - 1]['censored'])
Basically, I am looking for an equivalent of iloc in PySpark that will give me the same results.
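From what I have read, a Spark DataFrame cannot be updated row by row: the Row objects returned by collect() are immutable tuples, and each collect() call just copies the data to the driver, so nothing is ever written back to the DataFrame. I suspect the idiomatic replacement is a window-based cumulative sum. Below is a minimal sketch of what I think that might look like; note my assumptions: km_data has a column duration that defines the row order (not shown in my code above), and not_lapsed and lapsed are scalar totals computed earlier.

from pyspark.sql import functions as F
from pyspark.sql import Window

# No partitionBy: the whole table is one ordered sequence, so Spark will warn
# about moving all rows to a single partition (acceptable for a small KM table).
w_all = Window.orderBy("duration").rowsBetween(Window.unboundedPreceding, Window.currentRow)
w_prev = Window.orderBy("duration").rowsBetween(Window.unboundedPreceding, -1)

# Unrolling risk[i] = risk[i-1] - lapsed[i-1] - censored[i] with
# risk[0] = not_lapsed + lapsed gives:
#   risk[i] = (not_lapsed + lapsed) - sum(lapsed[0..i-1]) - sum(censored[1..i])
km_data = (
    km_data
    .withColumn("lapsed_cum_prev",
                F.coalesce(F.sum("lapsed").over(w_prev), F.lit(0)))
    .withColumn("censored_cum",
                F.sum("censored").over(w_all) - F.first("censored").over(w_all))
    .withColumn("risk",
                F.lit(not_lapsed + lapsed)
                - F.col("lapsed_cum_prev")
                - F.col("censored_cum"))
    .drop("lapsed_cum_prev", "censored_cum")
)

Is something along these lines the right replacement for iloc-style indexing, or is there a better way?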