
In our data science project we are working with pandas DataFrames, NumPy, and SciPy, and we want to port the code to PySpark. We are running into issues with code like the following:

wst = cur_buck[:, [0]]                     # first column, kept 2-D so it broadcasts
cur_buck[:, :-1] = cur_buck[:, 1:] - wst   # shift columns left and subtract the first column
cur_buck[:, -1] = cur_buck[:, -2]          # duplicate the new second-to-last column into the last
number_of_particles = 100
particles_matrix[10] = cur_buck[:, -1].reshape(number_of_particles).copy()  # store the last column as row 10
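
For context, here is a minimal setup on which these lines run (the shapes and values are just placeholders, not our real data):

import numpy as np

number_of_particles = 100
cur_buck = np.random.rand(number_of_particles, 5)       # hypothetical: one row per particle, one column per bucket
particles_matrix = np.zeros((20, number_of_particles))  # hypothetical store: one snapshot row per step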

How can we achieve the same thing in PySpark?

Any leads would be highly appreciated; these are the four lines I want to convert to PySpark.

Thanks in advance

  • PySpark is a distributed processing framework, and distributed processing can't be index-based, since multiple workers each hold a piece of the data and run the process on their own pieces simultaneously. – samkart Jul 06 '22 at 17:59
  • That's correct, but is there any way to achieve the same thing in PySpark? Not exactly index-based, but something similar to this (see the sketch after these comments). – Shubham Jul 06 '22 at 18:11
  • The first line doesn't work on my sample data and throws an error. Could you please check the code and share a sample of the input data if the code is okay? – samkart Jul 07 '22 at 05:57
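
One index-free way to express this in PySpark, assuming each particle's buckets can be kept as an array column (the names particle_id and buckets below are hypothetical), is Spark SQL's higher-order transform function. Because transform exposes a 0-based position i, the three array assignments collapse into new[i] = buckets[min(i+1, n-1)] - buckets[0] per row. A minimal sketch:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical layout: one row per particle, buckets held in an array column.
# The names particle_id/buckets and the sample values are made up for this sketch.
df = spark.createDataFrame(
    [(0, [1.0, 3.0, 6.0, 10.0]),
     (1, [2.0, 5.0, 9.0, 14.0])],
    ["particle_id", "buckets"],
)

# Per row: new[i] = buckets[min(i+1, n-1)] - buckets[0], which reproduces
#   cur_buck[:, :-1] = cur_buck[:, 1:] - wst
#   cur_buck[:, -1]  = cur_buck[:, -2]
# transform's index i is 0-based, element_at is 1-based, buckets[0] is 0-based.
shifted = df.withColumn(
    "buckets",
    F.expr(
        "transform(buckets, (x, i) -> "
        "element_at(buckets, least(i + 2, size(buckets))) - buckets[0])"
    ),
)

# Counterpart of particles_matrix[10] = cur_buck[:, -1]: the last bucket of every
# particle, gathered into one row. DataFrame rows carry no intrinsic order, so
# sort by particle_id before collecting.
rows = (shifted
        .select("particle_id", F.element_at("buckets", -1).alias("last_bucket"))
        .orderBy("particle_id")
        .collect())
matrix_row = [r["last_bucket"] for r in rows]

The per-row transform stays fully distributed; only the final step, which gathers one value per particle into a single matrix row, pulls data to the driver.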
