
In our data science project we are working with pandas DataFrames, NumPy, and SciPy, and we want to port the code to PySpark. We are running into issues with code like the following:

wst = cur_buck[:, [0]]                     # first column, kept 2-D so it broadcasts
cur_buck[:, :-1] = cur_buck[:, 1:] - wst   # shift columns left and subtract the first column
cur_buck[:, -1] = cur_buck[:, -2]          # duplicate the new second-to-last column into the last
number_of_particles = 100
particles_matrix[10] = cur_buck[:, -1].reshape(number_of_particles).copy()  # store the last column as row 10
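
For context, here is a minimal setup on which these lines run (the shapes and values are just placeholders, not our real data):

import numpy as np

number_of_particles = 100
cur_buck = np.random.rand(number_of_particles, 5)       # hypothetical: one row per particle, one column per bucket
particles_matrix = np.zeros((20, number_of_particles))  # hypothetical store: one snapshot row per step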

How can we achieve the same thing in PySpark?

Any leads would be highly appreciated; these are the four lines I want to convert to PySpark.

Thanks in advance

  • PySpark is a distributed processing framework, and distributed processing can't be index-based, since multiple workers each hold a piece of the data and run the process on their own pieces simultaneously. – samkart Jul 06 '22 at 17:59
  • That's correct, but is there any way to achieve the same thing in PySpark? Not exactly index-based, but something similar to this (see the sketch after these comments). – Shubham Jul 06 '22 at 18:11
  • The first line doesn't work on my sample data and throws an error. Could you please check the code and share a sample of the input data if the code is okay? – samkart Jul 07 '22 at 05:57
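
One index-free way to express this in PySpark, assuming each particle's buckets can be kept as an array column (the names particle_id and buckets below are hypothetical), is Spark SQL's higher-order transform function. Because transform exposes a 0-based position i, the three array assignments collapse into new[i] = buckets[min(i+1, n-1)] - buckets[0] per row. A minimal sketch:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical layout: one row per particle, buckets held in an array column.
# The names particle_id/buckets and the sample values are made up for this sketch.
df = spark.createDataFrame(
    [(0, [1.0, 3.0, 6.0, 10.0]),
     (1, [2.0, 5.0, 9.0, 14.0])],
    ["particle_id", "buckets"],
)

# Per row: new[i] = buckets[min(i+1, n-1)] - buckets[0], which reproduces
#   cur_buck[:, :-1] = cur_buck[:, 1:] - wst
#   cur_buck[:, -1]  = cur_buck[:, -2]
# transform's index i is 0-based, element_at is 1-based, buckets[0] is 0-based.
shifted = df.withColumn(
    "buckets",
    F.expr(
        "transform(buckets, (x, i) -> "
        "element_at(buckets, least(i + 2, size(buckets))) - buckets[0])"
    ),
)

# Counterpart of particles_matrix[10] = cur_buck[:, -1]: the last bucket of every
# particle, gathered into one row. DataFrame rows carry no intrinsic order, so
# sort by particle_id before collecting.
rows = (shifted
        .select("particle_id", F.element_at("buckets", -1).alias("last_bucket"))
        .orderBy("particle_id")
        .collect())
matrix_row = [r["last_bucket"] for r in rows]

The per-row transform stays fully distributed; only the final step, which gathers one value per particle into a single matrix row, pulls data to the driver.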
