We have a requirement to build an n*n matrix in PySpark for a similarity calculation. To see whether this is possible in PySpark, we tried the following:
import numpy as np

# Allocate a dense n*n array of zeros on the driver,
# then try to convert it into a Spark DataFrame
similarity_matrix = np.zeros(shape=(data1.count(), data1.count()))
similarity_matrix = spark.createDataFrame(similarity_matrix)
Here data1 is our DataFrame with about 80K rows, so the dense array alone would be roughly 80,000 * 80,000 * 8 bytes ≈ 51 GB on the driver. Is there any way to do this in PySpark? We are getting a memory error with the approach above.
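Would a distributed cross join be the right direction instead of a driver-side array? A minimal sketch of what we think the pairwise structure would look like, assuming placeholder column names id and features and some per-pair similarity function:

from pyspark.sql import functions as F

# Rename columns so the self-join is unambiguous
left = data1.select(F.col("id").alias("id_a"), F.col("features").alias("features_a"))
right = data1.select(F.col("id").alias("id_b"), F.col("features").alias("features_b"))

# One row per (i, j) pair, kept distributed across executors
# instead of materializing a dense 80K x 80K array on the driver
pairs = left.crossJoin(right)

We realize this still produces 80,000 * 80,000 = 6.4 billion pair rows, just distributed rather than in driver memory, so we are not sure whether this is the recommended approach either.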