I have a big PySpark DataFrame where the columns are products and the rows are their prices over time. I need to calculate the covariance matrix of all the products, but the data is too big to convert to a pandas DataFrame, so I need to do it in PySpark. I've searched everywhere but couldn't find a solution to this problem. Does anyone have an idea of how it could be done?
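To make the question concrete, the sketch below is the kind of thing I've been looking at, using `RowMatrix.computeCovariance()` from `pyspark.mllib`, but I don't know whether this is the right approach or whether it will scale to my data (here `df` stands for a DataFrame containing only the numeric product columns, with no nulls):

```python
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

# df: DataFrame with only the numeric product columns (assumed, no nulls)
rows = df.rdd.map(lambda r: Vectors.dense([float(x) for x in r]))

mat = RowMatrix(rows)
cov = mat.computeCovariance()   # p x p local matrix collected on the driver
cov_arr = cov.toArray()         # NumPy array, ordered like df.columns
```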
I already have the correlation matrix, so any method using the diagonal matrix of standard deviations is also very welcome.
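In case it helps, this is roughly what I mean by the standard-deviation approach, i.e. Cov = D * Corr * D with D = diag(sd). Here `corr` is a placeholder for the correlation matrix I already have (as a NumPy array) and `product_cols` for my column names; I'm just not sure this is the right way to get the standard deviations at scale:

```python
import numpy as np
from pyspark.sql import functions as F

# product_cols: list of product column names (placeholder)
# corr: p x p correlation matrix I already have, as a NumPy array (placeholder)

# per-column sample standard deviations, computed in Spark
sd_row = df.agg(*[F.stddev_samp(c).alias(c) for c in product_cols]).first()
sd = np.array([sd_row[c] for c in product_cols])

# cov_ij = corr_ij * sd_i * sd_j, equivalent to D * Corr * D
cov = corr * np.outer(sd, sd)
```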
Here is an example of two columns of my DataFrame: