
I would like to build a similarity matrix for a large dataset to use in spectral clustering. To do so, I am going to use ml_corr in sparklyr.

The problem is that ml_corr computes pairwise correlations between columns, whereas I want correlations between rows. My only option seems to be to transpose my sparklyr table, but I couldn't find any function to do that. I would appreciate any help if you know how to do it.
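To make the goal concrete, here is a minimal local sketch in base R (not sparklyr) of the relationship between column-wise and row-wise correlation. The matrix `m` is hypothetical toy data standing in for the Spark table; the sketch only illustrates what "transpose, then correlate" means and is not a distributed solution.

```r
# Toy local matrix standing in for the Spark table (hypothetical data).
set.seed(1)
m <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4,
            dimnames = list(paste0("row", 1:5), paste0("col", 1:4)))

# cor() correlates columns, the same orientation described for ml_corr above.
col_sim <- cor(m)      # 4 x 4, one entry per pair of columns

# Transposing first turns rows into columns, so the result is row-by-row.
row_sim <- cor(t(m))   # 5 x 5, one entry per pair of rows

dim(col_sim)
dim(row_sim)
```

Note that the row-wise result grows with the square of the row count: for roughly 300k rows it would have about 9e10 entries, so this local form is only useful on a sample.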

zero323
Naghmeh
  • What are the dimensions? Keep in mind that wide data just won't fly on Spark, not to mention that the result is just a plain local object. So if your data is a good fit for Spark (long and narrow), its transpose no longer is, and the result is unlikely to fit in memory anyway. – zero323 Sep 14 '18 at 10:18
  • Yes, exactly. I can pass it as long and narrow (the size is around 300k × 50), but the problem is that I cannot build the transposed matrix in sparklyr. – Naghmeh Sep 14 '18 at 12:33
  • If you were using R, 'combn' applied to a proper row identifier should give you the set of all two-way groupings of rows. Tested solutions require a sample dataset. – IRTFM Sep 25 '18 at 09:00
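The 'combn' suggestion in the last comment can be sketched locally in base R as follows. The matrix `m` and its row identifiers `id1`..`id4` are hypothetical, and this is only an illustration of enumerating row pairs, not something tested against sparklyr or data of the size discussed.

```r
# Hypothetical toy data with explicit row identifiers.
set.seed(1)
m <- matrix(rnorm(4 * 6), nrow = 4,
            dimnames = list(paste0("id", 1:4), NULL))

# All unordered pairs of row identifiers: a 2 x choose(4, 2) character matrix.
pairs <- combn(rownames(m), 2)

# Pearson correlation for each pair of rows.
pair_cor <- apply(pairs, 2, function(p) cor(m[p[1], ], m[p[2], ]))

# Long-format result: one line per pair of rows.
result <- data.frame(row_a = pairs[1, ], row_b = pairs[2, ], cor = pair_cor)
result
```

The long format (one line per pair) keeps the output narrow, but the number of pairs still grows quadratically with the number of rows.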

0 Answers