How should I store high-dimensional data to compute dense units in subspace clustering algorithms such as CLIQUE, ENCLUS, etc.? For example, if each point has 20 dimensions and I use an array for the grid, I have to allocate a 20-dimensional array, which runs out of memory. The code has to be written in C, so please suggest what I can use to store the high-dimensional points.
Viewed 170 times
-1
-
If you have N points with 20 dimensions and N*20*sizeof(dimension) does not fit in memory, you need more memory. – Eugene Sh. Jul 08 '15 at 20:11
-
N*20*sizeof(dimension) will fit in memory (that is given), but in clustering algorithms like CLIQUE, each dimension is divided into 'm' equal intervals, so if there are 'k' dimensions the grid size will be 'm^k'. But this grid of 'm^k' cells will contain only N points, so most of the grid cells are likely to be empty. – Dr.John Jul 08 '15 at 20:19
-
Would a sparse matrix solve your problem: http://stackoverflow.com/questions/22907166/represent-a-sparse-matrix-in-c-using-the-csparse-library – missimer Jul 08 '15 at 21:00
1 Answer
0
Some algorithms, like CLIQUE, just don't work for high-dimensional data.
You will have to rethink the algorithm rather than try to find a technical hack to make it somehow work.

Has QUIT--Anony-Mousse
-
CLIQUE is used for high-dimensional data. You can read about it here: http://www.cs.cornell.edu/johannes/papers/1998/sigmod1998-clique.pdf – Dr.John Jul 09 '15 at 06:43
-
No, it is not. Its running time is exponential in the dimensionality, which means it will only work for low to medium dimensionalities. It is okay if you want to find 5-dimensional clusters in a 20-dimensional data set, but it will die trying to find something in 1000+ dimensions. – Has QUIT--Anony-Mousse Jul 09 '15 at 07:44
-
In the question I mentioned 20 dimensions as an example, not 1000. But 20 dimensions is also too many for an array, because if I divide each dimension into 10 parts, the array will require 10^20 cells, which causes an out-of-memory error. Can you suggest a method or data structure that can be used in place of an array? – Dr.John Jul 09 '15 at 08:05
-
Only store the non-empty cells. You probably don't have 10^20 points. – Has QUIT--Anony-Mousse Jul 09 '15 at 09:07
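
One way to read "only store the non-empty cells" in C is a hash table keyed by the cell's interval indices, so memory grows with the number of occupied cells (at most N) rather than with m^k. The following is a minimal sketch, not a full CLIQUE implementation. Everything named here is an assumption for illustration: the constants `K`, `M` and `NBUCKETS`, the types `Cell` and `Grid`, the helper functions, and the simplification that every dimension shares the same range `[lo, hi)`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define K 20          /* dimensions (the question's example) */
#define M 10          /* intervals per dimension (the question's example) */
#define NBUCKETS 1024 /* hypothetical table size; must be a power of two */

/* One non-empty grid cell: its interval index in each dimension and a count. */
typedef struct Cell {
    uint8_t idx[K];
    size_t count;
    struct Cell *next; /* chaining for hash collisions */
} Cell;

typedef struct {
    Cell *buckets[NBUCKETS];
    size_t ncells; /* number of distinct non-empty cells */
} Grid;

/* FNV-1a hash over the K interval indices. */
static uint32_t hash_cell(const uint8_t idx[K]) {
    uint32_t h = 2166136261u;
    for (int d = 0; d < K; d++) {
        h ^= idx[d];
        h *= 16777619u;
    }
    return h;
}

/* Map a point to interval indices, assuming all coordinates lie in [lo, hi).
   Out-of-range values are clamped to the first or last interval. */
static void point_to_cell(const double *p, double lo, double hi, uint8_t idx[K]) {
    assert(hi > lo);
    for (int d = 0; d < K; d++) {
        double t = (p[d] - lo) / (hi - lo) * M;
        int i = (t < 0.0) ? 0 : (t >= (double)M) ? M - 1 : (int)t;
        idx[d] = (uint8_t)i;
    }
}

/* Count point p in its cell, creating the cell on first use.
   Returns the cell's new count, or 0 if allocation fails. */
static size_t grid_add(Grid *g, const double *p, double lo, double hi) {
    uint8_t idx[K];
    point_to_cell(p, lo, hi, idx);
    uint32_t b = hash_cell(idx) & (NBUCKETS - 1);
    for (Cell *c = g->buckets[b]; c != NULL; c = c->next)
        if (memcmp(c->idx, idx, K) == 0)
            return ++c->count;
    Cell *c = malloc(sizeof *c);
    if (c == NULL)
        return 0;
    memcpy(c->idx, idx, K);
    c->count = 1;
    c->next = g->buckets[b];
    g->buckets[b] = c;
    g->ncells++;
    return 1;
}

/* Release all cells and reset the grid to empty. */
static void grid_free(Grid *g) {
    for (size_t b = 0; b < NBUCKETS; b++) {
        Cell *c = g->buckets[b];
        while (c != NULL) {
            Cell *next = c->next;
            free(c);
            c = next;
        }
        g->buckets[b] = NULL;
    }
    g->ncells = 0;
}
```

A zero-initialized `Grid g = {0};` is an empty grid; call `grid_add` once per point and then walk the buckets to pick out cells whose `count` exceeds your density threshold. Since CLIQUE builds dense units bottom-up from low-dimensional subspaces, the same idea could be applied per subspace by keying on only the dimensions of that subspace.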