I have a really large file, around 10 GB. I can't load it into memory, so I converted it to a .mat file. But the 'out of memory' error still comes up when I try clustering. I think the ultimate solution is to keep that data on disk, but I need to call MATLAB's kmeans() function. Is there a way to put the local variables used by kmeans on disk as well, without rewriting the function?
If you're working with data of that size, you should probably use a computing center... If your FILE is that large but your raw DATA is not, then you should reconsider your format. – scenia Feb 14 '14 at 16:04
3 Answers
When you load your data, it is first loaded into your computer's RAM, so I think the only real solution to your problem is to have something like 16 GB of RAM.

What I want is to put large local variables on disk. I can't get that much RAM right now. – Tengerye Feb 15 '14 at 17:16
You need a strategy for dealing with large data sets. Possibilities include:
- Use a system with enough memory.
- Reduce the precision of your data set. For clustering, small errors and scaling are not important, so change attributes to scaled uint8 or uint16 if possible. (And obviously, delete all irrelevant data.)
- Use more appropriate algorithms. I'm not an expert in this field, but CLARA and CLARANS are two alternatives. These algorithms require only a subset of the data at a time, so it should be possible to combine them with matfile to keep only the relevant parts in memory (see the sketch below).
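A minimal sketch of the matfile-plus-subset idea, assuming the data was saved with the -v7.3 flag (required for partial loading) into a hypothetical file data.mat containing a matrix X with one observation per row; the strided sample, k = 10, and the chunked nearest-centroid assignment are illustrative choices, not part of the original answer:

```matlab
% Open the .mat file without loading it into memory (file must be -v7.3).
m = matfile('data.mat');                 % assumes it contains a matrix X
[nRows, ~] = size(m, 'X');

% Cluster a subset that fits in RAM, e.g. every 20th row (CLARA-style).
sample = double(m.X(1:20:nRows, :));     % cast back if stored as uint8/uint16
[~, C] = kmeans(sample, 10);             % k = 10 clusters, for example

% Assign every row to its nearest centroid in manageable chunks.
labels = zeros(nRows, 1);
chunk = 50000;
for first = 1:chunk:nRows
    last = min(first + chunk - 1, nRows);
    block = double(m.X(first:last, :));
    [~, labels(first:last)] = pdist2(C, block, 'euclidean', 'Smallest', 1);
end
```

Note that pdist2 comes from the same Statistics Toolbox as kmeans, and at no point does the full matrix need to be in memory at once.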

A very fuzzy answer with three totally different strategies; what problems are you expecting? I don't know your scenario, so I might be wrong. – Daniel Feb 15 '14 at 17:40
In the end, I turned to a server running Octave. Money is the ultimate solution. :D – Tengerye Mar 26 '18 at 00:34
You could probably try downsampling your data if it is not highly nonlinear. If you are interested, see the reference: http://www.mathworks.com/help/signal/ref/downsample.html
For example, you can take your data and downsample it by a factor of 4, leaving you with 2.5 GB of data. You can go further, but that will increase the error. After your processing you can upsample your data again using different techniques (MATLAB has them all built in). Unfortunately, I don't know the type of your data, so if my answer doesn't match your question, sorry.
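A minimal sketch of this approach, assuming a hypothetical data.mat holding a matrix X with one observation per row (downsample treats each column of a matrix as a separate sequence, so here it drops rows); the factor 4 comes from the answer, while the file name, X, and k = 10 are my own illustrative assumptions:

```matlab
load('data.mat', 'X');            % assumes X fits in RAM once loaded; for a
                                  % truly huge file, combine with matfile as
                                  % in the previous answer

scale = 4;                        % keep every 4th row: roughly 10 GB -> 2.5 GB
Xsmall = downsample(X, scale);    % Signal Processing Toolbox; equivalent to
                                  % X(1:scale:end, :)

idx = kmeans(double(Xsmall), 10); % cluster the reduced data, e.g. k = 10

% To get back to the original length, resample() interpolates between
% samples (plain upsample() only inserts zeros):
Xrec = resample(double(Xsmall), scale, 1);
```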
