I am interested in using Python to mine data sets that are too big to fit in RAM but small enough to fit on a single hard drive.
I understand that I can export the data as HDF5 files using PyTables. Also, numexpr (which PyTables integrates) allows for some basic out-of-core computation.
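For concreteness, here is a minimal sketch of that PyTables + numexpr combination. It assumes PyTables 3.x, where `tables.Expr` evaluates a numexpr expression block by block over on-disk arrays and writes the result to another on-disk array. The file name, array names, sizes and the expression itself are hypothetical placeholders; a real data set would be far larger than the defaults used here.

```python
import os
import tempfile

import numpy as np
import tables as tb


def out_of_core_expr(n=200_000, block=50_000):
    """Compute out = 3*a + b**2 on HDF5 arrays without loading them fully."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.h5")  # hypothetical file name
        with tb.open_file(path, mode="w") as h5:
            atom = tb.Float64Atom()
            a = h5.create_carray("/", "a", atom=atom, shape=(n,))
            b = h5.create_carray("/", "b", atom=atom, shape=(n,))
            out = h5.create_carray("/", "out", atom=atom, shape=(n,))

            # Fill the on-disk arrays one block at a time (stand-in for a real export).
            for start in range(0, n, block):
                stop = min(start + block, n)
                idx = np.arange(start, stop, dtype=np.float64)
                a[start:stop] = idx
                b[start:stop] = 2.0 * idx

            # tables.Expr streams the operands through numexpr in chunks
            # and writes directly into the on-disk `out` array.
            expr = tb.Expr("3 * a + b ** 2", uservars={"a": a, "b": b})
            expr.set_output(out)
            expr.eval()

            # Read back only a small sample; reading all of `out` would defeat the purpose.
            return out[:5], out[-1]


head, last = out_of_core_expr()
print(head, last)
```

The same pattern (write in blocks, evaluate with `tables.Expr`, read back only what you need) covers element-wise arithmetic; anything involving reductions across rows needs the kind of decomposition asked about next.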
What would come next? Mini-batching when possible, and relying on linear algebra results to decompose the computation when mini-batching cannot be used?
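As one illustration of "mini-batching plus a linear-algebra decomposition", ordinary least squares can be solved by streaming row blocks, because $X^\top X = \sum_k X_k^\top X_k$ and $X^\top y = \sum_k X_k^\top y_k$. The sketch below assumes the data is stored row-wise and uses `numpy.memmap` as the on-disk store; the chunk loop only relies on slicing, so it should work the same way on a PyTables array. Function names, sizes and the synthetic data are hypothetical.

```python
import os
import tempfile

import numpy as np


def chunked_least_squares(X, y, chunk_rows=10_000):
    """Solve min ||X w - y|| by accumulating X^T X and X^T y over row blocks.

    X and y can be any sliceable array-like (numpy memmap, PyTables array, ...);
    only one block of rows is held in RAM at a time.
    """
    n_rows, n_cols = X.shape
    xtx = np.zeros((n_cols, n_cols))
    xty = np.zeros(n_cols)
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        Xk = np.asarray(X[start:stop], dtype=np.float64)
        yk = np.asarray(y[start:stop], dtype=np.float64)
        xtx += Xk.T @ Xk  # small (n_cols x n_cols) partial Gram matrix
        xty += Xk.T @ yk
    return np.linalg.solve(xtx, xty)


def demo(n_rows=50_000, n_cols=5, seed=0):
    rng = np.random.default_rng(seed)
    true_w = np.arange(1, n_cols + 1, dtype=np.float64)
    with tempfile.TemporaryDirectory() as tmp:
        X = np.memmap(os.path.join(tmp, "X.dat"), dtype=np.float64,
                      mode="w+", shape=(n_rows, n_cols))
        y = np.memmap(os.path.join(tmp, "y.dat"), dtype=np.float64,
                      mode="w+", shape=(n_rows,))
        # Write synthetic, noise-free data to disk block by block.
        for start in range(0, n_rows, 10_000):
            stop = min(start + 10_000, n_rows)
            Xb = rng.standard_normal((stop - start, n_cols))
            X[start:stop] = Xb
            y[start:stop] = Xb @ true_w
        X.flush()
        y.flush()
        w = chunked_least_squares(X, y)
        del X, y  # release the memmaps before the temp dir is removed
    return w, true_w


w, true_w = demo()
print(w)
```

Forming $X^\top X$ squares the condition number of $X$, so for ill-conditioned problems a blockwise QR (e.g. tall-skinny QR) is the more stable version of the same idea.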
Or are there some higher level tools I have missed?
Thanks for any insights.