I have many large (1 GB+) matrices of doubles (floats), many of whose elements are 0.0, that need to be stored efficiently. I intend to keep the double type, since some of the elements do need to be doubles (but I can consider changing this if it would lead to a significant space saving). A string header is optional. The matrices have no missing elements, NaNs, NAs, nulls, etc.: every element is a double.
Some columns will be sparse, others will not be. The proportion of columns that are sparse will vary from file to file.
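To make the shape of the data concrete, here is a minimal Python sketch of a toy matrix with a mix of sparse and dense columns, comparing its size as CSV text against raw 8-byte doubles. The dimensions, the 95% zero probability, the choice of sparse columns and all function names are made up for illustration; the real matrices are far larger and the sparse proportion varies from file to file.

```python
import array
import csv
import io
import random


def make_matrix(n_rows, n_cols, sparse_cols, zero_prob=0.95, seed=0):
    """Build a row-major matrix; columns in sparse_cols are mostly 0.0 (illustrative only)."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_rows):
        row = []
        for j in range(n_cols):
            if j in sparse_cols and rng.random() < zero_prob:
                row.append(0.0)
            else:
                row.append(rng.random())
        matrix.append(row)
    return matrix


def csv_size_bytes(matrix):
    """Size of the matrix when written as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(matrix)
    return len(buf.getvalue().encode("utf-8"))


def raw_size_bytes(matrix):
    """Size of the matrix as packed 8-byte doubles, with no compression or sparse encoding."""
    flat = array.array("d", (x for row in matrix for x in row))
    return len(flat.tobytes())


def zero_fraction_per_column(matrix):
    """Fraction of exact 0.0 entries in each column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [sum(1 for row in matrix if row[j] == 0.0) / n_rows for j in range(n_cols)]


# Hypothetical toy example: 1000 rows, 6 columns, columns 1 and 4 sparse.
matrix = make_matrix(n_rows=1000, n_cols=6, sparse_cols={1, 4})
print("CSV bytes:", csv_size_bytes(matrix))
print("raw double bytes:", raw_size_bytes(matrix))
print("zero fraction per column:", zero_fraction_per_column(matrix))
```

Neither size here takes advantage of the zeros, which is the saving I am hoping a better format can provide.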
What is a space-efficient alternative to CSV? For my use, I need to parse the matrix quickly into R, Python and Java, so a file format specific to a single language is not appropriate. Access may need to be by row or by column.
I am also not looking for a commercial solution.
My main objective is to save HDD space without blowing out I/O times. RAM usage once imported is not the primary consideration.