I am trying to create an H2O frame from a Spark parquet file. The file is 2 GB and has about 12M rows, stored as sparse vectors with 12k columns. It is not that big in parquet format, but the import takes forever. In H2O the frame is reported as only 447 MB compressed, which is quite small.
Am I doing something wrong? And once the import finally finishes (it took 39 min), is there any way in H2O to save the frame to disk so that it loads quickly next time?
I understand H2O does some magic behind the scenes that takes a while, but the only export option I found is downloading a CSV. That is slow and huge for 11k x 1M sparse data, and I doubt it would be any faster to import.
I feel like I'm missing something. Any info about H2O data import/export is appreciated. Saving and loading models works great, but loading the train/val/test data seems unreasonably slow.
I have 10 Spark workers with 10 GB each and gave the driver 8 GB. This should be plenty.
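To put "this should be plenty" and "huge CSV" in perspective, here is a rough back-of-the-envelope calculation in plain Python for the 12M x 12k shape described above. It is only a sketch under stated assumptions: every cell is stored as an 8-byte double if the data were materialized densely, Spark's "10g" is taken as 10 GiB, and a CSV needs at least 2 bytes per cell (one digit plus a comma or newline). The helper names are hypothetical. H2O's own 447 MB figure is far smaller, presumably because its compressed chunks do not store the zeros this way.

```python
# Back-of-the-envelope size estimates for the frame described in the question.
# Assumptions (not measured): 8-byte doubles when dense, "10g" = 10 GiB,
# and at least 2 bytes per CSV cell (one digit plus a separator).
ROWS = 12_000_000
COLS = 12_000
BYTES_PER_DOUBLE = 8
GIB = 2**30


def dense_bytes(rows: int, cols: int, bytes_per_value: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed if every cell, including the zeros, is stored as a double."""
    return rows * cols * bytes_per_value


def min_csv_bytes(rows: int, cols: int) -> int:
    """Lower bound for a CSV: each cell is at least one digit plus one separator."""
    return rows * cols * 2


def executor_bytes(workers: int, gib_per_worker: int) -> int:
    """Total executor memory across all Spark workers."""
    return workers * gib_per_worker * GIB


dense = dense_bytes(ROWS, COLS)
csv = min_csv_bytes(ROWS, COLS)
executors = executor_bytes(10, 10)

print(f"dense in-memory size: {dense / GIB:,.0f} GiB")
print(f"minimum CSV size:     {csv / GIB:,.0f} GiB")
print(f"executor memory:      {executors / GIB:,.0f} GiB")
print(f"dense / executors:    {dense / executors:.1f}x")
```

With these assumptions the dense representation comes out at roughly 1,073 GiB, about 10.7 times the 100 GiB of executor memory, and even a minimal CSV is around 268 GiB. That is why the data has to stay sparse end to end, and why a CSV round trip is unlikely to be a fast way to reload it.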