I have a large file that I can't load into memory, so I'm reading it from a local file with xgb.DMatrix. But I'd like to use only a subset of the features. The xgboost documentation says that the colset argument of slice is "currently not used", and there is no mention of this feature on the GitHub page. I also haven't found any other clue about how to do column subsetting with external memory.
I want to compare models built with different feature subsets. The only thing I could think of is to create a new file containing just the features I want to use, but that takes a long time and a lot of memory. I can't help wondering if there is a better way.
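For reference, the new-file workaround can at least be done without holding the whole file in memory. This is only a minimal sketch, and it assumes the data is in LIBSVM text format (label followed by index:value pairs), which may not match the actual file. The function name filter_libsvm, the chunk size, and the keep indices are all hypothetical:

```r
# Sketch: stream a LIBSVM-format file and keep only selected feature indices.
# Assumption: each line looks like "label idx:val idx:val ...".
# filter_libsvm is a hypothetical helper, not part of xgboost.
filter_libsvm <- function(infile, outfile, keep, chunk_size = 10000L) {
  keep <- as.character(keep)
  con_in <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({
    close(con_in)
    close(con_out)
  })
  repeat {
    # Read a bounded number of lines so memory use stays small.
    lines <- readLines(con_in, n = chunk_size)
    if (length(lines) == 0L) break
    lines <- lines[nzchar(lines)]  # skip blank lines
    if (length(lines) == 0L) next
    out <- vapply(strsplit(lines, " ", fixed = TRUE), function(tok) {
      feats <- tok[-1]                # everything after the label
      idx <- sub(":.*$", "", feats)   # feature index before the colon
      paste(c(tok[1], feats[idx %in% keep]), collapse = " ")
    }, character(1))
    writeLines(out, con_out)
  }
  invisible(outfile)
}
```

The feature indices are left unchanged, so the kept columns stay at their original positions. This still writes one file per feature subset, which is exactly the cost I would like to avoid.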
P.S.: I tried using the h2o package too, but h2o.importFile froze.