I'm training an MLlib random forest regression model using the RandomForest.trainRegressor API.
After training, when I save the model, the resulting model folder is only 6.5 MB on disk, but the data folder contains 1120 small Parquet files. These seem unnecessary and make the model slow to upload to and download from S3.
Is this the expected behavior? I am already repartitioning the labeledPoints to a single partition before training, but it happens regardless.
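
For reference, here is a minimal sketch of the kind of setup described above. It is not the exact code: the toy data, the hyperparameter values, the local master, and the output path /tmp/rf-model-sketch are all placeholders chosen for illustration. Only RandomForest.trainRegressor, the single-partition repartition, and the model save come from the description.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

object SaveForestSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("SaveForestSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Toy training data (hypothetical), forced into a single partition
    // as described in the question.
    val labeledPoints = sc.parallelize((0 until 100).map { i =>
      LabeledPoint(i.toDouble, Vectors.dense(i.toDouble, (i % 7).toDouble))
    }).repartition(1)

    // Hyperparameter values below are placeholders, not the real settings.
    val model = RandomForest.trainRegressor(
      labeledPoints,
      Map[Int, Int](), // categoricalFeaturesInfo: no categorical features
      10,              // numTrees
      "auto",          // featureSubsetStrategy
      "variance",      // impurity (the option used for regression)
      5,               // maxDepth
      32,              // maxBins
      42               // seed
    )

    // Writes metadata/ and data/ under the target directory; the many small
    // Parquet files appear in data/. The path is a placeholder and must not
    // already exist.
    model.save(sc, "/tmp/rf-model-sketch")

    sc.stop()
  }
}
```

Counting the part files under the data directory after the save shows whether the file count follows the input partitioning or something else.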