We are converting an online machine learning linear regression model from Vowpal Wabbit to Spark MLlib. Vowpal Wabbit allows arbitrary, sparse features by training the model on weights backed by a linked list, whereas Spark MLlib trains on an MLlib Vector of weights, which is backed by a fixed-length array.
The features we pass to the model are arbitrary strings, not categorical values. Vowpal Wabbit hashes each string to an index and assigns it a feature value of 1.0. We can do the same mapping in MLlib, but we are limited to a fixed-length array. Is it possible to train such a model in MLlib when the size of the feature space is not known in advance?
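For context, the mapping we have in mind is the standard "hashing trick": pick a fixed dimension up front and hash each string into that range, accepting occasional collisions. Spark ships this in `spark.ml` as `HashingTF` (and, since 2.3, `FeatureHasher`). Below is a minimal, Spark-free sketch of the idea; the dimension `NUM_FEATURES` and the feature strings are illustrative assumptions, not values from our pipeline.

```python
# Sketch of the hashing trick: map arbitrary string features into a
# fixed-length sparse vector, as MLlib's Vector requires.
# NUM_FEATURES is an assumed, illustrative choice; collisions are tolerated
# and simply accumulate into the same index.
import hashlib
from collections import defaultdict

NUM_FEATURES = 2 ** 18  # fixed vector length chosen before training


def hash_features(features):
    """Map each string feature to index = hash(feature) % NUM_FEATURES
    with value 1.0; colliding features add up at the shared index."""
    indexed = defaultdict(float)
    for f in features:
        # md5 used here only as a stable, deterministic hash
        h = int(hashlib.md5(f.encode("utf-8")).hexdigest(), 16)
        indexed[h % NUM_FEATURES] += 1.0
    # sorted (index, value) pairs, ready to build a SparseVector
    return sorted(indexed.items())


pairs = hash_features(["user=alice", "page=home", "hour=12"])
```

The `(index, value)` pairs can then feed `pyspark.ml.linalg.Vectors.sparse(NUM_FEATURES, pairs)`. The trade-off is that the "unknown" feature space is replaced by a fixed hash range: any string maps somewhere, but distinct strings can collide, so the dimension has to be chosen large enough for the expected cardinality.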