
Is there a way to correctly represent missing values in the Vowpal Wabbit (VW) input format? I don't want to impute them with the mean or median, or set them to 0 or any other constant. I want them treated as truly missing, so that the SGD and FTRL-Proximal algorithms can exclude these coordinates from the gradient computation for a given example.

kurtosis

1 Answer


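VW expects a sparse feature representation as its input format (see the VW wiki), so missing values are handled naturally: simply don't list the features whose values are missing. A feature that is absent from an example's line contributes nothing to the prediction for that example, and its weight is not updated by that example.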
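As a rough illustration of "don't list the missing features", here is a minimal Python sketch that builds a VW text line from a dictionary and skips missing values. The helper name `to_vw_line` and the feature names are hypothetical, and the sketch assumes feature names contain no spaces, colons, or `|` characters, which have special meaning in the VW format.

```python
import math


def to_vw_line(label, features):
    """Format one example as a VW text line, omitting missing features.

    `features` maps feature name -> numeric value. A value of None or NaN
    is treated as missing and is left out of the line entirely.
    (Hypothetical helper, not part of VW itself.)
    """
    parts = []
    for name, value in features.items():
        # Skip missing values instead of writing a placeholder constant.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        parts.append(f"{name}:{value}")
    # VW text format: "<label> | <name>:<value> <name>:<value> ..."
    return f"{label} | " + " ".join(parts)


if __name__ == "__main__":
    example = {"age": 25.0, "height": None, "weight": 70.5}
    print(to_vw_line(1, example))  # prints: 1 | age:25.0 weight:70.5
```

Here `height` is missing, so it never appears in the line that VW reads.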

Martin Popel
  • As far as I know, VW treats unreported features as having a value of 0. I'm not sure why this is the right approach when the unreported features carry numeric semantics in the domain. I'd actually think imputing with the median, mean, etc. would make sense in many cases, but I'm happy to learn if I'm wrong here. – matanster Aug 13 '18 at 20:28