
I have a classification task and want to use the SVM algorithm from Apache Spark MLlib. My input data is n-dimensional, and some dimensions may be missing from the feature vectors.

How should I handle the missing values? I think it would be wrong to treat missing values as zero or as any other fixed value.

zero323
hard coder

2 Answers


Right. MLlib does not impute missing values, and filling in 0 will skew your results. However, WEKA has a ReplaceMissingValues filter that may be of use to you; it implements one of the imputation algorithms. http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/LibSVM.html
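
To illustrate why filling in 0 can skew results, here is a minimal sketch in plain Python (not Spark or WEKA code). The feature column and its values are made up for illustration, and NaN is assumed to be the marker for a missing entry.

```python
import math
from statistics import mean

# Hypothetical feature column; NaN marks a missing value (an assumption).
column = [10.0, 12.0, math.nan, 11.0, math.nan]

# Statistics over the values that were actually observed.
observed = [x for x in column if not math.isnan(x)]
observed_mean = mean(observed)

# Statistics after pretending every missing entry is a real zero.
zero_filled = [0.0 if math.isnan(x) else x for x in column]
zero_filled_mean = mean(zero_filled)

print("mean of observed values:", observed_mean)
print("mean after zero filling:", zero_filled_mean)
```

The zero-filled mean is pulled well below the observed one, and the rows that had a gap end up far from the other points along that feature, which is the kind of distortion a margin-based classifier such as an SVM is likely to pick up on.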

Prune

You have two options:

1. Omit the vectors that have missing values.
2. Impute the missing values, for example with the mean or mode of each feature.
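
Both options can be sketched in a few lines of plain Python. This is only an illustration of the logic, not Spark code: the helper names `drop_incomplete` and `impute_mean`, the sample rows, and the use of NaN as the missing-value marker are all assumptions made for the example.

```python
import math
from statistics import mean


def drop_incomplete(rows):
    """Option 1: keep only the rows that have no missing (NaN) entries."""
    return [row for row in rows if not any(math.isnan(x) for x in row)]


def impute_mean(rows):
    """Option 2: replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumes every column has at least one observed value.
        col_means.append(mean(observed))
    return [
        [col_means[j] if math.isnan(x) else x for j, x in enumerate(row)]
        for row in rows
    ]


# Hypothetical feature vectors; NaN marks a missing dimension.
rows = [
    [1.0, 2.0],
    [math.nan, 4.0],
    [3.0, math.nan],
]

complete_rows = drop_incomplete(rows)
imputed_rows = impute_mean(rows)

print(complete_rows)
print(imputed_rows)
```

Dropping rows is the simplest choice but discards data (two of the three rows here), while mean imputation keeps every row at the cost of inventing values. If you impute, it is usually safer to compute the column means on the training set only and reuse them for the test set.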

I would suggest doing it in Spark; the code is very simple. Here is an example:

example

Community
Dr VComas