I am trying to use ChiSqSelector to determine the best features for a Spark 2.2 LinearSVC model, like this:

import org.apache.spark.ml.feature.ChiSqSelector
val chiSelector = new ChiSqSelector().setNumTopFeatures(5).
   setFeaturesCol("features").
   setLabelCol("label").setOutputCol("selectedFeatures")

val pipeline = new Pipeline().setStages(Array(labelIndexer, monthIndexer, hashingTF
   , idf, va, featureIndexer,  chiSelector, lsvc, labelConverter))

val model = pipeline.fit(training)
val importantFeatures = model.selectedFeatures

import org.apache.spark.ml.classification.LinearSVCModel
val LSVCModel= model.stages(6).asInstanceOf[org.apache.spark.ml.classification.
   LinearSVCModel]

val importantFeatures = LSVCModel.selectedFeatures

which gives the error:

<console>:180: error: value selectedFeatures is not a member of 
org.apache.spark.ml.classification.LinearSVCModel
   val importantFeatures = LSVCModel.selectedFeatures

Is it possible to use ChiSqSelector with this model? If not, is there an alternative?

schoon
  • You are using wrong model. It is [`ChiSqSelectorModel.selectedFeatures`](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.ChiSqSelectorModel@selectedFeatures:Array[Int]) not `LinearSVCModel`. – Alper t. Turker Jan 07 '18 at 10:35

1 Answer

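LinearSVC does not do any feature selection itself, which is why LinearSVCModel has no selectedFeatures member. The selection is done by the ChiSqSelector stage, so extract the ChiSqSelectorModel from the fitted pipeline instead of the LinearSVCModel. In your pipeline the selector is the seventh stage, i.e. index 6 (zero-based):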

import org.apache.spark.ml.feature.ChiSqSelectorModel
val chiSqModel = model.stages(6).asInstanceOf[ChiSqSelectorModel]

val importantFeatures = chiSqModel.selectedFeatures
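
A hard-coded index such as `stages(6)` stops working once a stage is added or reordered. As a rough sketch (the helper name `findSelector` is made up here, not part of Spark), you could look the stage up by type instead:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.feature.ChiSqSelectorModel

// Hypothetical helper: returns the first ChiSqSelectorModel among the fitted
// stages, or None if the pipeline does not contain one.
def findSelector(fitted: PipelineModel): Option[ChiSqSelectorModel] =
  fitted.stages.collectFirst { case m: ChiSqSelectorModel => m }
```

With the fitted `model` from the question, `findSelector(model).map(_.selectedFeatures)` would then give the selected indices wrapped in an `Option`.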
Shaido
  • Thanks I'll give it a try. – schoon Jan 07 '18 at 15:30
  • Works great but how do I get the actual features? `selectedFeatures` just gives me some indices. – schoon Jan 08 '18 at 07:06
  • @schoon: It will give you the indices of the features in your `features` column. So to know which features were selected, you should look at how you constructed the vectors in this column. – Shaido Jan 08 '18 at 07:18
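
To illustrate the last comment, here is a minimal sketch in plain Scala of mapping the selected indices back to names. It assumes the `features` vector was built by a VectorAssembler, which concatenates its input columns in order, and that no later stage reorders the slots. The column names, the widths and the helpers `slotNames` and `describe` are all made up for the example; they are not taken from the question's data or from Spark.

```scala
// Hypothetical layout: (input column name, number of vector slots it contributes),
// listed in the same order as the assembler's input columns.
def slotNames(layout: Seq[(String, Int)]): Vector[String] =
  layout.flatMap { case (name, width) =>
    // A scalar column occupies one slot; a vector column occupies `width` slots.
    if (width == 1) Seq(name) else (0 until width).map(i => s"${name}_$i")
  }.toVector

// Pair each selected index with the name of the slot it points at.
def describe(selected: Array[Int], layout: Seq[(String, Int)]): Seq[(Int, String)] = {
  val names = slotNames(layout)
  selected.toSeq.map(i => i -> names(i))
}

// Made-up example: one scalar column followed by a 4-slot vector column.
val layout = Seq("monthIndex" -> 1, "tfidf" -> 4)
val picked = describe(Array(0, 3), layout)
// picked == Seq((0, "monthIndex"), (3, "tfidf_2"))
```

Note that a slot coming from a hashed term-frequency column corresponds to a hash bucket rather than a single term, so for those slots the best you can recover this way is the bucket number.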