I have a CSV of both textual and numerical data. I need to convert it to feature vector data in Spark (Double values). Is there any way to do that ?
I see some e.g where each keyword is mapped to some double value and use this to convert. However if there are multiple keywords, it is difficult to do this way.
Is there any other way out? I see Spark provides Extractors which will convert into feature vectors. Could someone please give an example?
48, Private, 105808, 9th, 5, Widowed, Transport-moving, Unmarried, White, Male, 0, 0, 40, United-States, >50K
42, Private, 169995, Some-college, 10, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 45, United-States, <=50K