I have a dataset that I want to train multiple ML models on in Apache Spark 2.1.1. It consists of 10 columns, 2 of which contain strings. Removing these columns is not an option, as they are vital to the information I want to gather. However, because of the string columns I am unable to convert the CSV file to SVM format to proceed with the experiment.

I have tried converting it to an RDD, which succeeds, and then saving it as SVM, but the file is never saved. Is there any other way around this?

srikavineehari

1 Answer

You can create two arrays, one for each of the two string columns, each holding the distinct strings of its column, and use each string's index in its array as the feature (instead of the string value itself) to train your model.
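The idea can be sketched in plain Python, without Spark, roughly as follows. The sample rows and the column names (`color`, `city`, `size`) are made up for illustration, and the last helper assumes that "SVM" in the question means the LIBSVM text format (`label index:value ...` with 1-based feature indices).

```python
# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"label": 1.0, "color": "red", "city": "Oslo", "size": 3.5},
    {"label": 0.0, "color": "blue", "city": "Lima", "size": 1.2},
    {"label": 1.0, "color": "red", "city": "Lima", "size": 2.8},
]
feature_columns = ["color", "city", "size"]
string_columns = ["color", "city"]


def build_index(rows, column):
    """Map each distinct string in `column` to its position in a sorted array."""
    distinct = sorted({row[column] for row in rows})
    return {value: i for i, value in enumerate(distinct)}


# One lookup table per string column.
indices = {col: build_index(rows, col) for col in string_columns}


def encode(row):
    """Return (label, features) with string values replaced by their indices."""
    features = []
    for col in feature_columns:
        value = row[col]
        if col in indices:
            features.append(float(indices[col][value]))
        else:
            features.append(float(value))
    return row["label"], features


def to_libsvm_line(label, features):
    """Format one row as LIBSVM text (assumed target format, 1-based indices)."""
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"


lines = [to_libsvm_line(*encode(row)) for row in rows]
for line in lines:
    print(line)
```

In Spark itself, `StringIndexer` from `pyspark.ml.feature` performs the same kind of string-to-index mapping on a DataFrame column, so the manual arrays may not be needed. Note that a plain index implies an ordering between categories; for models that are sensitive to this, one-hot encoding the indices is a common follow-up step.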

Walid Da.