I have a dataset that I want to train multiple ML models on in Apache Spark 2.1.1. It consists of 10 columns, 2 of which contain strings. Removing these columns is not an option, as they are vital to the information I want to gather. However, because of these string columns, I am unable to convert the CSV file to SVM format and proceed with the experiment.
I have tried converting it to an RDD, which succeeds, and then saving that as SVM, but the file is never saved. Is there any other way around this?
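
To make the goal concrete, here is a minimal sketch of the kind of conversion I mean, written in plain Python rather than Spark. It assumes "SVM" means the LIBSVM text format, and the function name `csv_to_libsvm`, the column names, and the sample rows are all hypothetical. Each string column is mapped to a numeric index so that every row can be written as a `label index:value ...` line.

```python
import csv
import io


def csv_to_libsvm(csv_text, label_col, string_cols):
    """Convert CSV text into LIBSVM-style lines.

    Hypothetical helper: string columns are label-encoded, i.e. each
    distinct string gets an integer index in order of first appearance.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    feature_names = [name for name in header if name != label_col]

    # One string -> index mapping per string column.
    encoders = {name: {} for name in string_cols}

    lines = []
    for row in reader:
        record = dict(zip(header, row))
        parts = ["%g" % float(record[label_col])]
        # LIBSVM feature indices are 1-based and ascending.
        for index, name in enumerate(feature_names, start=1):
            raw = record[name]
            if name in encoders:
                mapping = encoders[name]
                value = float(mapping.setdefault(raw, len(mapping)))
            else:
                value = float(raw)
            # Zero-valued features are omitted in the sparse format.
            if value != 0.0:
                parts.append("%d:%g" % (index, value))
        lines.append(" ".join(parts))
    return lines


# Hypothetical sample data: two string columns, one numeric, one label.
sample = "city,color,x,label\nparis,red,1.5,1\nrome,blue,0,0\nparis,blue,2,1\n"
for line in csv_to_libsvm(sample, "label", ["city", "color"]):
    print(line)
```

Plain integer indices impose an artificial ordering on the categories, so one-hot encoding may be a better fit for some models. Within Spark, I assume the corresponding pieces would be `StringIndexer` from `spark.ml` and `MLUtils.saveAsLibSVMFile`, but that is the part I have not been able to get working.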