Spark ML - prediction in KMeans

Question

I have created a KMeans model using Spark ML methods.

val kmeans = new KMeans()
val model = kmeans.fit(df)

My model is ready, but how do I predict which cluster new data points will fall into? In MLlib, model.predict(Vector) predicts the cluster for a new data point. I saw the transform method on the model, but it's not working.

Can you elaborate on what is "not working"? – mtoto, Jan 03 '18 at 09:35

Ishan Kumar · Answer 1 · 2018-01-03T12:29:57.000

Thanks Jacek Laskowski for clarifying Oli. It's working fine for me now. It was a simple mistake. Below is the whole code.

import org.apache.spark.SparkConf
import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setMaster("local").setAppName("ml Kmeans")
val spark = SparkSession.builder().config(conf).getOrCreate()
import spark.implicits._
// Read the training JSON and keep only the sensor readings used as features
val trainingData = spark.read.json(spark.sparkContext.wholeTextFiles("file:/home/iot/data/traingJson.json").values)
val parsedData = trainingData.select("value.humidity", "value.speed", "value.temperature", "value.vibration")
// Combine the feature columns into the single vector column that KMeans expects
val assembler = new VectorAssembler().setInputCols(Array("humidity", "speed", "temperature", "vibration")).setOutputCol("features")
val df = assembler.transform(parsedData)
val kmeans = new KMeans()
val model = kmeans.fit(df)
model.write.save("file:/home/iot/data/model1")
//--------------------------------Testing the Model------------------------
val uploadModel = KMeansModel.load("file:/home/iot/data/model1")
val testData = spark.read.json(spark.sparkContext.wholeTextFiles("file:/home/iot/data/testJson.json").values).select("value.humidity", "value.speed", "value.temperature", "value.vibration")
// Reuse the same assembler so the test features have the same layout as the training features
val testDf = assembler.transform(testData)
// transform appends a "prediction" column holding the cluster index of each row
uploadModel.transform(testDf).show(false)
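
For anyone who wants to try this without the JSON files, here is a minimal sketch of the same idea on a tiny in-memory dataset. The object name, method name, data points, k = 2 and the seed are all made up for illustration and are not from my project. The point is only that transform adds a "prediction" column (the default output column name) to whatever DataFrame of new points you pass in.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the original answer.
object KMeansPredictSketch {

  // Fits KMeans on four made-up points and returns the cluster index
  // assigned to each of two made-up new points.
  def assignClusters(spark: SparkSession): Array[Int] = {
    import spark.implicits._

    // Assumed training data: two well-separated groups of 2-D points.
    val train = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
    ).map(Tuple1.apply).toDF("features")

    // k and seed are illustrative choices.
    val model = new KMeans().setK(2).setSeed(1L).fit(train)

    // New points must live in a column with the same name ("features").
    val newPoints = Seq(
      Vectors.dense(0.2, 0.0),
      Vectors.dense(8.8, 9.2)
    ).map(Tuple1.apply).toDF("features")

    // transform appends the "prediction" column with the cluster index.
    model.transform(newPoints).select("prediction").as[Int].collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("kmeans-predict-sketch")
      .getOrCreate()
    try {
      println(assignClusters(spark).mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

The cluster indices themselves are arbitrary labels, so do not rely on a particular group getting index 0 or 1; only compare whether two points received the same index.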
