In order to apply PCA from pyspark.ml.feature, I need to convert a column of type org.apache.spark.sql.types.ArrayType (array<float>) to org.apache.spark.ml.linalg.VectorUDT.
Say I have the following dataframe:
df = spark.createDataFrame([
('string1',[5.0,4.0,0.5]),
('string2',[2.0,0.76,7.54]),
], schema='a string, b array<float>')
Converting a single row seems to work:
from pyspark.ml.linalg import Vectors
a = Vectors.dense(df.select('b').head(1)[0][0])
but I was wondering how I can apply this conversion to all the rows.