How does a Spark ML method treat a column produced by VectorAssembler? For example, if I have longitude and latitude columns, is it better to assemble them into a single vector column with VectorAssembler first and then feed that into my model, or does it make no difference if I pass them in directly (as separate columns)?
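Example 1:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
loc_assembler = VectorAssembler(inputCols=['long', 'lat'], outputCol='loc')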
vector_assembler = VectorAssembler(inputCols=['loc', 'feature1', 'feature2'], outputCol='features')
lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
pipeline = Pipeline(stages=[loc_assembler, vector_assembler, lr])
Example 2:
vector_assembler = VectorAssembler(inputCols=['long', 'lat', 'feature1', 'feature2'], outputCol='features')
lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
pipeline = Pipeline(stages=[vector_assembler, lr])
What is the difference between these two pipelines, and which one is better?