I've recently been learning Spark SQL, and I'd like to know whether there is any way to use MLlib from within Spark SQL, for example:

select mllib_methodname(some column) from tablename; 

Here, "mllib_methodname" stands for an MLlib method. Is there an example that shows how to use MLlib methods in Spark SQL?

Thanks in advance.

zero323
ldl
  • I don't think that's currently possible. SQL is mainly meant for data warehousing and preprocessing the data. You can certainly build the dataset using SQL and then run MLlib on it, but I couldn't find a way to do it the other way around – Abhishek Choudhary Jun 25 '15 at 14:50
  • I think I can define a custom function in SQL that calls a method in MLlib – ldl Jun 26 '15 at 10:42
  • That would be great. You may want to check the Spark bug list; if it's not there, you could contribute it – Abhishek Choudhary Jun 26 '15 at 11:08

1 Answer

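The new pipeline API is based on DataFrames, which are backed by Spark SQL, so data prepared with SQL can be fed to a pipeline and its output can be queried with SQL again. See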

http://spark.apache.org/docs/latest/ml-guide.html
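
To illustrate the DataFrame route, here is a minimal sketch, assuming Spark 2.0 or later (SparkSession and `org.apache.spark.ml.linalg`). The toy data, the object name `PipelineSqlSketch` and the view name `scored` are made up for illustration and are not from the linked guide.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical example object; assumes Spark 2.0+ on the classpath.
object PipelineSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pipeline-sql-sketch")
      .master("local[*]")
      .getOrCreate()

    // Toy training data (made up): a label and a two-element feature vector.
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 0.1)),
      (0.0, Vectors.dense(0.2, 0.0)),
      (1.0, Vectors.dense(1.0, 0.9)),
      (1.0, Vectors.dense(0.8, 1.0))
    )).toDF("label", "features")

    // A one-stage pipeline; real pipelines usually add feature transformers.
    val lr = new LogisticRegression().setMaxIter(10)
    val model = new Pipeline().setStages(Array(lr)).fit(training)

    // transform() appends a "prediction" column; expose the result to SQL.
    model.transform(training).createOrReplaceTempView("scored")
    spark.sql("SELECT label, prediction FROM scored").show()

    spark.stop()
  }
}
```

Note that the model is applied with `transform` on the DataFrame, and SQL is only used to query the scored result.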

Or you can simply register the predict method of an MLlib model as a UDF and call it in your SQL statements. See

http://spark.apache.org/docs/latest/sql-programming-guide.html#udf-registration-moved-to-sqlcontextudf-java--scala
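
A rough sketch of the UDF approach, again assuming Spark 2.0 or later; on Spark 1.x the equivalent registration call is `sqlContext.udf.register`. The training data, the UDF name `predict_label` and the table name `points` are hypothetical, chosen only for this example.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SparkSession

// Hypothetical example object; assumes Spark 2.0+ on the classpath.
object MllibUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mllib-udf-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy training data (made up): a label and two numeric features.
    val training = spark.sparkContext.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 0.1)),
      LabeledPoint(0.0, Vectors.dense(0.2, 0.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.9)),
      LabeledPoint(1.0, Vectors.dense(0.8, 1.0))
    ))

    // Train an RDD-based MLlib model; the model object is serializable.
    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)

    // Wrap model.predict in a UDF that takes the raw feature columns.
    spark.udf.register("predict_label",
      (x1: Double, x2: Double) => model.predict(Vectors.dense(x1, x2)))

    // Register a small table and call the UDF from plain SQL.
    Seq((0.1, 0.0), (0.9, 1.0)).toDF("x1", "x2").createOrReplaceTempView("points")
    spark.sql("SELECT x1, x2, predict_label(x1, x2) AS prediction FROM points").show()

    spark.stop()
  }
}
```

The UDF takes the feature columns as plain doubles and builds the vector itself, which keeps the SQL side simple. The model is captured in the closure and shipped to the executors, so this works best with small models.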