I discovered Koalas at the Spark+AI Summit; it brings the pandas API to Spark.
As far as I know, if I need to map a third-party function over a Spark DataFrame, I have to install that package on every node of my Spark cluster.
Is the same true for Koalas? Or do I just need to run pip install koalas on my master node and let Koalas and Spark take care of the rest?
I haven't found any details in the Koalas docs beyond pip install koalas.
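
For concreteness, this is roughly the kind of thing I have in mind (scipy and the CDF transformation are just placeholders for any third-party function, not something from the Koalas docs):

```python
import databricks.koalas as ks
from scipy import stats  # stand-in for any third-party package

kdf = ks.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# Element-wise application of a third-party function: does scipy have to be
# installed on every worker node for this to run, or only on the node where
# I ran pip install koalas?
kdf["z"] = kdf["x"].apply(lambda v: float(stats.norm.cdf(v)))

print(kdf.head())
```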