
I discovered Koalas at the Spark+AI Summit; it brings the pandas API to Spark.

As far as I know, if I need to map a third-party function over a Spark DataFrame, I have to install the package on every node of my Spark cluster.

Is this the same for Koalas? Or do I just need to run `pip install koalas` on my master node and let Koalas and Spark take care of the rest?

I haven't found any details in the Koalas docs beyond `pip install koalas`.

Yuan JI
  • I was also looking for something along this line. My impression was that I can install koalas on my local machine and tell it where the cluster is. It didn't occur to me that it has to be installed on the cluster at all... – Dror Oct 29 '19 at 15:58

1 Answer


Yes, you would need to install koalas on all the nodes of the cluster. In general, any third-party library that your code imports on executors must be installed on every node, because the worker processes run in their own Python environments.
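One way to see whether a node is ready is to check if the package is importable there. A minimal sketch (the package names below are just examples; running this on each node, e.g. over ssh, is left as an assumption):

```python
import importlib.util

def has_package(name):
    """Return True if the named package can be imported on this machine."""
    return importlib.util.find_spec(name) is not None

# Hypothetical pre-flight check you might run on each cluster node
# before submitting a job that imports koalas.
print(has_package("math"))     # stdlib module, always importable
print(has_package("koalas"))   # False on any node where koalas is missing
```

If the check fails on a worker, jobs that import the package will raise `ModuleNotFoundError` only when a task actually runs there, which is why installing on the master alone is not enough.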

Jayadeep Jayaraman