My use case goes like this:

  1. Read one or more dataframes in a Scala Spark app and register them as tables.
  2. Invoke a Python callable that runs PySpark-based transformations on these dataframes.
  3. From the PySpark callable, register the transformed dataframes as tables in the same Spark session.
  4. Read the transformed dataframes back in the Scala Spark app and optionally post-process them.

Can someone help me achieve this kind of seamless Scala-PySpark integration? The challenge is being able to run Python-based transformations on dataframes from inside the Scala Spark app.
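
To make the question concrete, below is a rough, untested sketch of the wiring I have in mind. Everything in it is an assumption on my part: the input path, table names, and transformation are placeholders, and it leans on `org.apache.spark.deploy.PythonRunner` (the internal class spark-submit uses to launch `.py` files) to run the Python callable against the driver JVM's Py4J gateway, with global temp views used to pass tables between the two sides.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.deploy.PythonRunner
import org.apache.spark.sql.SparkSession

object ScalaPySparkBridge {

  // The Python callable, inlined here as a string for the example.
  // PythonRunner exports PYSPARK_GATEWAY_PORT/SECRET to the child process,
  // so launch_gateway() should connect back to THIS JVM instead of spawning
  // a new one; the script then wraps the already-running SparkSession.
  val pythonCallable: String =
    """from pyspark import SparkContext, SparkConf
      |from pyspark.java_gateway import launch_gateway
      |from pyspark.sql import SparkSession
      |from pyspark.sql import functions as F
      |
      |# Attach to the parent JVM's Py4J gateway.
      |gateway = launch_gateway()
      |jvm = gateway.jvm
      |
      |# Wrap the Scala app's existing session rather than creating a new
      |# SparkContext (a second JVM context would be rejected anyway).
      |jspark = jvm.org.apache.spark.sql.SparkSession.getDefaultSession().get()
      |jsc = jvm.org.apache.spark.api.java.JavaSparkContext(jspark.sparkContext())
      |sc = SparkContext(gateway=gateway, jsc=jsc, conf=SparkConf(_jconf=jsc.getConf()))
      |spark = SparkSession(sc, jspark)
      |
      |df = spark.table("global_temp.input_table")
      |out = df.withColumn("flagged", F.lit(True))  # placeholder transformation
      |out.createGlobalTempView("output_table")
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scala-pyspark-bridge").getOrCreate()

    // 1. Read a dataframe and register it as a *global* temp view, so it is
    //    visible under the global_temp database to the Python side as well.
    spark.read.parquet("/path/to/input").createGlobalTempView("input_table")

    // 2./3. Write the Python callable to a temp file and run it in-process.
    //       PythonRunner args: <main .py file>, <comma-separated extra py-files>.
    val script = Files.createTempFile("transform", ".py")
    Files.write(script, pythonCallable.getBytes(StandardCharsets.UTF_8))
    PythonRunner.main(Array(script.toString, ""))

    // 4. Read the transformed dataframe back and post-process in Scala.
    spark.table("global_temp.output_table").show()
  }
}
```

Since the Python side wraps the very same JVM SparkSession, I suspect ordinary temp views would be visible too; I used global temp views only to be explicit about cross-session sharing. I don't know whether relying on PythonRunner like this is sound, which is partly what I'm asking.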

A working example would be very much appreciated.

Best Regards

Ankit Khettry
