My use case is as follows:
- Read one or more DataFrames in a Scala Spark app and register them as tables.
- Get a Python callable that runs PySpark-based transformations on these DataFrames.
- Register the transformed DataFrames as tables in the Spark session from within the PySpark callable.
- Read these transformed DataFrames back in the Scala Spark app and optionally post-process them.
Can someone help me achieve this kind of seamless Scala-PySpark integration? The main challenge is being able to run Python-based transformations on DataFrames from inside the Scala Spark app.
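
To make the intent concrete, here is a rough sketch of the driver side using a storage-based handoff, which is exactly the kind of round trip I would like to avoid. Everything in it is illustrative: the paths, the `transform.py` script name, and the `ScalaDriver` object are placeholders I made up, and it assumes `spark-submit` is on the PATH.

```scala
import org.apache.spark.sql.SparkSession
import scala.sys.process._

object ScalaDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scala-pyspark-integration")
      .getOrCreate()

    // Read an input DataFrame and register it as a table for later SQL access.
    val input = spark.read.parquet("/data/input")   // placeholder path
    input.createOrReplaceTempView("input_table")

    // Storage-based handoff (the round trip I want to avoid): write the data
    // out, run the PySpark script as a separate process, read the result back.
    input.write.mode("overwrite").parquet("/tmp/handoff/input")
    val exitCode = Seq(
      "spark-submit", "transform.py",               // placeholder script name
      "/tmp/handoff/input", "/tmp/handoff/output"
    ).!
    require(exitCode == 0, s"Python transform failed with exit code $exitCode")

    // Read the transformed data back into the Scala app for post-processing.
    val transformed = spark.read.parquet("/tmp/handoff/output")
    transformed.createOrReplaceTempView("transformed_table")
    spark.sql("SELECT COUNT(*) AS row_count FROM transformed_table").show()

    spark.stop()
  }
}
```

Ideally the Python callable would operate on the same SparkSession, see `input_table` directly, and register its output back as a table, instead of going through parquet files and a second driver process.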
A working example would be very much appreciated.
Best Regards