My use case is as follows:
- Read one or more DataFrames in a Scala Spark app and register them as tables.
- Get a Python callable that runs PySpark-based transformations on these DataFrames.
- Register the transformed DataFrames as tables in the Spark session from within the PySpark callable.
- Read these transformed DataFrames back in the Scala Spark app and optionally post-process them.
Can someone help me achieve this kind of seamless Scala-PySpark integration? The main challenge is being able to run Python-based transformations on DataFrames from inside the Scala Spark app.
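
To make the intent concrete, here is a rough sketch of the driver side using a storage-based handoff, which is exactly the kind of round trip I would like to avoid. Everything in it is illustrative: the paths, the `transform.py` script name, and the `ScalaDriver` object are placeholders I made up, and it assumes `spark-submit` is on the PATH.

```scala
import org.apache.spark.sql.SparkSession
import scala.sys.process._

object ScalaDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scala-pyspark-integration")
      .getOrCreate()

    // Read an input DataFrame and register it as a table for later SQL access.
    val input = spark.read.parquet("/data/input")   // placeholder path
    input.createOrReplaceTempView("input_table")

    // Storage-based handoff (the round trip I want to avoid): write the data
    // out, run the PySpark script as a separate process, read the result back.
    input.write.mode("overwrite").parquet("/tmp/handoff/input")
    val exitCode = Seq(
      "spark-submit", "transform.py",               // placeholder script name
      "/tmp/handoff/input", "/tmp/handoff/output"
    ).!
    require(exitCode == 0, s"Python transform failed with exit code $exitCode")

    // Read the transformed data back into the Scala app for post-processing.
    val transformed = spark.read.parquet("/tmp/handoff/output")
    transformed.createOrReplaceTempView("transformed_table")
    spark.sql("SELECT COUNT(*) AS row_count FROM transformed_table").show()

    spark.stop()
  }
}
```

Ideally the Python callable would operate on the same SparkSession, see `input_table` directly, and register its output back as a table, instead of going through parquet files and a second driver process.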
A working example would be very much appreciated.
Best Regards