
I'm currently working on a project where we will be using PySpark and pyproj for GPS-to-Cartesian transformations.

In this project I will receive Parquet files as input, modify the contents of one column (the column that contains the GPS coordinates), and save the output as Parquet. We want to use a PySpark UDF, but when I tried to pass the transformer to the UDF as an argument I received:

Cannot pass transformer to UDF: TypeError: Invalid argument, not a string or column

I was thinking about creating a class in which the UDF is a method and the pyproj.Transformer is a class attribute. As far as I understand, PySpark will create one instance of the class that will be used across all workers, so to use pyproj.Transformer with PySpark I need to know whether pyproj.Transformer is stateless or stateful. My guess is that it is not stateful, since it only performs geometric transformations. I tried to find information about this in the pyproj documentation, but unfortunately I didn't succeed.
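To make the idea concrete, here is a minimal sketch of a variant I am considering, in which the transformer is never passed to the UDF at all but is built lazily inside the function the first time it runs. The names `make_converter` and `default_factory`, the column names `lon` and `lat`, and the CRS codes (EPSG:4979 to EPSG:4978) are my own assumptions for illustration, not something taken from our real pipeline, and I have not confirmed that this is the recommended approach.

```python
def default_factory():
    """Build a pyproj transformer (CRS codes are assumed for illustration)."""
    # Imported lazily so pyproj is only needed where the function actually runs.
    from pyproj import Transformer
    # always_xy=True means inputs are passed as (longitude, latitude, altitude).
    return Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)


def make_converter(factory=default_factory):
    """Return a plain function that creates its transformer on first use."""
    cache = {}  # empty when the function is serialized, filled on first call

    def convert(lon, lat, alt=0.0):
        if lon is None or lat is None:
            return None  # keep null coordinates as null
        if "transformer" not in cache:
            cache["transformer"] = factory()
        x, y, z = cache["transformer"].transform(lon, lat, alt)
        return [float(x), float(y), float(z)]

    return convert


# Hypothetical Spark wiring (column names "lon" and "lat" are assumptions):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import ArrayType, DoubleType
#   to_xyz = udf(make_converter(), ArrayType(DoubleType()))
#   df = df.withColumn("xyz", to_xyz("lon", "lat"))
```

Only column names are passed to the UDF here, which avoids the TypeError above. The converter should not be called on the driver before it is wrapped in the UDF, because the cache would then no longer be empty when the function is serialized. This sketch still does not tell me whether reusing one transformer for many rows is safe, which is the actual question.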

Does anybody know whether pyproj.Transformer is stateless or stateful?
