I have a PySpark DataFrame df with the columns 'latitude' and 'longitude':
```
+---------+---------+
| latitude|longitude|
+---------+---------+
|51.822872| 4.905615|
|51.819645| 4.961687|
| 51.81964| 4.961713|
| 51.82256| 4.911187|
|51.819263| 4.904488|
+---------+---------+
```
I want to compute the UTM coordinates ('x' and 'y') from these columns. To do this, I need to pass the 'longitude' and 'latitude' values to a pyproj Proj projection. The resulting 'x' and 'y' should then be appended to the original DataFrame df as new columns. This is how I did it in pandas:
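```python
from pyproj import Proj

# UTM zone 31 projection on the WGS84 ellipsoid; output in metres
pp = Proj(proj='utm', zone=31, ellps='WGS84', preserve_units=False)

# Proj takes (longitude, latitude) and works on whole NumPy arrays at once
xx, yy = pp(df["longitude"].values, df["latitude"].values)
df["X"] = xx
df["Y"] = yy
```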
How would I do this in PySpark?