Speed up pandas iterrows (xy to lat long coordinate pyproj)

Question

I have been using iterrows to transform XY coordinates to lat/long using the pyproj module. I know that iterrows in pandas is slow, but I am having trouble finding another way to code this.

I have a dataframe with well names and each well's X and Y coordinates. I also have a column with the EPSG coordinate system that can be read by pyproj. The EPSG code differs between wells. I have provided an example dataframe.

data = pd.DataFrame({
    "WellName": ("well1", "well2", "well3", "well4", "well5"),
    "EPSG": ('epsg:21898', 'epsg:21898', 'epsg:21897', 'epsg:21897', 'epsg:21897'),
    'X': (900011, 900011, 900011, 900011, 900011),
    'Y': (800011, 800011, 800011, 800011, 800011),
})
data

I loop through each row of this dataframe, look up the EPSG coordinate system, then transform the x, y to lat, long. This works but is extremely slow. Is there a simpler, more elegant solution that can speed it up?

import pandas as pd
import numpy as np
from pyproj import Proj, transform

for index, row in data.iterrows():
    # source EPSG coord system (from the EPSG column)
    inProj = Proj(init=row['EPSG'])
    # EPSG coord system for lat long
    outProj = Proj(init='epsg:4326')
    # X and Y coords (from the X and Y columns)
    x1, y1 = row['X'], row['Y']
    # with init= definitions the output order is (x, y) = (longitude, latitude)
    x2, y2 = transform(inProj, outProj, x1, y1)
    #print (x2,y2)
    # create and fill in the lat and long columns
    data.loc[index, 'latitude'] = y2
    data.loc[index, 'longitude'] = x2
    #print (row['WellName'],row['X'],(row['EPSG']))

I attempted to vectorize it as follows, but I have no clue what I am doing and it crashes my Python. I would not suggest using it... :/

data['latitude'],data['longitude'] = transform(Proj(init=(data['EPSG'])), Proj(init='epsg:4326'), data['X'], data['Y'])

Half Way Solution:

After more attempts I have partially solved my question. Using "apply", it is now orders of magnitude faster.

It creates a new tuple column holding the transformed coordinate pair. I must then use a roundabout step to split the tuple into two separate columns (one for lat, one for long).

data['LatLong'] = data.apply(lambda row: transform(Proj(init=row['EPSG']), Proj(init='epsg:4326'), row['X'], row['Y']), axis=1)

# expand the tuple column into two columns and merge them back on the index
LatLongIndex = pd.DataFrame(data['LatLong'].values.tolist(), index=data.index)
dfDevLatLong = pd.merge(data, LatLongIndex, right_index=True, left_index=True)
dfDevLatLong
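
The tuple-splitting step can probably be done without a merge. The following is only a sketch: `split_pairs` is a hypothetical helper name, and the coordinate values in the usage lines are made-up placeholders, not real transform output. It assumes each tuple is ordered (longitude, latitude), which is what `transform` returns when the target is defined with `init='epsg:4326'`.

```python
import pandas as pd


def split_pairs(df, pair_col="LatLong", names=("longitude", "latitude")):
    """Return a copy of df with the tuple column expanded into two named columns.

    Assumes every entry in pair_col is a 2-tuple ordered like `names`.
    """
    # Build a two-column frame that shares df's index, then join on that index.
    expanded = pd.DataFrame(df[pair_col].tolist(), index=df.index, columns=list(names))
    return df.join(expanded)


# Placeholder values for illustration only (not real pyproj output).
example = pd.DataFrame({
    "WellName": ["well1", "well2"],
    "LatLong": [(-75.1, 4.2), (-74.9, 4.3)],
})
result = split_pairs(example)
```

Naming the columns up front avoids ending up with columns called 0 and 1, which is what the merge-based version produces.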

It is now workable, but still kind of slow, and I am sure there is a more elegant way to go about this.

For future users, simply using itertuples instead of iterrows will give a big speed boost. — Jacob, Nov 02 '20 at 00:03
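
A minimal sketch of what the itertuples suggestion might look like. `convert_rows` and its `convert` argument are hypothetical names: `convert(epsg, x, y)` stands in for whatever pyproj call you use and is assumed to return a (longitude, latitude) pair. The sketch also collects results in lists and assigns them once, instead of writing to `data.loc` on every iteration.

```python
import pandas as pd


def convert_rows(df, convert):
    """Return a copy of df with 'longitude' and 'latitude' columns added.

    convert(epsg, x, y) is a caller-supplied function (an assumption of this
    sketch) that returns a (longitude, latitude) pair for one point.
    """
    lons, lats = [], []
    # itertuples yields lightweight namedtuples instead of building a Series per row.
    for row in df.itertuples(index=False):
        lon, lat = convert(row.EPSG, row.X, row.Y)
        lons.append(lon)
        lats.append(lat)
    # Assign both columns in one step rather than cell by cell.
    return df.assign(longitude=lons, latitude=lats)
```

Attribute access such as `row.EPSG` only works because the column names here are valid Python identifiers.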

score 0 · Answer 1 · answered Nov 10 '17 at 21:15

I have partially solved my question. Using "apply", it is now orders of magnitude faster.

It creates a new tuple column holding the transformed coordinate pair. I must then use a roundabout step to split the tuple into two separate columns (one for lat, one for long).

data['LatLong'] = data.apply(lambda row: transform(Proj(init=row['EPSG']), Proj(init='epsg:4326'), row['X'], row['Y']), axis=1)

# expand the tuple column into two columns and merge them back on the index
LatLongIndex = pd.DataFrame(data['LatLong'].values.tolist(), index=data.index)
dfDevLatLong = pd.merge(data, LatLongIndex, right_index=True, left_index=True)
dfDevLatLong

It is now workable, but still kind of slow, and I am sure there is a more elegant way to go about this.
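
One possible way to go further, offered as a sketch rather than a tested solution: since many wells share the same EPSG code, you could group by that column, build one transformer per group, and pass whole arrays of X and Y to it. `add_lon_lat` and `pyproj_transformer` are hypothetical helper names. The second one assumes pyproj 2.2 or newer, where `Transformer.from_crs` accepts `always_xy=True`, and the sketch assumes the dataframe index has no duplicate labels.

```python
import pandas as pd


def add_lon_lat(df, make_transformer, epsg_col="EPSG", x_col="X", y_col="Y"):
    """Return a copy of df with 'longitude' and 'latitude' columns.

    make_transformer(epsg_code) must return an object whose
    transform(xs, ys) method accepts arrays and returns (lons, lats).
    """
    out = df.copy()
    out["longitude"] = float("nan")
    out["latitude"] = float("nan")
    # One transformer per distinct EPSG code, applied to the whole group at once.
    for epsg_code, group in out.groupby(epsg_col):
        transformer = make_transformer(epsg_code)
        lons, lats = transformer.transform(
            group[x_col].to_numpy(), group[y_col].to_numpy()
        )
        out.loc[group.index, "longitude"] = lons
        out.loc[group.index, "latitude"] = lats
    return out


def pyproj_transformer(epsg_code):
    # Imported here so the grouping logic above does not itself require pyproj.
    from pyproj import Transformer

    # always_xy=True keeps the (x, y) = (longitude, latitude) order on output.
    return Transformer.from_crs(epsg_code, "EPSG:4326", always_xy=True)
```

With the example dataframe from the question, the call would be `add_lon_lat(data, pyproj_transformer)`. The number of Python-level iterations then scales with the number of distinct EPSG codes rather than the number of rows.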
