This is undoubtedly a bit of a "can't see the wood for the trees" moment. I've been staring at this code for an hour and can't see what I've done wrong. I know it's staring me in the face but I just can't see it!

I'm trying to convert between two geographical co-ordinate systems using Python.

I have longitude (x-axis) and latitude (y-axis) values and want to convert to OSGB 1936. For a single point, I can do the following:

import numpy as np
import pandas as pd
import shapefile
import pyproj

inProj = pyproj.Proj(init='epsg:4326')
outProj = pyproj.Proj(init='epsg:27700')

x1,y1 = (-2.772048, 53.364265)

x2,y2 = pyproj.transform(inProj,outProj,x1,y1)

print(x1,y1)
print(x2,y2)

This produces the following:

-2.772048 53.364265
348721.01039783185 385543.95241055806

Which seems reasonable and suggests that longitude of -2.772048 is converted to a co-ordinate of 348721.0103978.

In fact, I want to do this in a Pandas dataframe. The dataframe contains columns containing longitude and latitude and I want to add two additional columns that contain the converted co-ordinates (called newLong and newLat).

An exemplar dataframe might be:

    latitude  longitude
0  53.364265  -2.772048
1  53.632481  -2.816242
2  53.644596  -2.970592

And the code I've written is:

import numpy as np
import pandas as pd
import shapefile
import pyproj

inProj = pyproj.Proj(init='epsg:4326')
outProj = pyproj.Proj(init='epsg:27700')

df = pd.DataFrame({'longitude':[-2.772048,-2.816242,-2.970592],'latitude':[53.364265,53.632481,53.644596]})

def convertCoords(row):
    x2,y2 = pyproj.transform(inProj,outProj,row['longitude'],row['latitude'])
    return pd.Series({'newLong':x2,'newLat':y2})

df[['newLong','newLat']] = df.apply(convertCoords,axis=1)

print(df)

Which produces:

    latitude  longitude        newLong         newLat
0  53.364265  -2.772048  385543.952411  348721.010398
1  53.632481  -2.816242  415416.003113  346121.990302
2  53.644596  -2.970592  416892.024217  335933.971216

But now it seems that the newLong and newLat values have been mixed up (compared with the results of the single point conversion shown above).

Where have I got my wires crossed to produce this result? (I apologise if it's completely obvious!)

user1718097

2 Answers

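When you do df[['newLong','newLat']] = df.apply(convertCoords,axis=1), the columns of the df.apply output are assigned by position, not by label. The column order of that output is not guaranteed to be newLong, newLat, because your Series was defined using a dictionary. Where dictionaries are unordered, older pandas versions sort the keys alphabetically, which would put newLat before newLong and produce exactly the swap you are seeing.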

You can opt to return a Series with a fixed column ordering:

return pd.Series([x2, y2])

Alternatively, if you want to keep the convertCoords output labelled, then you can use .join to combine results instead:

return pd.Series({'newLong':x2,'newLat':y2})
...
df = df.join(df.apply(convertCoords, axis=1))
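A third option, sketched below, is to keep the labels and select the converted columns by name before assigning, so the order on the right-hand side always matches the left. To keep it runnable without pyproj, fake_transform is a hypothetical stand-in for the real projection call, and the Series index is deliberately given in the "sorted" order to mimic the situation in the question:

```python
import pandas as pd

def fake_transform(lon, lat):
    # Hypothetical stand-in for the pyproj call: returns (x, y) in a fixed order.
    return lon * 1000.0, lat * 1000.0

df = pd.DataFrame({
    'longitude': [-2.772048, -2.816242, -2.970592],
    'latitude': [53.364265, 53.632481, 53.644596],
})

def convert_coords(row):
    x2, y2 = fake_transform(row['longitude'], row['latitude'])
    # Index given in alphabetical order on purpose, to mimic the question.
    return pd.Series([y2, x2], index=['newLat', 'newLong'])

converted = df.apply(convert_coords, axis=1)

# Selecting by label puts the columns in the order the left-hand side expects,
# so the positional assignment can no longer swap them.
df[['newLong', 'newLat']] = converted[['newLong', 'newLat']]
print(df)
```
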
nneonneo

Please note that pyproj's transform function also accepts arrays, which is quite useful for large dataframes and much faster than going row by row with a lambda/apply function:

import pandas as pd
from pyproj import Proj, transform

inProj, outProj = Proj(init='epsg:4326'), Proj(init='epsg:27700')
df['newLon'], df['newLat'] = transform(inProj, outProj, df['longitude'].tolist(), df['latitude'].tolist())
J. Doe
  • As stated in the [docs](http://pyproj4.github.io/pyproj/stable/advanced_examples.html#repeated-transformations), using Transformer.transform is more performant. – sanzoghenzo Sep 11 '20 at 09:42
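
For reference, the Transformer approach mentioned in the comment might look roughly like the sketch below. It assumes a pyproj version that provides Transformer.from_crs with the always_xy option (2.2 or later, as far as I know), where Proj(init=...) and the module-level transform function are deprecated. Passing always_xy=True keeps the (longitude, latitude) argument order used throughout this thread. The exact output values can differ by a few metres depending on which datum shift your PROJ installation applies, so none are quoted here.

```python
import pandas as pd
from pyproj import Transformer

df = pd.DataFrame({
    'longitude': [-2.772048, -2.816242, -2.970592],
    'latitude': [53.364265, 53.632481, 53.644596],
})

# Build the transformer once and reuse it; always_xy=True means
# inputs are (lon, lat) and outputs are (easting, northing).
transformer = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

# Whole columns are transformed in one vectorised call.
easting, northing = transformer.transform(
    df['longitude'].to_numpy(), df['latitude'].to_numpy()
)
df['newLong'] = easting
df['newLat'] = northing
print(df)
```
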