13

I have a data frame like the below and would like to convert the Latitude and Longitude columns in Degree, Minute, Second format into decimal degrees - with negative for the correct hemisphere. Is there an easy way to do that?

Parent Company  CPO PKO Latitude    Longitude
Incasi Raya X       0°51'56.29"S    101°26'46.29"E
Incasi Raya X       1°23'39.29"S    101°35'30.45"E
Incasi Raya X       0°19'56.63"N    99°22'56.36"E
Incasi Raya X       0°21'45.91"N    99°37'59.68"E
Incasi Raya X       1°41'6.56"S 102°14'7.68"E
Incasi Raya X       1°15'2.13"S 101°34'30.38"E
Incasi Raya X       2°19'44.26"S    100°59'34.55"E
Musim Mas   X       1°44'55.94"N    101°22'15.94"E

For example 0°51'56.29"S would be converted to -0.8656361

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
Funkeh-Monkeh
  • 649
  • 6
  • 17
  • related: https://stackoverflow.com/questions/10852955/python-batch-convert-gps-positions-to-lat-lon-decimals and https://stackoverflow.com/questions/33997361/how-to-convert-degree-minute-second-to-degree-decimal-in-python – Stephen Rauch May 05 '18 at 19:24

4 Answers4

12

Basing my answer on a function from SO you can do it like this:

Interestingly this answer is also 2x as fast as MaxU and Amis answer for a dataset with +500 rows. My bet is that the bottleneck is str.extract(). But something is clearly strange.

import pandas as pd
import re

#https://stackoverflow.com/questions/33997361
def dms2dd(s):
    # example: s = """0°51'56.29"S"""
    degrees, minutes, seconds, direction = re.split('[°\'"]+', s)
    dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60);
    if direction in ('S','W'):
        dd*= -1
    return dd

df = pd.DataFrame({'CPO': {0: 'Raya', 1: 'Raya'},
 'Latitude': {0: '0°51\'56.29"S', 1: '1°23\'39.29"S'},
 'Longitude': {0: '101°26\'46.29"E', 1: '101°35\'30.45"E'},
 'PKO': {0: 'X', 1: 'X'},
 'ParentCompany': {0: 'Incasi', 1: 'Incasi'}})

df['Latitude'] = df['Latitude'].apply(dms2dd)
df['Longitude'] = df['Longitude'].apply(dms2dd)

printing df returns:

    CPO   Latitude   Longitude PKO ParentCompany
0  Raya  -0.865636  101.446192   X        Incasi
1  Raya  -1.394247  101.591792   X        Incasi

Update: To correct your mistake you could do something in the lines of:

m = df['Latitude'].str[-2] != '"'
df.loc[m, 'Latitude'] = df.loc[m, 'Latitude'].str[:-1] + '"' + df.loc[m, 'Latitude'].str[-1]

Full example:

import re

s1 = """0°51'56.29"S"""
s2 = """0°51'56.29S"""

df = pd.Series((s1,s2)).to_frame(name='Latitude')

m = df['Latitude'].str[-2] != '"'
df.loc[m, 'Latitude'] = df.loc[m, 'Latitude'].str[:-1] + '"' + df.loc[m, 'Latitude'].str[-1]

print(df)
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
  • Thanks for this, @Anton vBR. It worked great on the `Longitude` column but for some weird reason I am getting this - `ValueError: not enough values to unpack (expected 4, got 3)` when I run it on my `Latitude` column. Any idea why that could be happening? – Funkeh-Monkeh May 05 '18 at 21:12
  • Ok, just realised one row had a value like this `0°45'2.22S` instead of `0°45'2.22"S` so I guess that must have caused it. Is there any easy way to get around this? – Funkeh-Monkeh May 05 '18 at 21:20
  • @Funkeh-Monkeh Is it only one row? I'd manually go in and change it to the correct value in that case. In other case, if you are sure everything else is correct I added an updated code you could run before you apply the functions. – Anton vBR May 06 '18 at 07:12
4

You can use vectorized operations using pd.Series.str.extract. For the latitude, for example:

parts = df.Latitude.str.extract('(\d+)°(\d+)\'([^"]+)"([N|S|E|W])', expand=True)
>>> (parts[0].astype(int) + parts[1].astype(float) / 60 + parts[2].astype(float) / 3600) * parts[3].map({'N':1, 'S':-1, 'E': 1, 'W':-1})
0    101.446192
1    101.591792
2     99.382322
3     99.633244
4    102.235467
5    101.575106
6    100.992931
7    101.371094
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
1

Here is a vectorized approach, that also uses matrix * vector ([1, 1./60, 1./3600]) multiplication:

In [233]: %paste
def dms2dec(s):
    x = (s.str.upper()
          .str.split(r'[°\'"]', expand=True)
          .replace(['S','W','N','E'], [-1,-1,1,1])
          .astype('float'))
    return x.iloc[:, :3].dot([1, 1./60, 1./3600]).mul(x.iloc[:, 3])

## -- End pasted text --

In [234]: df[['Latitude','Longitude']] = df[['Latitude','Longitude']].apply(dms2dec)

In [235]: df
Out[235]:
  Parent Company CPO PKO  Latitude   Longitude
0    Incasi Raya       X -0.865636  101.446192
1    Incasi Raya       X -1.394247  101.591792
2    Incasi Raya       X  0.332397   99.382322
3    Incasi Raya       X  0.362753   99.633244
4    Incasi Raya       X -1.685156  102.235467
5    Incasi Raya       X -1.250592  101.575106
6    Incasi Raya       X -2.328961  100.992931
7      Musim Mas       X  1.748872  101.371094

step by step explanation:

In [239]: x = (s.str.upper()
     ...:       .str.split(r'[°\'"]', expand=True)
     ...:       .replace(['S','W','N','E'], [-1,-1,1,1])
     ...:       .astype('float'))

In [240]: x
Out[240]:
     0     1      2    3
0  0.0  51.0  56.29 -1.0
1  1.0  23.0  39.29 -1.0
2  0.0  19.0  56.63  1.0
3  0.0  21.0  45.91  1.0
4  1.0  41.0   6.56 -1.0
5  1.0  15.0   2.13 -1.0
6  2.0  19.0  44.26 -1.0
7  1.0  44.0  55.94  1.0

In [241]: x.iloc[:, :3].dot([1, 1./60, 1./3600])
Out[241]:
0    0.865636
1    1.394247
2    0.332397
3    0.362753
4    1.685156
5    1.250592
6    2.328961
7    1.748872
dtype: float64

In [242]: x.iloc[:, :3].dot([1, 1./60, 1./3600]).mul(x.iloc[:, 3])
Out[242]:
0   -0.865636
1   -1.394247
2    0.332397
3    0.362753
4   -1.685156
5   -1.250592
6   -2.328961
7    1.748872
dtype: float64
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
0

You can use the function clean_lat_long() from the library DataPrep. Install it with pip install dataprep.

from dataprep.clean import clean_lat_long
df = pd.DataFrame({"Latitude": ["0°51'56.29''S", "1°23'39.29''S", "0°19'56.63''N"],
     "Longitude": ["101°26'46.29''E", "101°35'30.45''E", "99°22'56.36''E"]})

df2 = clean_lat_long(df, lat_col="Latitude", long_col="Longitude", split=True)
df2
        Latitude        Longitude  Latitude_clean  Longitude_clean
0  0°51'56.29''S  101°26'46.29''E         -0.8656         101.4462
1  1°23'39.29''S  101°35'30.45''E         -1.3942         101.5918
2  0°19'56.63''N   99°22'56.36''E          0.3324          99.3823
victoria55
  • 225
  • 2
  • 6