I am trying to calculate the distance between pairs of lat/long coordinates with the haversine formula. I pass a Series for the last two function arguments because I want to calculate this for multiple coordinates that I have stored in two pandas columns. I'm getting the following error: `TypeError: ("'Series' object is not callable", u'occurred at index 0')`

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import radians, cos, sin, asin, sqrt

origin_lat = 51.507200
origin_lon = -0.127500

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

df['dist_from_org'] = df.apply(haversine(origin_lon, origin_lat, df['ulong'], df['ulat']), axis=1)

The series from the df look like this:

+----+---------+----------+
|    |  ulat   |  ulong   |
+----+---------+----------+
|  0 | 52.6333 | 1.30000  |
|  1 | 51.4667 | -0.35000 |
|  2 | 51.5084 | -0.12550 |
|  3 | 51.8833 | 0.56670  |
|  4 | 51.7667 | -1.38330 |
|  5 | 55.8667 | -2.10000 |
|  6 | 55.8667 | -2.10000 |
|  7 | 52.4667 | -1.91670 |
|  8 | 51.8833 | 0.90000  |
|  9 | 53.4083 | -2.14940 |
| 10 | 53.0167 | -1.73330 |
| 11 | 51.4667 | -0.35000 |
| 12 | 51.4667 | -0.35000 |
| 13 | 52.7167 | -1.36670 |
| 14 | 51.4667 | -0.35000 |
| 15 | 52.9667 | -1.16667 |
| 16 | 51.4667 | -0.35000 |
| 17 | 51.8833 | 0.56670  |
| 18 | 51.8833 | 0.56670  |
| 19 | 51.4833 | 0.08330  |
| 20 | 52.0833 | 0.58330  |
| 21 | 52.3000 | -0.70000 |
| 22 | 51.4000 | -0.05000 |
| 23 | 51.9333 | -2.10000 |
| 24 | 51.9000 | -0.43330 |
| 25 | 53.4809 | -2.23740 |
| 26 | 51.4853 | -3.18670 |
| 27 | 51.2000 | -1.48333 |
| 28 | 51.7779 | -3.21170 |
| 29 | 51.4667 | -0.35000 |
| 30 | 51.7167 | -0.28330 |
| 31 | 52.2000 | 0.11670  |
| 32 | 52.4167 | -1.55000 |
| 33 | 56.5000 | -2.96670 |
| 34 | 51.2167 | -1.05000 |
| 35 | 51.8964 | -2.07830 |
+----+---------+----------+

Am I not allowed to use a Series in a df.apply call? If so, how can I apply a function row by row and assign the output to a new column?

metersk
  • check your code `df.apply(...`, shouldn't it be `pd.apply(...` ? P.S. I know nothing about pandas – Pynchia Apr 23 '15 at 16:27
  • @Pynchia Nope, I'm almost positive it needs to be `df`. The code is not shown, but my dataframe is called `df` – metersk Apr 23 '15 at 16:33
  • What are origin_lon, origin_lat? Are they constant or do they change for each row in the table? BTW, it's a DataFrame, not a series (there are multiple columns). – Alexander Apr 23 '15 at 16:41
  • @Alexander sorry, let me update the code, yes they are constant. wouldn't calling df['ulong'] be a series though? – metersk Apr 23 '15 at 16:57
  • Could you see if this helps: http://stackoverflow.com/questions/25767596/using-haversine-formula-with-data-stored-in-a-pandas-dataframe/25767765#25767765 – EdChum Apr 23 '15 at 17:04

1 Answer

You don't need to use apply when calling the function. Just use:

df['dist_from_org'] = haversine(origin_lon, origin_lat, df['ulong'], df['ulat'])
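If you do want a row-by-row version, note that df.apply expects a function as its first argument, whereas the code in the question passes the value returned by haversine(...). A minimal sketch, assuming a scalar-only helper called haversine_scalar (a hypothetical name, not from the question) and a three-row sample of the question's data:

```python
import math

import pandas as pd


def haversine_scalar(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees.

    Hypothetical helper: works on plain floats only, one pair at a time.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))  # 6371 km = Earth radius


origin_lat = 51.5072
origin_lon = -0.1275

# Small sample taken from the first rows of the question's table
df = pd.DataFrame({"ulat": [52.6333, 51.4667, 51.5084],
                   "ulong": [1.3, -0.35, -0.1255]})

# A Series is data, not a function, so it cannot be used as apply's callable
print(callable(df["ulong"]))  # False

# Wrap the call in a lambda so apply receives a function and hands it one row at a time
df["dist_from_org"] = df.apply(
    lambda row: haversine_scalar(origin_lon, origin_lat, row["ulong"], row["ulat"]),
    axis=1,
)
print(df)
```

This works, but it calls the Python function once per row, so the direct call on whole columns is usually the better choice for larger frames.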

When I ran your code (using scalar values for origin_lon and origin_lat), I got TypeError: cannot convert the series to <type 'float'>. This was caused by the assignment a = ..., which passes Series objects to math.sin and math.cos.

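I reworked the formula to apply to series (when origin_lon and origin_lat are series too):

# the outer parentheses let the expression continue onto the second line
a = (dlat.divide(2).apply(sin).pow(2)
     + lat1.apply(cos).multiply(lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))))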

Let me know if this works for you.

If origin_lon and origin_lat are constants (as opposed to series), then use this formula:

a = dlat.divide(2).apply(sin).pow(2) + cos(lat1) * lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))

Because the parameters lon2 and lat2 are pandas Series, dlon and dlat will both be Series objects as well. math.sin and math.cos accept only scalar values, so you need to use apply on each Series to call the function on every element.
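An alternative that avoids apply altogether is to use NumPy's sin and cos, which operate element-wise on a Series and also accept plain floats. The sketch below is one way to write it, not the only one; the name haversine_np, the r parameter, and the three-row sample frame are assumptions made for illustration:

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2, r=6371.0):
    """Great-circle distance between points given in decimal degrees.

    Hypothetical vectorized variant: each argument may be a float or a
    pandas Series. r is the Earth radius (6371 km; use 3956 for miles).
    """
    # np.radians, np.sin, np.cos work on scalars and Series alike
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))


origin_lat = 51.5072
origin_lon = -0.1275

# Small sample taken from the first rows of the question's table
df = pd.DataFrame({"ulat": [52.6333, 51.4667, 51.5084],
                   "ulong": [1.3, -0.35, -0.1255]})

# One call computes the distance for every row; the result is a Series
dist = haversine_np(origin_lon, origin_lat, df["ulong"], df["ulat"])
df["dist_from_org"] = dist
print(df)
```

The same function handles both the constant-origin case and the case where the origin is itself a pair of columns, so the two separate formulas for a are not needed.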

Alexander
  • Still getting the error `TypeError: ("'Series' object is not callable", u'occurred at index 0')` – metersk Apr 23 '15 at 17:10
  • Try `haversine(origin_lon, origin_lat, df['ulong'], df['ulat'])`. What does that return? – Alexander Apr 23 '15 at 17:11
  • Okay, I just restarted the kernel in my ipython notebook and now with the above i get this error - `AttributeError: 'numpy.float64' object has no attribute 'apply'` for the a= code that you gave above – metersk Apr 23 '15 at 17:16
  • I added a formula for a if the origin_lon and origin_lat are constants. – Alexander Apr 23 '15 at 17:18
  • It works! could you please explain why you needed to change the `a=` formula from what I had originally? – metersk Apr 23 '15 at 17:18