I want to analyze the distance traveled based on GPS tracks, but when I calculate the distance it always comes out too large.

I use Python to make a CSV file with the latitude and longitude of every point in a track, which I then analyze with R. The data frame looks like this:

|      lat|      lon|   lat.p1|   lon.p1| dist_to_prev|
|--------:|--------:|--------:|--------:|------------:|
| 60.62061| 15.66640| 60.62045| 15.66660|    28.103099|
| 60.62045| 15.66660| 60.62037| 15.66662|     8.859034|
| 60.62037| 15.66662| 60.62026| 15.66636|    31.252373|
| 60.62026| 15.66636| 60.62018| 15.66636|     8.574722|
| 60.62018| 15.66636| 60.62010| 15.66650|    17.787905|
| 60.62001| 15.66672| 60.61996| 15.66684|    14.393267|
| 60.61996| 15.66684| 60.61989| 15.66685|     7.584996|
...

I could post the whole data frame here for reproducibility (it's only 59 rows), but I'm not sure of the etiquette for posting big chunks of data here. Let me know how I can best share it.

lat.p1 and lon.p1 are just the lat and lon from the row below. dist_to_prev is calculated with distm() from geosphere:

library(geosphere)
library(dplyr)

df$dist_to_prev <- apply(df, 1 , FUN = function (row) { 
   distm(c(as.numeric(row["lat"]), as.numeric(row["lon"])), 
         c(as.numeric(row["lat.p1"]), as.numeric(row["lon.p1"])),
   fun = distHaversine)})

df %>% filter(dist_to_prev != "NA") %>% summarise(sum(dist_to_prev))

# A tibble: 1 x 1
`sum(dist_to_prev)`
            <dbl>
1           1266.

I took this track as an example from Trailforks, and according to their track description it should be 787 m, not the 1266 m I got. This is not unique to this track: every track I've looked at comes out 30-50% too long when I calculate it.

One possible cause is that the lats/lons show only 5 decimal places. There are 6 decimal places in the CSV, but I can only see 5 when I open it in RStudio. I assumed this was just display formatting and that the full number was still stored, but maybe not? The lats/lons are of type double.

Why are my distances much larger than the ones displayed on the website I got the GPX file from?

Ramon S.
  • For the 5-vs-6 decimal points, know that what you see on the console is not necessarily what is stored in the object. See [`?options`](https://stat.ethz.ch/R-manual/R-patched/library/base/html/options.html), specifically `"digits"`. – r2evans Nov 11 '18 at 20:27
  • @r2evans Thanks. I checked and all 6 decimal places are there. – Ramon S. Nov 11 '18 at 20:34
  • Have a look at `geosphere::distHaversine`; `distHaversine(p1 = d[ , c("lon", "lat")])`. If `p2` is missing: "_the sequential distance between the points in p1 is computed_". I.e. no need for `apply`. – Henrik Nov 11 '18 at 20:40
  • Can you provide the coordinates of the track you measured to be 787 m? – G. Cocca Nov 11 '18 at 20:41
  • @Henrik Thanks for noticing that. I will change my code. – Ramon S. Nov 11 '18 at 21:05
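
The point about displayed versus stored digits in the comments above can be checked with a small sketch. The value `x` is a made-up latitude, not one taken from the track, and the snippet only uses base R's `format()`:

```r
# Hypothetical latitude with six decimal places (not from the real track).
x <- 60.620613

# Seven significant digits is R's default for printing, so the sixth
# decimal place is hidden on the console.
shown_default <- format(x, digits = 7)

# Asking for more significant digits reveals the value as stored.
shown_full <- format(x, digits = 10)

print(shown_default)
print(shown_full)
```

Setting `options(digits = 10)` should have the same effect on ordinary console printing, so the missing sixth decimal is a display matter and not a loss of precision.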

1 Answer

There are a couple of problems in the code above. distHaversine is a vectorized function, so you can avoid the loop/apply call, which should noticeably improve performance.

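Most importantly, the geosphere package expects longitude as the first coordinate, not latitude. With the two swapped, the real east-west differences are treated as north-south ones and are no longer scaled down by cos(60.6°) ≈ 0.49, which is why the distances come out too long.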

df <- read.table(header = TRUE, text = " lat      lon   lat.p1   lon.p1
60.62061 15.66640 60.62045 15.66660
60.62045 15.66660 60.62037 15.66662
60.62037 15.66662 60.62026 15.66636
60.62026 15.66636 60.62018 15.66636
60.62018 15.66636 60.62010 15.66650
60.62001 15.66672 60.61996 15.66684
60.61996 15.66684 60.61989 15.66685")


library(geosphere)

# Latitude first (incorrect)
distHaversine(df[, c("lat", "lon")], df[, c("lat.p1", "lon.p1")])
# [1] 28.103099  8.859034 31.252373  8.574722 17.787905 14.393267  7.584996

# Longitude first (correct)
distHaversine(df[, c("lon", "lat")], df[, c("lon.p1", "lat.p1")])
# [1] 20.893456  8.972291 18.750046  8.905559 11.737448  8.598240  7.811479
Dave2e
  • Nice catch on the argument order. I have been bitten by the order of *longitude* and *latitude* in various forums, packages, functions, websites, ..., so I inevitably have to revisit which is which to make sure I'm putting them in the right order. Nearer the equator this has slightly less impact, but more as you shift north/south. – r2evans Nov 11 '18 at 20:56
  • @r2evans, I have been in the same boat. I like to believe I have learned from my past, thus I have a tendency to look at my previous mistakes when I answer other questions. – Dave2e Nov 11 '18 at 21:00
  • Thanks! This was it. Now I get a much more believable total distance. I will change it to distHaversine as well. – Ramon S. Nov 11 '18 at 21:02
  • You don't need the lagged columns (see my comment above). – Henrik Nov 11 '18 at 21:02
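
The last comment's point, that sequential distances can be computed from the `lon`/`lat` columns alone, can be sketched without any package. The helper `haversine_m` below is a hand-written stand-in, not geosphere's code, and its default radius of 6378137 m is an assumption chosen to match the default documented for `distHaversine`. The six points are the contiguous run from the start of the table in the question.

```r
# Hand-rolled haversine distance in metres (hypothetical helper, not geosphere's code).
# Inputs are in decimal degrees; r = 6378137 is assumed to match distHaversine's default.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# First six consecutive points of the track; no lagged columns needed.
track <- data.frame(
  lat = c(60.62061, 60.62045, 60.62037, 60.62026, 60.62018, 60.62010),
  lon = c(15.66640, 15.66660, 15.66662, 15.66636, 15.66636, 15.66650)
)

n <- nrow(track)

# Distance from each point to the next: drop the last row for the start
# points and the first row for the end points.
seg <- haversine_m(track$lon[-n], track$lat[-n], track$lon[-1], track$lat[-1])

# Same calculation with longitude and latitude swapped, for comparison only.
seg_swapped <- haversine_m(track$lat[-n], track$lon[-n], track$lat[-1], track$lon[-1])

total <- sum(seg)
print(round(seg, 2))
print(round(seg_swapped, 2))
print(total)
```

For real work, `distHaversine(track[, c("lon", "lat")])` from geosphere is the shorter route, as the comment describes. The swapped vector is included only to show how much the argument order changes the result at this latitude.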