
I have a table with some vessel GPS data, like this:

ID        POSTIME        LON      LAT     SPEED  AZIMUTH
1  2015-12-31 23:56:15 123.4003 32.39449   5.2     145
2  2015-12-31 23:56:53 123.3982 32.39487   5.2     138
3  2015-12-31 23:59:53 123.3884 32.39625   5.3     138
4  2016-01-01 00:01:19 123.3836 32.39702   5.2     146
5  2016-01-01 00:02:58 123.3788 32.39915   5.1     154
6  2016-01-01 00:06:41 123.3708 32.40391   5.1     157

And I want to calculate the distance, time difference, and angle difference of the ship at each sample point. I have written a function point.distance that calculates the distance between two points from their lon and lat, like

point.distance <- function(lon1,lat1,lon2,lat2)

lon1/2 and lat1/2 stand for the coordinates of the two points,

along with a point.angle function to calculate the angle difference:

point.angle <- function(lon1,lat1,lon2,lat2,lon3,lat3)

I know how to use the functions on two individual points, but how do I apply them to all the rows and add the results as new columns for further analysis?

I hope my results might look like

ID        POSTIME        LON      LAT     SPEED  AZIMUTH DISTANCE        TD     AD
    1  2015-12-31 23:56:15 123.4003 32.39449   5.2     145     NA    00:00:38  -7
    2  2015-12-31 23:56:53 123.3982 32.39487   5.2     138   201.873 00:03:00   0
    3  2015-12-31 23:59:53 123.3884 32.39625   5.3     138     ...     ...     ... 
    4  2016-01-01 00:01:19 123.3836 32.39702   5.2     146     ...     ...     ...

Is there any package or function that will act like this? Or should I just save the results in separate vectors and write them to an xlsx file at the end?

leerssej
Lichzeta
  • You might be looking for `geosphere::distHaversine` and `geosphere::bearing`. – r2evans Nov 05 '17 at 02:40
  • @r2evans Thanks, I have read about the package; it works just like my own function, but I still have problems with using row 1 and row 2 as p1 and p2, then row 2 and row 3, then looping and writing the results – Lichzeta Nov 05 '17 at 03:07
  • I just answered another question in a way that should help you: https://stackoverflow.com/questions/47115848/how-can-i-calculate-the-distance-between-latitude-and-longitude-along-rows-of-co/47116308#47116308 – r2evans Nov 05 '17 at 04:17

2 Answers


If you're just getting started in R, I'd recommend you check out the dplyr and tidyr packages for data manipulation. I'm going to use dplyr to help answer your question, with a simpler example that gets at what I think is the heart of it:

how do I calculate a value based on two successive rows of data in my data.frame?

I've used two functions from the dplyr package below:

mutate - which takes a data.frame and transforms it by adding columns. Note I am able to reference new columns I've created in the same mutate command.

lag - this function takes a vector as an argument and returns a shifted copy of the vector. So for example

lag(c(1, 2, 3))
# = NA, 1, 2 
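Applied to the question's AZIMUTH column, this shifted copy already gives the angle-difference (AD) column; a quick sketch with the values copied from the question's table:

```r
library(dplyr)

# AZIMUTH values from the question's table
azimuth <- c(145, 138, 138, 146, 154, 157)

# subtract the shifted copy from the original: NA for the first row,
# then the difference between each row and the one before it
ad <- azimuth - lag(azimuth)
ad
# NA -7  0  8  8  3
```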

So here's my simple example. I'm going to make some coordinates in the xy-plane and compute the Euclidean distance between successive points. I'm going to add columns to my table that bring the coordinates from row i up to row i + 1, and then I'll compute the distance using the two sets of coordinates.

# install.packages("dplyr")
library(dplyr)
d <- data.frame(x = c(-1, 2, 0, 0, -2), y = c(-3, -2, -1, 1, 3))
d

#   x  y
#1 -1 -3
#2  2 -2
#3  0 -1
#4  0  1
#5 -2  3

mydist <- function(x1, y1, x2, y2){
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

mutate(d, x0 = lag(x), y0 = lag(y), distance = mydist(x0, y0, x, y))

#   x  y x0 y0 distance
#1 -1 -3 NA NA       NA
#2  2 -2 -1 -3 3.162278
#3  0 -1  2 -2 2.236068
#4  0  1  0 -1 2.000000
#5 -2  3  0  1 2.828427
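The same lag pattern carries over to lon/lat if you swap the Euclidean helper for a great-circle one, as r2evans suggests in the comments; geosphere::distHaversine does this out of the box, but here is the haversine formula written out in base R so the sketch stays self-contained (the Earth radius of 6371000 m is an assumed mean value):

```r
library(dplyr)

# great-circle distance in metres between two lon/lat points (haversine)
haversine <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# first three fixes from the question's table
gps <- data.frame(LON = c(123.4003, 123.3982, 123.3884),
                  LAT = c(32.39449, 32.39487, 32.39625))

res <- mutate(gps, LON0 = lag(LON), LAT0 = lag(LAT),
              DISTANCE = haversine(LON0, LAT0, LON, LAT))
res
```

The first DISTANCE comes out NA, and the second lands near the ~202 m expected in the question.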
R Thomas
    This is a biased calculation of distance when using lat/lon. It is usually better to use the Haversine (or Vincenty-Haversine) formulas. – r2evans Nov 05 '17 at 04:16
  • To be clear, I'm not advocating using the Euclidean distance for the original question asker's purpose. I completely agree with using a formula that takes the curvature of the earth into account. But it seemed to me that there were more fundamental issues of data manipulation in R that Lichzeta wanted to understand – R Thomas Nov 05 '17 at 04:32

Here is a tidyverse- and geosphere-driven version. If you are a pandas fan, familiar with SQL, or just new to R, you will probably find the tidyverse a very comfortable language in which to work.

For the distance calculation, I have used the most precise function available in geosphere. If you find your calculations are taking too long, feel free to back the complexity down to Haversine or lower; the options are detailed well here: see Section 2 - Great Circle Distance (p. 2)

I have also left the code in a very verbose state. That way you may review all the steps in the process. I just wanted to make sure this answer might be most accessible to you and others who might also have just begun to get excited about the thrilling sport of data wrangling.

The libraries used:

library(tidyverse)
library(lubridate)
library(geosphere)

A reproducible transformation of the OP's view of the data sample above:

df_dat <- 
    read.table(text = " ID  POSDATE    POSTIME      LON      LAT  SPEED    AZIMUTH
                        1  2015-12-31 23:56:15 123.4003 32.39449   5.2     145
                        2  2015-12-31 23:56:53 123.3982 32.39487   5.2     138
                        3  2015-12-31 23:59:53 123.3884 32.39625   5.3     138
                        4  2016-01-01 00:01:19 123.3836 32.39702   5.2     146
                        5  2016-01-01 00:02:58 123.3788 32.39915   5.1     154
                        6  2016-01-01 00:06:41 123.3708 32.40391   5.1     157
                      ", header = TRUE, stringsAsFactors = FALSE
               )
df_dat

As seen below:

> df_dat
  ID    POSDATE  POSTIME      LON      LAT SPEED AZIMUTH
1  1 2015-12-31 23:56:15 123.4003 32.39449   5.2     145
2  2 2015-12-31 23:56:53 123.3982 32.39487   5.2     138
3  3 2015-12-31 23:59:53 123.3884 32.39625   5.3     138
4  4 2016-01-01 00:01:19 123.3836 32.39702   5.2     146
5  5 2016-01-01 00:02:58 123.3788 32.39915   5.1     154
6  6 2016-01-01 00:06:41 123.3708 32.40391   5.1     157

Below is the code for wrangling your dataframe into your desired shape. I have also included in the preparation dataframe a column called TD_per, which you might find to be a helpful format.

output <- 
df_dat %>%
arrange(ID) %>% 
mutate(DTM = ymd_hms(paste0(POSDATE, POSTIME)),
       LON_prev = lag(LON),
       LAT_prev = lag(LAT),
       AZM_prev = lag(AZIMUTH),
       DTM_prev = lag(DTM),
       TD_sec = difftime(DTM, DTM_prev),
       TD_per = as.period(TD_sec),  # an alternative way to list the times
       AD = AZIMUTH - AZM_prev) %>% 
rowwise %>%     # to keep geosphere on the straight and narrow
mutate(DISTANCE = distVincentyEllipsoid(c(LON_prev, LAT_prev), c(LON, LAT)),
       TD = format(ymd(POSDATE, tz = "UTC") + TD_sec, "%H:%M:%S")
       ) %>% 
select(ID,      # getting dataframe all presentable
       POSTIME = DTM,
       LON,
       LAT,
       SPEED,
       AZIMUTH,
       DISTANCE,
       TD,
       AD)
output
> output
Source: local data frame [6 x 9]
Groups: <by row>

# A tibble: 6 x 9
     ID             POSTIME      LON      LAT SPEED AZIMUTH DISTANCE       TD    AD
  <int>              <dttm>    <dbl>    <dbl> <dbl>   <int>    <dbl>    <chr> <int>
1     1 2015-12-31 23:56:15 123.4003 32.39449   5.2     145       NA     <NA>    NA
2     2 2015-12-31 23:56:53 123.3982 32.39487   5.2     138 202.0246 00:00:38    -7
3     3 2015-12-31 23:59:53 123.3884 32.39625   5.3     138 934.6486 00:03:00     0
4     4 2016-01-01 00:01:19 123.3836 32.39702   5.2     146 459.6053 00:01:26     8
5     5 2016-01-01 00:02:58 123.3788 32.39915   5.1     154 509.6387 00:01:39     8
6     6 2016-01-01 00:06:41 123.3708 32.40391   5.1     157 919.2855 00:03:43     3
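One caveat on the TD column above: the format(ymd(...) + TD_sec, "%H:%M:%S") trick works because every gap here is under 24 hours; if longer gaps are possible, formatting the raw seconds directly avoids the date round-trip. A sketch in base R, with the second counts taken from the output above:

```r
# seconds between successive fixes (NA for the first row)
secs <- c(NA, 38, 180, 86, 99, 223)

# hours:minutes:seconds, with no 24-hour wrap-around
td <- sprintf("%02d:%02d:%02d",
              secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
td[is.na(secs)] <- NA
td
# NA "00:00:38" "00:03:00" "00:01:26" "00:01:39" "00:03:43"
```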

Finally, you can write your output dataframe directly to a .csv.

write_excel_csv(output, "output.csv")
leerssej