I have a series of coordinates for fish caught from a boat on different trips and at different datetimes. How can I determine whether a fish's coordinates are likely to be incorrect (e.g. due to a transcription error), based on the time since the last fish caught on the same trip and an assumed boat speed (say 10 km/h)?
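One way to formalise the check is sketched below. It assumes fish within a trip are ordered by capture time, that positions are in a projected CRS in metres (as with EPSG:32610 here), and that the straight-line distance between consecutive positions is a lower bound on the distance the boat actually travelled. With p_i the position and t_i the capture time of the i-th fish on a trip, and v_max the assumed boat speed:

```latex
v_i = \frac{\lVert \mathbf{p}_i - \mathbf{p}_{i-1} \rVert}{t_i - t_{i-1}},
\qquad
\text{flag fish } i \iff \lVert \mathbf{p}_i - \mathbf{p}_{i-1} \rVert > v_{\max}\,(t_i - t_{i-1})
```

For t_i > t_{i-1} the right-hand test is equivalent to v_i > v_max, and it also behaves sensibly when two fish share a timestamp. A flagged step only says that one of the two positions is suspect, not which one.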
Here is a simple example dataset with two trips and two fish per trip:
```r
library(sf)
library(ggplot2)
library(dplyr)
library(lubridate)

datetime <- ymd_hms('2017-05-13 14:00:00', tz = "Etc/GMT+8")

# tibble() replaces the deprecated data_frame()
df <- tibble(DateTimeCapture = c(datetime, datetime + minutes(35),
                                 datetime + days(2),
                                 datetime + days(2) + minutes(20)),
             Trip = c('1', '1', '2', '2'),
             Order = c(1, 2, 1, 2),
             X = c(648635, 648700, 647778, 658889),
             Y = c(5853151, 5853200, 5854292, 5870000))

# if you prefer to work in sf (EPSG:32610 = WGS 84 / UTM zone 10N, metres)
df_sf <- st_as_sf(df, coords = c('X', 'Y'), crs = 32610)

# quick plot
ggplot() +
  geom_point(data = df, aes(x = X, y = Y, color = Trip))
```
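A minimal base-R sketch of the speed check on this data, using no packages. It assumes X/Y are projected coordinates in metres (true for EPSG:32610) and a maximum boat speed of 10 km/h; `fish`, `prev()` and `max_speed_kmh` are illustrative names, not part of the question.

```r
# Hypothetical copy of the example data (X/Y in metres, UTM zone 10N)
fish <- data.frame(
  DateTimeCapture = as.POSIXct("2017-05-13 14:00:00", tz = "Etc/GMT+8") +
    c(0, 35, 2 * 1440, 2 * 1440 + 20) * 60,
  Trip  = c("1", "1", "2", "2"),
  Order = c(1, 2, 1, 2),
  X = c(648635, 648700, 647778, 658889),
  Y = c(5853151, 5853200, 5854292, 5870000)
)
max_speed_kmh <- 10  # assumed maximum boat speed

# Sort so that each fish directly follows its predecessor within the trip
fish <- fish[order(fish$Trip, fish$DateTimeCapture), ]

# Value from the previous row (NA for the very first row)
prev <- function(v) c(NA, head(v, -1))

t_h     <- as.numeric(fish$DateTimeCapture) / 3600  # hours since epoch
dist_km <- sqrt((fish$X - prev(fish$X))^2 + (fish$Y - prev(fish$Y))^2) / 1000
hours   <- t_h - prev(t_h)

# The first fish of each trip has no predecessor on that trip
first_in_trip <- !duplicated(fish$Trip)
dist_km[first_in_trip] <- NA
hours[first_in_trip]   <- NA

fish$dist_km   <- dist_km
fish$hours     <- hours
fish$speed_kmh <- dist_km / hours
# Compare the step with the reachable distance (no division by zero hours)
fish$flag <- !is.na(dist_km) & dist_km > max_speed_kmh * hours
fish
```

Only the fourth row (second fish of trip 2) is flagged. With an sf object the X/Y columns can be recovered with `st_coordinates()`, and in a dplyr pipeline `prev()` corresponds to `lag()` inside `group_by(Trip)`.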
The distance between the two fish in the second trip is about 19 km:

```r
st_distance(df_sf[3:4, ])
```

```
Units: m
         [,1]     [,2]
[1,]     0.00 19240.47
[2,] 19240.47     0.00
```
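The call above returns the full pairwise matrix, but the check only needs the distance from each fish to the one caught just before it on the same trip. A minimal sf sketch of that, assuming rows are already sorted by capture time within each trip; `fish_sf`, `geom` and `dist_prev_m` are illustrative names.

```r
library(sf)

# Hypothetical sf copy of the example positions (UTM zone 10N, metres),
# assumed already sorted by capture time within each trip
fish_sf <- st_as_sf(
  data.frame(Trip = c("1", "1", "2", "2"),
             X = c(648635, 648700, 647778, 658889),
             Y = c(5853151, 5853200, 5854292, 5870000)),
  coords = c("X", "Y"), crs = 32610
)

geom <- st_geometry(fish_sf)
dist_prev_m <- rep(NA_real_, nrow(fish_sf))  # stays NA for each trip's first fish

for (idx in split(seq_len(nrow(fish_sf)), fish_sf$Trip)) {
  if (length(idx) < 2) next
  cur <- idx[-1]             # every fish except the first of the trip
  prv <- idx[-length(idx)]   # the fish caught just before each of them
  # by_element = TRUE returns one distance per (cur[k], prv[k]) pair
  # instead of the full pairwise matrix
  dist_prev_m[cur] <- as.numeric(
    st_distance(geom[cur], geom[prv], by_element = TRUE)
  )
}
fish_sf$dist_prev_m <- dist_prev_m
fish_sf
```

Dividing `dist_prev_m` by the elapsed time between consecutive `DateTimeCapture` values (e.g. from `difftime(..., units = "hours")`) gives the implied speed to compare with 10 km/h.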
It is unlikely that a boat could travel 19 km in 20 minutes (that would require roughly 58 km/h, against an assumed 10 km/h), so this fish should be flagged as a possible error.
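The arithmetic behind this, as a quick R check using the distance printed above and the assumed 10 km/h:

```r
max_speed_kmh <- 10            # assumed boat speed
elapsed_h     <- 20 / 60       # 20 minutes between the two fish in trip 2
dist_km       <- 19240.47 / 1000

max_reach_km <- max_speed_kmh * elapsed_h  # furthest the boat could get
implied_kmh  <- dist_km / elapsed_h        # speed needed to cover dist_km

round(c(max_reach_km = max_reach_km, implied_kmh = implied_kmh), 2)
dist_km > max_reach_km  # TRUE means the fish should be flagged
```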
I prefer solutions using sf but would also accept ones using sp. The solution has to be in R.