I've got a GPS dataset with about 5600 rows of coordinates from 5 GPS devices ('nodes') recorded over several days, and I want to reduce the number of GPS points to just one point per hour. Because the number of points per hour fluctuates, a simple for-loop is not possible. A simplified structure of the table would be this:

ID  node  easting  northing  year  month  day  hour  minute  time

The column 'time' is of class "POSIXlt" "POSIXt". Trying my first approach, a multiply nested for-loop, I learned about the Second Circle of the Inferno.

Does anyone have an idea how to reduce multiple rows (per hour) to one (per hour), separated by each device, in R?

sequoia

3 Answers

2

Assuming that the year, month, day, and hour columns contain the same information as the time column, the solution could be as follows:

# Generate data

md <- data.frame(
  node = rep(1:5, each = 4)   # 5 devices, 4 fixes each (2 per hour)
  , easting = sample(1:10, size = 20, replace = TRUE)
  , northing = sample(1:10, size = 20, replace = TRUE)
  , year = 2017
  , month = "June"
  , day = 6
  , hour = rep(1:2, each = 2, times = 5)   # 2 fixes in each of 2 hours
  , minute = NA
  , time = NA
)

# Solution

library(dplyr)

md %>% 
  group_by(node, year, month, day, hour) %>%   # one group per device-hour
  summarize(
    easting = mean(easting),    # average the coordinates within each group
    northing = mean(northing)
  )
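
Note that with several grouping variables, summarize() leaves the result grouped by all but the last one, so you may want to add ungroup() (or `.groups = "drop"`) at the end if you need a plain data frame.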
  • I think this approach is generally good, except that averaging the easting and northing could give you very misleading data. If someone walked in a curve, the average point could be far from where the person actually walked. – John Paul Jun 21 '17 at 12:53
  • You're right; in that case, you can replace the `mean()` function with any other function that uses a more suitable calculation (see the sketch below). – Alvaro Franquet Sep 21 '20 at 11:35
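
A minimal sketch of that suggestion (an addition here, not from the thread): instead of averaging, keep the single fix per device-hour that lies closest to the middle of the hour. This assumes the minute column is populated (it is NA in the generated data above):

library(dplyr)

# For each device-hour group, keep the one row whose minute is
# closest to 30, i.e. the fix nearest the middle of the hour
md %>%
  group_by(node, year, month, day, hour) %>%
  slice_min(abs(minute - 30), n = 1, with_ties = FALSE) %>%
  ungroup()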
1

You can create a new column "Unix_hour": the UNIX timestamp divided by 3600 and rounded down.

So, you will have a unique id for each hour.

To do this, you can use as.numeric to convert a POSIXct date into a Unix timestamp (in seconds):

floor(as.numeric(POSIXct_variable) / 3600)

It will return the hour index (the number of hours since the Unix epoch), which is constant within each hour.

Then, you can simply group on this new column "Unix_hour":

aggregate(. ~ Unix_hour, df, mean)

(Change the aggregation function "mean" if you want to aggregate other variables in another way.)
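
Putting the pieces together as a minimal sketch (the cbind() formula and the grouping by node are additions here, so that each device keeps its own hourly point; df$time is assumed to be POSIXct):

# Unique integer id for each hour since the Unix epoch
df$Unix_hour <- floor(as.numeric(df$time) / 3600)

# One averaged point per device per hour
aggregate(cbind(easting, northing) ~ node + Unix_hour, data = df, FUN = mean)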

Olivier
0

You could convert your multiple date/time columns into a single one, e.g.:

df$DateTimeUTCmin5 <- ISOdate(year = df$year,
                              month = df$month,
                              day = df$day,
                              hour = df$hour,
                              min = df$minute,
                              sec = 0,   # the question's table has no seconds column
                              tz = "America/New_York")

add an hour floor using floor_date from lubridate:

library(lubridate)

df$HourFloor <- floor_date(df$DateTimeUTCmin5, unit = "hour")

then decide how you want to extract the data from that hour: mean, first, max?

library(dplyr)

Hourstats <- df %>%
  group_by(node, HourFloor) %>%   # group by device as well, per the question
  summarise(meanEast = mean(easting, na.rm = TRUE),
            firstNorth = first(northing)) %>%
  ungroup()
dez93_2000