0

I want to go through several steps using a df which contains POSIXct datapoints.

Essentially there are three columns in a dataframe which have different dates. The following needs to be achieved:

  1. change all dates to be the same for each row of the three columns (leave times untouched

  2. calculate difference in time between actual time in the column/row against a nominal date/time combination which yields three new columns with seconds

I have done this successfully but my answer (which I already sought help on) seems too long and cumbersome, here it is:

The first thing I did was to create a nominal date to use in calculations:

date.zero<- as.POSIXct("2018-01-01 00:00:00 EST")

I then changed all dates in each row of the data frame within the specific columns to the same date

df$tim.col.1 <- as.POSIXct(sub("\\S+", "2018-01-01", df$tim.col.1))
df$tim.col.2 <- as.POSIXct(sub("\\S+", "2018-01-01", df$tim.col.2))
df$tim.col.2 <- as.POSIXct(sub("\\S+", "2018-01-01", df$tim.col.2))

Lastly I used lapply to subtract the dates from the date.zero to yield time difference in seconds (i.e. essentially seconds from 00:00:00)

df["tim.col.1"] <- lapply(df["tim.col.1"],function(x) x-date.zero)
df["tim.col.2"] <- lapply(df["tim.col.2"],function(x) x-date.zero)
df["tim.col.3"] <- lapply(df["tim.col.3"],function(x) x-date.zero)

Now. I'm guessing that all of this can easily be either done using lapply in a better fashion or using dplyr so I don't need to type all this code...using something like this perhaps but integrating everything together?

newdf  <- df %>% rowwise () %>% mutate(xxx=tim.col.1-date.zero,
                                  xxx2=tim.col.2-date.zero,
                                  xxx3=tim.col.3-date.zero)

Can someone enlighten me as to how this would be achieved most succinctly and efficiently.

TylerH
  • 20,799
  • 66
  • 75
  • 101
Dasr
  • 777
  • 6
  • 16
  • I'm voting to close this question as off-topic because questions asking for improvements to working code belong on Code Review, not Stack Overflow. – TylerH Feb 28 '18 at 14:51

1 Answers1

2

Here is a dplyr solution to the problem you described:

library(magrittr)
library(dplyr)
library(stringr)
library(lubridate)

date.zero<- ymd_hms("2018-01-01 00:00:00", tz = "America/New_York")

new_df <- df %>% # 1) change all dates to be the same for each row of the three columns
    mutate(tim.col.1 = ymd_hms(str_replace(tim.col.1, "\\S+", "2018-01-01"), tz = "America/New_York"),
          tim.col.2 = ymd_hms(str_replace(tim.col.2, "\\S+", "2018-01-01"), tz = "America/New_York"),
          tim.col.3 = ymd_hms(str_replace(tim.col.3, "\\S+", "2018-01-01"), tz = "America/New_York")) %>%
    # 2) calculate difference in time between actual time in the column/row against a 
    # nominal date/time combination which yields three new columns with seconds
    mutate(tim.col.1 = tim.col.1 - date.zero,
           tim.col.2 = tim.col.2 - date.zero,
           tim.col.3 = tim.col.3 - date.zero)

Edit: And here is the mutate_if version based on Moody_Mudskipper's suggestion:

new_df <- df %>% # 1) change all dates to be the same for each row of the three columns
    mutate_if(is.POSIXct, funs(ymd_hms(str_replace(., "\\S+", "2018-01-01"), tz = "America/New_York"))) %>%
    # 2) calculate difference in time between actual time in the column/row against a 
    # nominal date/time combination which yields three new columns with seconds
    mutate_if(is.POSIXct, funs(. - date.zero))
Nick Holt
  • 116
  • 6
  • 1
    maybe use mutate_at or mutate_all if you're going to apply the same function on several columns – moodymudskipper Feb 11 '18 at 16:17
  • 1
    Thanks for that. The second seems to work fine and is parsimonious with writing. However, could you just explain the bit: ymd_hms(str_replace(., I'm struggling to understand it. What is the period doing and why is this bit needed. Sorry for my ignorance :) – Dasr Feb 11 '18 at 21:58
  • ymd_hms from the lubridate package converts the values to a year-month-day_hours-minutes-seconds format. the str_replace function will look in a given character vector and replace anything matching the regular expression "\\S+" with the specified replacement "2018-01-01". The '.' in the str_replace function specifies that the vector to run the replace on will be the data in each column that is passed by mutate_if(). Basically, we are taking every POSIXct column and passing it into the str_replace function, and then formatting the output as a date instead of a string. – Nick Holt Feb 12 '18 at 17:28