-4

So, data is a data frame consists of many columns, and one of which called lpep_pickup_datetime has date and time in the format of "01/01/2016 12:39:36 AM"

I want to analyze these data by date and time, so I am trying to create a new column named pickup_date and one named pickup_time with AM or PM information.

I have used the strsplit function to split the string into the following form: c("01/01/2016", "12:29:24", "AM"), and I am trying to create the aforementioned columns out of this data.

I have wrote the following code:

data$lpep_pickup_datetime=strsplit(data$lpep_pickup_datetime, " ")

data$pickup_date=data$lpep_pickup_datetime[[1]][1]


for (i in seq(1,90181))
{
  data$pickup_time[i]=data$lpep_pickup_datetime[[i]][2]  
}

This is gravely inefficient, as it takes too long to iterate through 90181 rows of data. Is there a better way to accomplish this task?

Thanks.

user101998
  • 241
  • 5
  • 15
  • Please show a small reproducible example and expected output? – akrun Feb 17 '17 at 05:22
  • checkout `separate()` and `unite()` from `dplyr` package. – Aramis7d Feb 17 '17 at 05:24
  • 1
    It would be a lot more useful to parse the datetime to a datetime class like POSIXct, after which you can extract parts if you really need. – alistaire Feb 17 '17 at 05:27
  • 1
    Advisable to not play with strings. Instead convert them to Date-time object and extract Date and time separately as shown in the example. – Ronak Shah Feb 17 '17 at 05:28

2 Answers2

1

In base R, we can use sub to create a delimiter and then with read.csv create two columns

data[paste0("pickup_", c("date", "time"))] <- read.csv(text=sub("\\s+", 
         ",", data$lpep_pickup_datetime),  header=FALSE, stringsAsFactors=FALSE)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I think there is a mismatched parenthesis for paste0. Also, it is giving me this error message:"Error in `[.data.table`(x, i, which = TRUE) : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM." – user101998 Feb 17 '17 at 17:27
  • @user101998 Yes, you are right. Forgot about the `)`.. Fixed it. If your dataset is `data.table`, the syntax is different – akrun Feb 17 '17 at 17:28
1

?apply(df, 2, function(...) ) is the usual way of iterating through columns. But you don't need to do that here

> df<-data.frame("datetime" = format(seq(c(ISOdate(2000,3,20)), by = "day", length.out = 100000), "%Y-%m-%d %r"), stringsAsFactors=FALSE)
> str(df)
'data.frame':   100000 obs. of  1 variable:
 $ datetime: chr  "2000-03-20 08:00:00 PM" "2000-03-21 08:00:00 PM" "2000-03-22 08:00:00 PM" "2000-03-23 08:00:00 PM" ...
> df$dateonly<-format(as.Date(df$datetime, format="%Y-%m-%d %r"),"%Y-%m-%d")
> head(df)
                datetime   dateonly
1 2000-03-20 08:00:00 PM 2000-03-20
2 2000-03-21 08:00:00 PM 2000-03-21
3 2000-03-22 08:00:00 PM 2000-03-22
4 2000-03-23 08:00:00 PM 2000-03-23
5 2000-03-24 08:00:00 PM 2000-03-24
6 2000-03-25 08:00:00 PM 2000-03-25
Jean
  • 1,480
  • 15
  • 27
  • Since my format is of the form `01/01/2016 12:39:36 AM`, I modified your code to `data$pickup_date<-format(as.Date(data$lpep_pickup_datetime, format="%m-%d-%y %r"),"%m-%d-%y")`, but the result is a bunch of NA's. – user101998 Feb 17 '17 at 17:34
  • You were careless. Your format would be `%m/%d/%Y %r`. – Jean Feb 17 '17 at 17:42