It's probably really simple. In the first case, using presidential data, I can filter by either years or years 2. And I get the same result.
However, when I use posixct data and try to filter in a similar way I run into problems.
When I write
school_hours2<-as.character(c(07:18))
I can see the values in school_hours 2 are "7", "8","9" etc whereas in school_hours they are "07" "08" "09" etc
EDIT: I think this explains that difference then? EDIT: I can see the problem comparing integer:character, and even when I write the vector as.character the values in the vector do not match what I want.
What I'd like is to be able to filter by school_hours2. As that would mean I could think "i'd like to filter between these two times" and put the upper and lower bounds in. Rather than having to write all the interval points in between. How do I get this?
Why is filtering by "Y" easier than filtering by "H"?
library (tidyverse)
#some data - filtering works
data(presidential)
head(presidential)
str(presidential)
presidential%>%filter(format(as.Date(start),"%Y")<=2005)
years<-c('1979', '1980', '1981', '1982',
'1983', '1984', '1985', '1986',
'1987', '1988', '1989', '1990'
)
years2<-c(1950:1990)
presidential%>%filter(format(as.Date(start),"%Y")%in% years2)
presidential%>%filter(format(as.Date(start),"%Y")%in% years)
#some date time data - filtering.
test_data<-sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="day"), 1000)
td<-as.data.frame(test_data)%>%mutate(id = row_number())
school_hours<-c('07', '08', '09', '10',
'11', '12', '13', '14',
'15', '16', '17', '18'
)
school_hours2<-c(07:18)
school_years<-c(2015,2016,2017)
school_years2<-c(2015:2017)
str(td)
test1<-td%>%
filter(id >=79)
schools<-td%>%
filter(format(test_data,'%H') %in% school_hours)
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
schools3<-td%>%
filter(format(test_data,'%Y')==2017)
schools4<-td%>%
filter(format(test_data,'%Y') %in% school_years)
schools5<-td%>%
filter(format(test_data,'%Y') %in% school_years2)
Here's my question: In the code above, when I try to filter td (which contains posixct data) using school_hours or school_hours2 I get zero data returned. Why?
What I'd like to be able to do is instead of writing
school_hours<-c('07', '08', '09', '10',
'11', '12', '13', '14',
'15', '16', '17', '18'
)
I'd write
school_hours2<-c(07:18)
Just like I have for school_years and the filtering would work. This doesn't work
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
This does work
schools5<-td%>%
filter(format(test_data,'%Y') %in% school_years2)
WHY?
I ask because: I've used something similar to filter my real data, which I can't share, and I get a discrepancy.
When I use school_hours (which is a character) I generate 993 records and the first time is 07:00. When I use school_hours2 (which is an integer) I generate 895 records and the first time is 10:00. I know - "without the data we can't make any evaluation" but what I can't work out is why the two different vector filters work differently. Is it because school_hours contains characters and school_hours2 integers?
EDIT: I changed the test_data line to:
#some date time data - filtering.
test_data<-as.POSIXct(sample(seq(1127056501, 1127056501), 1000),origin = "1899-12-31",tz="UTC")
it's still problematic:
schools<-td%>%
filter(format(test_data,'%H') %in% school_hours)
generates 510 rows
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
generates 379 rows
All of the data I'm really interested looks like this 1899-12-31 23:59:00
(where the last 6 digits represent a 24 hr clock time)
All I'm really trying to do is convert the time from this 1899-12-31 07:59:00 to the hour (7)
and then
use
school_hours2<-c(07:18)
as a filter. But will the hour generated by the conversion of 1899-12-31 07:59:00
be 07 or 7
Because if it's 07, then school_hours2<-c(07:18) generates 7 and school_hours2<-as.character(c(07:18)) generates '7'
How do I get around this?
EDIT: LIKE THIS: R: how to filter a timestamp by hour and minute?
td1<-td%>%mutate(timestamp_utc = ymd_hms(test_data,tz="UTC"))%>%
mutate(hour = hour(timestamp_utc))%>%
filter(hour(timestamp_utc) %in% school_hours)
td2<-td%>%mutate(timestamp_utc = ymd_hms(test_data,tz="UTC"))%>%
mutate(hour = hour(timestamp_utc))%>%
filter(hour(timestamp_utc) %in% school_hours2)
td3<-td%>%
mutate(hour = hour(test_data))%>%
filter(hour(test_data) %in% school_hours2)