I have an issue filtering a dataset. I read the dplyr vignette and would like to reproduce my issue using the flights dataset.
Say I want to filter this dataset under the following conditions.
There are 510 unique air_time values in this dataset. Say I have a set of 200 different air_time values and would like to filter the dataset so that I get only those rows where the air_time value matches one of my 200 values.
I tried this:
my200airtime <- head(flights$air_time,200)
filter(flights,air_time==my200airtime)
# This gave me a data frame with 1458 rows, but that is too low and is surely wrong because of a mistake I am making.
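A likely reason for the low row count, sketched on toy data rather than the real flights table: `==` compares two vectors position by position and recycles the shorter one, so each row is checked against only one of the wanted values. The names `toy` and `wanted` below are made up for illustration.

```r
# Toy data (hypothetical, not the flights table): six rows, three wanted values.
toy <- data.frame(air_time = c(10, 20, 30, 20, 10, 40))
wanted <- c(10, 20, 30)

# `==` recycles `wanted` to c(10, 20, 30, 10, 20, 30) and compares by position,
# so rows 4 and 5 are compared with the wrong value.
by_position <- toy$air_time == wanted

# `%in%` asks, for each row, whether its value appears anywhere in `wanted`.
by_membership <- toy$air_time %in% wanted

sum(by_position)    # 3 rows would be kept
sum(by_membership)  # 5 rows would be kept
```

The same effect on the full table would explain why only a small fraction of the matching rows came back.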
In plain English, the task is to filter the data so that it contains all observations where the value in the air_time column matches one of the 200 air_time values.
If I had to match just 5 different values, I would have used code like this:
filter(flights, air_time==1 | air_time==2 | air_time==3 | air_time==4 | air_time==5)
But here I have 200 different air times and hence need a different approach. Please suggest one.
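A side note on the five-value case, in case it helps: a shorthand such as `x == 1|2|3|4|5` does not test membership in R. Because `==` binds tighter than `|`, it is read as `(x == 1) | 2 | 3 | 4 | 5`, and non-zero numbers count as TRUE. A minimal sketch with a made-up vector `x`:

```r
# Hypothetical values, chosen so that two of them fall outside 1..5.
x <- c(1, 3, 7, 9)

# Read as (x == 1) | 2 | 3 | 4 | 5, so every element comes out TRUE.
shorthand <- x == 1 | 2 | 3 | 4 | 5

# Each comparison spelled out, and the equivalent membership test.
spelled_out <- x == 1 | x == 2 | x == 3 | x == 4 | x == 5
membership <- x %in% 1:5

shorthand    # TRUE TRUE TRUE TRUE
spelled_out  # TRUE TRUE FALSE FALSE
membership   # TRUE TRUE FALSE FALSE
```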
Answer: based on the two suggestions below, the answer would be this:
# Using the filter function in dplyr
library(dplyr)
library(nycflights13)
my200airtime <- data.frame(head(flights$air_time,200))
colnames(my200airtime) <- "air_time"
parsed_data1 <- filter(flights,air_time %in% my200airtime$air_time)
dim(parsed_data1)
#[1] 140827 16
# Using the semi_join function in dplyr (joining explicitly on air_time)
parsed_data2 <- semi_join(flights, my200airtime, by = "air_time")
dim(parsed_data2)
#[1] 140827 16
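To check that the two approaches agree without loading the full dataset, here is a small sketch on toy data. The tables `toy` and `wanted` and the `id` column are hypothetical stand-ins, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-in for flights; `id` only exists to identify the rows.
toy <- tibble(id = 1:6, air_time = c(10, 20, 30, 20, 10, 40))
# Lookup table with a repeated value, to show that duplicates are harmless.
wanted <- tibble(air_time = c(10, 20, 20))

via_filter <- filter(toy, air_time %in% wanted$air_time)
via_semi_join <- semi_join(toy, wanted, by = "air_time")

# Both should keep rows 1, 2, 4 and 5 of `toy`, each exactly once.
via_filter$id
via_semi_join$id
```

A practical difference, as far as I can tell: `%in%` is convenient when the wanted values are a plain vector, while `semi_join` fits better when they already live in a data frame or when matching on several columns.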