
I have a problem subsetting a dataset. I read the dplyr vignette and would like to reproduce my problem using the flights dataset.

Now let's say I want to subset this dataset under the following condition:

There are 510 unique air_time values in this dataset. Let's say I have a set of 200 different air_time values, and I want to subset the dataset so that I get only the rows whose air_time matches one of those 200 values.

I tried this:

my200airtime <- head(flights$air_time,200)
filter(flights,air_time==my200airtime)
# This gave me a data frame with 1458 rows, but that is too low and is surely not correct because of some mistake I am making.
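A small sketch of why `==` seems to undercount here, using made-up vectors (`x` and `wanted` are hypothetical names, not part of the flights data). When the two sides have different lengths, `==` compares position by position and recycles the shorter vector, whereas `%in%` tests whether each element appears anywhere in the other vector:

```r
# Hypothetical toy data standing in for flights$air_time and the lookup values
x <- c(10, 20, 30, 40, 50, 60)
wanted <- c(20, 10)

# Element-wise comparison: `wanted` is recycled as 20, 10, 20, 10, 20, 10,
# so x[1] is compared with 20, x[2] with 10, and so on
elementwise <- x == wanted

# Membership test: is each element of x present anywhere in `wanted`?
membership <- x %in% wanted

print(elementwise)
print(membership)
```

With these toy values the element-wise comparison finds no matches at all, even though two elements of `x` do occur in `wanted`, which is the same kind of undercount as the 1458 rows above.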

In plain English, the task is to subset the data so that it contains all observations where the value in the air_time column matches one of the 200 air_time values.

If I had to match just 5 different values, I would have used code like this:

filter(flights, air_time == 1 | air_time == 2 | air_time == 3 | air_time == 4 | air_time == 5)

But here I have 200 different air times and hence need a different approach. Please suggest one.

Answer: based on the two suggestions below, the solution would be this:

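# Using the filter function in dplyr
library(dplyr)
library(nycflights13)
my200airtime <- data.frame(head(flights$air_time, 200))
colnames(my200airtime) <- "air_time"
# Keep rows whose air_time appears anywhere in the lookup column
parsed_data1 <- filter(flights, air_time %in% my200airtime$air_time)
dim(parsed_data1)
#[1] 140827     16

# Using the semi_join function in dplyr
# Keep rows of flights that have a match in my200airtime, joining on air_time
parsed_data2 <- semi_join(flights, my200airtime, by = "air_time")
dim(parsed_data2)
#[1] 140827     16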
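
As a sanity check that the two approaches select the same rows, here is a minimal sketch on a made-up data frame (`toy` and `wanted` are hypothetical, chosen only for illustration). It assumes dplyr is installed; the lookup table deliberately contains a duplicate value to show that neither approach duplicates rows:

```r
library(dplyr)

# Hypothetical toy data: six rows, one of them with a missing air_time
toy <- data.frame(id = 1:6, air_time = c(10, 20, 30, 20, NA, 10))

# Hypothetical lookup table, with a repeated value on purpose
wanted <- data.frame(air_time = c(20, 10, 20))

# Approach 1: membership test inside filter()
by_filter <- filter(toy, air_time %in% wanted$air_time)

# Approach 2: filtering join on the shared column
by_join <- semi_join(toy, wanted, by = "air_time")

# Both should keep the same rows of `toy`, in the same order
stopifnot(identical(by_filter$id, by_join$id))
print(by_filter)
```

Writing `by = "air_time"` explicitly avoids relying on dplyr guessing the join column from the shared column names.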
vinay