I am using the flight dataset that is freely available in R.
library(readr)  # provides read_csv()
flights <- read_csv("http://ucl.ac.uk/~uctqiax/data/flights.csv")
Now, let's say I want to find all planes that have been flying for at least three consecutive years, i.e. planes for which the date column contains dates in three consecutive years. Basically, I am only interested in the year part of the date.
I was thinking of the following approach: create a unique list of all plane names, then for each plane get all of its dates and check whether they cover three consecutive years.
I started as follows:
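NOyears <- 3
planes <- unique(flights$plane)
# at least 3 consecutive years
for (plane in planes) {
  plane <- "N576AA"  # hard-coded to test a single plane; overrides the loop variable
  allyears <- which(flights$plane == plane)  # row indices for this plane, not yet years
}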
But I am stuck here. This whole approach is starting to look too complicated to me. Is there an easier/faster way, considering that I am working on a very large dataset?
Note: I want to be able to specify the number of years later on, which is why I included NOyears = 3 in the first place.
EDIT:
I have just noticed this question on SO. Very interesting use of diff and cumsum, which are both new to me. Maybe a similar approach is possible here using data.table?
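To make the diff and cumsum idea concrete, here is a minimal sketch in base R (not data.table). It assumes the date column can be converted to a year. The helper name has_consecutive_years and the toy data frame are hypothetical and only there for illustration. The idea: sort the distinct years of a plane, start a new run wherever the gap to the previous year is not exactly 1, and check whether the longest run has at least NOyears entries.

```r
# Hypothetical helper: TRUE if `years` contains a run of at least `n`
# consecutive calendar years.
has_consecutive_years <- function(years, n = 3) {
  yrs <- sort(unique(years))
  if (length(yrs) < n) return(FALSE)
  # diff() gives the gap between neighbouring years; cumsum() turns every
  # gap that is not exactly 1 into the start of a new run id.
  run_id <- cumsum(c(TRUE, diff(yrs) != 1))
  max(table(run_id)) >= n
}

# Toy data standing in for the real table (values are made up).
toy <- data.frame(
  plane = c("A", "A", "A", "A", "B", "B", "B", "C"),
  date  = as.Date(c("2009-03-01", "2010-06-15", "2010-07-01", "2011-01-20",
                    "2009-05-05", "2011-05-05", "2012-05-05", "2010-02-02"))
)

NOyears <- 3
toy$year <- as.integer(format(toy$date, "%Y"))

# One logical value per plane: does it have NOyears consecutive years?
ok <- tapply(toy$year, toy$plane, has_consecutive_years, n = NOyears)
planes_ok <- names(ok)[as.logical(ok)]
print(planes_ok)
```

Because the years are de-duplicated per plane before the run check, the per-plane work stays small even when a plane has many flights. On a very large table, the same helper could presumably be applied per group with data.table instead of tapply.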