The iris dataset contains 150 observations of three plant species (setosa, versicolor and virginica), with 50 observations of each species. I would like to create a new data frame, called "a", containing only the observations of two of these species (setosa and versicolor). I have been trying to do this with the code below, but it apparently does some sort of cycling: instead of returning observations 1 to 100 (which is what I'd like), it returns only observations 1, 3, 5, 7, ..., 100.
data(iris)
a <- subset(iris, Species == c("setosa", "versicolor"))
# or
a <- iris[iris$Species == c("setosa","versicolor"),]
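To show what I mean by "cycling", here is a small check of the logical vector that the comparison produces. The names `keep`, `kept_rows` and `recycled` are ones I made up for this check, and the last two lines only test my guess (an assumption, not something I have confirmed) that the two species names are repeated down the column before the comparison is made.

```r
data(iris)

# Logical vector produced by the comparison used in both attempts above
keep <- iris$Species == c("setosa", "versicolor")

length(keep)  # one TRUE/FALSE value per row of iris
sum(keep)     # number of rows that survive the filter

# Row numbers that survive the filter
kept_rows <- which(keep)
head(kept_rows)
tail(kept_rows)

# Assumption: the two names are repeated along the column before comparing
recycled <- rep(c("setosa", "versicolor"), length.out = nrow(iris))
identical(keep, as.character(iris$Species) == recycled)
```

If the last line returns TRUE, then each row is being compared against only one of the two names rather than against both, which would explain why half of the rows I want are missing.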
I would be grateful if anyone could help me figure out what I am doing wrong. I am aware that there are much simpler ways to get the data frame I want (e.g., the code listed below), but I would really like to understand why the code above does not work. I want to apply this to more complex datasets where I have to extract many species, and I would like to extract them by name.
a <- iris[1:100,] # this returns the dataframe I want
# or
a <- subset(iris, Species != "virginica") # this returns the dataframe I want as well