I have a large data frame, My_Data, which contains a few thousand names. I am trying to subset the data frame using a vector of names, Names.rm, but I keep getting a data frame returned with 0 rows (despite the names being present in My_Data).
This is what I have tried:
My_Data[My_Data$Author_name %in% Names.rm, ]
subset(My_Data, Author_name %in% Names.rm)
EDIT:
Sorry, I'm not sure of the proper way to format data, but I'll try to give a sample:
My_Data:
   Author                         Time.period Gender
8  AERTS R Rien ECOLOGY           2001-2005   M
10 AGRAWAL AA Anurag ECOLOGY      2001-2005   M
12 AINSLIE G George NEUROSCIENCES 2001-2005   M
73 BLOB RW Richard ZOOLOGY        2001-2005   M
Names.rm:
1 AERTS R Rien ECOLOGY
2 BLOB RW Richard ZOOLOGY
Code used: My_Data[My_Data$Author %in% Names.rm, ]
Expected output:
   Author                  Time.period Gender
8  AERTS R Rien ECOLOGY    2001-2005   M
73 BLOB RW Richard ZOOLOGY 2001-2005   M
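For reference, the sample can be rebuilt as a small reproducible example. This is only a sketch: the values are copied from the sample, but storing the columns as character (`stringsAsFactors = FALSE`) and holding `Names.rm` as a plain character vector are assumptions, since the real objects may be stored differently.

```r
# Sample rows copied from the question.
# Assumption: columns are character, not factor.
My_Data <- data.frame(
  Author = c("AERTS R Rien ECOLOGY",
             "AGRAWAL AA Anurag ECOLOGY",
             "AINSLIE G George NEUROSCIENCES",
             "BLOB RW Richard ZOOLOGY"),
  Time.period = "2001-2005",   # recycled to every row
  Gender = "M",                # recycled to every row
  row.names = c("8", "10", "12", "73"),
  stringsAsFactors = FALSE
)

# Assumption: Names.rm is a plain character vector.
Names.rm <- c("AERTS R Rien ECOLOGY", "BLOB RW Richard ZOOLOGY")

# Keep only the rows whose Author appears in Names.rm.
result <- My_Data[My_Data$Author %in% Names.rm, ]
print(result)
```

With these assumptions the subset keeps rows 8 and 73, which matches the expected output.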
Actual output (when tried with whole dataframe):
[1] Author Time.period Gender
<0 rows> (or 0-length row.names)
EDIT 2: OK, so it worked with that subset of the data, but it isn't working when I try to do it on my whole dataset. Is there a limit on the size of the dataset this can be done for?
I have read: "Selecting columns in R data frame based on those *not* in a vector" and "Select rows from a data frame based on values in a vector".
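A few thousand rows is well within what `%in%` handles, so a size limit seems unlikely to be the cause. A minimal sketch of some checks that might narrow it down is below. The data here is hypothetical and built to mimic two possible causes that have not been confirmed for the real dataset: `Names.rm` being a one-column data frame rather than a vector, and stray whitespace in the names. The helper names `names_vec`, `authors_clean`, `names_clean` and `matched` are introduced only for illustration.

```r
# Hypothetical data mimicking two possible causes (not confirmed for the real data).
My_Data <- data.frame(
  Author = c("AERTS R Rien ECOLOGY ",      # note the trailing space
             "BLOB RW Richard ZOOLOGY"),
  stringsAsFactors = FALSE
)
Names.rm <- data.frame(
  V1 = c("AERTS R Rien ECOLOGY", "BLOB RW Richard ZOOLOGY"),
  stringsAsFactors = FALSE
)

# 1. Check what Names.rm really is: a vector or a data frame.
str(Names.rm)

# If it is a data frame, pull out its first column as a character vector.
names_vec <- if (is.data.frame(Names.rm)) {
  as.character(Names.rm[[1]])
} else {
  as.character(Names.rm)
}

# 2. Convert to character (in case of factors) and strip surrounding whitespace.
authors_clean <- trimws(as.character(My_Data$Author))
names_clean   <- trimws(names_vec)

# 3. Count how many authors match before subsetting.
sum(authors_clean %in% names_clean)

# Subset using the cleaned values; drop = FALSE keeps a data frame
# even though this toy example has a single column.
matched <- My_Data[authors_clean %in% names_clean, , drop = FALSE]
print(matched)
```

If the count in step 3 is still 0 on the full dataset, comparing a single name from each object character by character (for example with `nchar()`) may show where they differ.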