My data frame has 10 columns and 100,000 rows. Each row is an observation, and the columns hold data pertaining to that observation. One of the columns gives the date of the observation as a Julian day, i.e. the day of the year (e.g. Feb 4 = day 35). I want to reduce my data set so that I keep only the first 10% of observations per year per species. For example, for species 1 in the year 1901, I want the average day of appearance based on the first 10% of observations.
Example of what I have (note: id is the species encoded as a number, e.g. blue = 1):
date = c(3, 84, 98, 100, 34, 76, 86, ...)
species = c("blue", "purple", "grey", "purple", "green", "pink", "pink", "white", ...)
id = c(1, 2, 3, 2, 4, 5, 5, 6, ...)
year = c(1901, 2000, 1901, 1996, 1901, 2000, 1986, ...)
habitat = c("forest", "plain", "mountain", ...)
etc.
What I want:
date = c(3, 84, 76, 86, ...)
species = c("purple", "pink", "pink", "white", ...)
id = c(2, 5, 5, 6, ...)
year = c(1901, 2000, 2000, 1986, ...)
habitat = c("forest", "plain", "mountain", ...)
new = c(3, 84, 79, 86, ...)
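
To make the calculation concrete, here is a rough base R sketch of what is being asked for. It rests on several assumptions that are not stated above: the data frame is called df, "first 10%" means the earliest observations by date within each id/year group, the count is rounded up with ceiling() so every group keeps at least one row, and the new column holds the mean date of the kept rows. The function name first_fraction_mean and the toy data are made up for illustration only.

```r
# Hypothetical helper: keep the earliest `frac` share of observations in each
# species/year group and attach the mean date of those kept rows as `new`.
first_fraction_mean <- function(df, frac = 0.1) {
  # Row indices for every id/year combination that actually occurs.
  groups <- split(seq_len(nrow(df)), list(df$id, df$year), drop = TRUE)

  kept <- lapply(groups, function(idx) {
    g <- df[idx, , drop = FALSE]
    g <- g[order(g$date), , drop = FALSE]   # earliest observations first
    n_keep <- ceiling(frac * nrow(g))       # assumption: round up
    g <- g[seq_len(n_keep), , drop = FALSE]
    g$new <- mean(g$date)                   # average day of appearance
    g
  })

  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

# Toy data (invented): 20 observations of species 1 in 1901,
# 3 observations of species 2 in 2000.
df <- data.frame(
  date    = c(20:1, 84, 90, 100),
  species = c(rep("blue", 20), rep("purple", 3)),
  id      = c(rep(1, 20), rep(2, 3)),
  year    = c(rep(1901, 20), rep(2000, 3)),
  stringsAsFactors = FALSE
)

res <- first_fraction_mean(df)
print(res)
```

With this toy data, species 1 in 1901 keeps its two earliest observations and species 2 in 2000 keeps one. If one summary row per species and year is wanted instead of the kept observations, the same grouping could return only id, year and the mean.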