I have a sample data frame, "data", as follows:
X Y Month Year income
2281205 228120 3 2011 1000
2281212 228121 9 2010 1100
2281213 228121 12 2010 900
2281214 228121 3 2011 9000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
2281282 228128 9 2010 7889
2281283 228128 12 2010 2900
2281284 228128 3 2011 3400
2281302 228130 9 2010 1200
2281303 228130 12 2010 2000
2281304 228130 3 2011 1900
2281352 228135 9 2010 2300
2281353 228135 12 2010 1333
2281354 228135 3 2011 2340
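To make the example reproducible, the table above can be loaded with base R's read.table(). This is just the sample data pasted into the `text` argument, with column names taken from the header row.

```r
# Read the sample table above into a data frame named `data`
data <- read.table(header = TRUE, text = "
X Y Month Year income
2281205 228120 3 2011 1000
2281212 228121 9 2010 1100
2281213 228121 12 2010 900
2281214 228121 3 2011 9000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
2281282 228128 9 2010 7889
2281283 228128 12 2010 2900
2281284 228128 3 2011 3400
2281302 228130 9 2010 1200
2281303 228130 12 2010 2000
2281304 228130 3 2011 1900
2281352 228135 9 2010 2300
2281353 228135 12 2010 1333
2281354 228135 3 2011 2340
")

# Inspect column types (all columns are read as integers)
str(data)
```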
I use ddply (from the plyr package) to compute the number of observations and the total income for each Y:
library(plyr)
x <- ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income))
Now I also need to find the X for each Y, based on the following conditions:
a. If Y has observations for months 9 (2010), 12 (2010), and 3 (2011), then X is taken from the month 9 (2010) observation, e.g. for Y = 228121, X = 2281212.
b. If Y has observations for months 6 (2010), 9 (2010), 12 (2010), and 3 (2011), then X is taken from the month 6 (2010) observation, e.g. for Y = 228122, X = 2281222.
c. If Y has observations for months 12 (2010) and 3 (2011), then X is taken from the month 12 (2010) observation, e.g. for Y = 228124, X = 2281243.
d. If Y has only one observation, then X is taken from that observation, e.g. for Y = 228120, X = 2281205.
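The rules above can all be read as one rule: take the observation whose month is closest to June 2010. A minimal way to see this, assuming Month and Year are numeric, is to map each (Month, Year) pair to its distance in months from June 2010 using a hypothetical helper, month_offset(). In every rule, the month chosen is the one with the smallest offset among the months present.

```r
# Hypothetical helper (not part of plyr): months between (month, year)
# and June 2010; 0 for June 2010, negative for earlier months
month_offset <- function(month, year) {
  (year - 2010) * 12 + (month - 6)
}

# The months that appear in the rules
months <- data.frame(Month = c(6, 9, 12, 3), Year = c(2010, 2010, 2010, 2011))
months$offset <- month_offset(months$Month, months$Year)

# June 2010 -> 0, Sep 2010 -> 3, Dec 2010 -> 6, Mar 2011 -> 9
print(months)
```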
The point is this: when a Y has more than one observation, I choose the X for month 6 (2010) if it is available; if it is not, I choose the X for the month closest to 6 (2010) (e.g. 9 (2010)). If a Y has only one observation, I take the X of that observation.
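The selection rule can be sketched in base R as follows, assuming that "closest" means the smallest absolute number of months from June 2010. The helpers month_offset() and pick_x() are hypothetical names for illustration. Each Y group is handled separately, mirroring what ddply does; which.min() returns the first minimum, so if two months were equally close (e.g. 3 (2010) and 9 (2010)), the earlier row in the data would win.

```r
data <- read.table(header = TRUE, text = "
X Y Month Year income
2281205 228120 3 2011 1000
2281212 228121 9 2010 1100
2281213 228121 12 2010 900
2281214 228121 3 2011 9000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
2281282 228128 9 2010 7889
2281283 228128 12 2010 2900
2281284 228128 3 2011 3400
2281302 228130 9 2010 1200
2281303 228130 12 2010 2000
2281304 228130 3 2011 1900
2281352 228135 9 2010 2300
2281353 228135 12 2010 1333
2281354 228135 3 2011 2340
")

# Hypothetical helper: months between (month, year) and June 2010
month_offset <- function(month, year) (year - 2010) * 12 + (month - 6)

# Hypothetical helper: for one Y group, return the X of the observation
# closest to June 2010 (a single-row group simply returns its own X)
pick_x <- function(d) {
  d$X[which.min(abs(month_offset(d$Month, d$Year)))]
}

# One data frame per Y, then one summary row per group
groups <- split(data, data$Y)
result <- data.frame(
  Y    = as.numeric(names(groups)),
  freq = sapply(groups, nrow),
  tot  = sapply(groups, function(d) sum(d$income)),
  X    = sapply(groups, pick_x)
)
rownames(result) <- NULL
print(result)
```

With plyr, the same idea could be added to the original call as one more summarize argument, for example `ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income), X=X[which.min(abs((Year - 2010)*12 + Month - 6))])`. This relies on summarize evaluating each expression inside the per-Y subset, so X, Month, and Year refer to that group's columns.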
How can I incorporate these conditions into the ddply call?