0

I'm trying to find the mean forward return (column fwd_rtn) of each quartile for each column (ie for quartiles for PB, PE, PS) for each date group (1/1/2016...1/4/2016)

head(df)

Date Stock Price PB PE PS fwd_rtn
1 1/1/2016 A 11.90 0.4 0.10 0.57 -0.015
2 1/1/2016 B 3.56 0.8 0.09 0.26 -0.036
3 1/1/2016 C 1.29 1.2 0.18 1.60 0.10
......
4 1/4/2016 A 12.80 0.39 0.13 0.53 -0.01
5 1/4/2016 B 4.03 0.76 0.08 0.23 0.02
6 1/4/2016 C 1.83 0.87 0.14 1.16 0.03

So far i have been able to find the mean return for 1 column for 1 date using this code

df$qPB <- cut(df$PB, breaks = quantile(df$PB, c(0,.25,.5,.75,1)),include.lowest = TRUE)  

aggregate(df$fwd_rtn,list(qPB = df$qPB),FUN=mean)

which gave me the right answers. But I'm struggling to do it for the multiple columns. I think I'm supposed to use dplyr and the gather() function but i dont know how.

lmo
  • 37,904
  • 9
  • 56
  • 69
  • Can anyone please drop me some crumbs/point me to a general direction as to how I can solve this please? Thank you so much for the help! – richard_jokes Apr 14 '16 at 14:15
  • Can anyone please help me how to loop through using the apply function? Im very confused because of the finding quantile part..and then the aggregate. Its two parts so i should use the apply function on them both separately? How do i link the 2 parts? Any help is appreciated. Thanks everyone – richard_jokes Apr 15 '16 at 11:27

1 Answers1

0

To get quartiles of a single variable by date you can use the ave function:

df$qPB <- ave(df$PB, df$Date, FUN= function(i) cut(i, breaks = quantile(df$PB, 
                                          c(0,.25,.5,.75,1)),include.lowest = TRUE)
# a minor addition to aggregate
aggregate(df$fwd_rtn, list("qPB"=df$qPB, "date"=df$Date), FUN=mean)

You should take a look at using lapply or sapply to move through multiple columns.

lmo
  • 37,904
  • 9
  • 56
  • 69
  • Thanks lots lmo. It gave me an error at first but it was ok after i changed "date" = Date to "date"=df$Date. Next is how to use it for the multiple columns. Will attempt to use the lapply or sapply suggested. – richard_jokes Apr 15 '16 at 03:30
  • sorry lmo i still cant get how to do it using lapply. but thats ok because I have been doing it individually (with the goal of cbinding the results at the end) ie for PS i did: df$qPS <- cut(df$PS, breaks = quantile(df$PS, c(0,.25,.5,.75,1), na.rm=TRUE),include.lowest = TRUE). Note that ive added a na.rm=TRUE, because there are some NaNs in PS. Wont this pose a problem when i cbind the results after going through all the columns? – richard_jokes Apr 15 '16 at 06:50