R Sorting Data Frame by Date

Question

I'm working on a R data.frame which is made of stocks'dividends per year (I've got 60 stocks in columns and the usual calendar in rows). When a dividend is paid, I've got the figure and otherwise there is a NA.

Basically , here is how my Data.frame looks like

           BARC LN      BARN SE  BAS GY  BATS LN 
1999-01-01      0.26       NA      NA
1999-01-02       NA       0.56     0.35     NA
1999-01-03       NA        NA      NA       NA
2000-01-04       NA        NA      0.40     NA
1999-01-05      0.23      0.28     NA       NA
2001-01-06       NA        NA      NA       NA
2001-01-07      0.85       NA     0.15      NA

I would like to get the amount of dividend paid per year for each stock in order to compute the dividend yield ratio and finally get a Data;frame like the one below :

           BARC LN   BARN SE  BAS GY  BATS LN 
   1999       NA        NA      NA       NA
   2000       NA        NA      NA       NA
   2001       NA        NA      NA       NA

How can i do that?

What have you tried? It looks like your dates are `row.names`? can you use `dput` to post a subset of your actual data. It will make it much easier to give you a correct answer. — Justin, Jun 25 '12 at 14:00
here is what I've tried but it does not provide me with the stocks dividends as I 've only got the dates: years<-as.POSIXlt(Dates)$year+1900 Div[,1][which(years==2000)] — marino89, Jun 25 '12 at 14:16

score 2 · Answer 1 · answered Jun 25 '12 at 14:35

So, assuming your data is in a data.frame like the one you've posted above called div:

div <- structure(list(barc.ln = c(0.26, NA, NA, NA, 0.23, NA, 0.85), 
    barn.se = c(NA, 0.56, NA, NA, 0.28, NA, NA), bas.gy = c(NA, 
    0.35, NA, 0.4, NA, NA, 0.15), bats.ln = c(NA, NA, NA, NA, 
    NA, NA, NA)), .Names = c("barc.ln", "barn.se", "bas.gy", 
"bats.ln"), row.names = c("1999-01-01", "1999-01-02", "1999-01-03", 
"2000-01-04", "1999-01-05", "2001-01-06", "2001-01-07"), class = "data.frame")

just as you've done you can extract the years from your row.names:

div$years <- as.POSIXlt(row.names(div))$year + 1900

The plyr and reshape2 packages work well here and I think make the code particularly clear. Specifically, I'll use melt to make the data long and then ddply to split into groups and sum the dividends:

library(plyr)
library(reshape2)
div.melt <- melt(div, id.vars='years')
div.sum <- ddply(div.melt, 
                 .(years, variable), 
                 summarise, 
                 dividend = sum(value, na.rm=TRUE))

> div.sum
 years variable dividend
1   1999  barc.ln     0.49
2   1999  barn.se     0.84
3   1999   bas.gy     0.35
4   1999  bats.ln     0.00
5   2000  barc.ln     0.00
6   2000  barn.se     0.00
7   2000   bas.gy     0.40
8   2000  bats.ln     0.00
9   2001  barc.ln     0.85
10  2001  barn.se     0.00
11  2001   bas.gy     0.15
12  2001  bats.ln     0.00
>

you can then use another function from reshape2 called cast to format your data "wide":

> dcast(div.sum, years ~ variable, value.var='dividend')
  years barc.ln barn.se bas.gy bats.ln
1  1999    0.49    0.84   0.35       0
2  2000    0.00    0.00   0.40       0
3  2001    0.85    0.00   0.15       0
>

This function does not work : dividends.sum <-ddply(dividends.melt,.(years,variable),summarise,dividend = sum(value,na.rm=TRUE)) and the erro message is : eero in sum(value, na.rm=TRUE) : 'type' (character) incorrect..... — marino89, Jun 25 '12 at 15:12
@user1474263 it works if you use the data I provided. If you use the function `dput` to give us a reproducible copy of your code, then I can help further. The error is suggesting that you are trying to calculate the sum of a character vector, which is, of course, not possible. But there is no way for me to help debug without seeing some of your data in a usable format. — Justin, Jun 25 '12 at 15:17

score 1 · Answer 2 · answered Jun 25 '12 at 16:30

I think you can do this pretty easily using by(). Here's how I did it. I've put each block, along with an explanation below each block.

dividends <- data.frame(barc_ln=c(0.26,NA,NA,NA,0.23,NA,0.85),
                        barn_se=c(NA,0.56,NA,NA,0.28,NA,NA),
                        bas_gy=c(NA,0.35,NA,0.40,NA,NA,0.15),
                        bats_ln=c(NA,NA,NA,NA,NA,NA,NA),
                        row.names=c("1999-01-01","1999-01-02","1999-01-03","2000-01-04","1999-01-05","2001-01-06","2001-01-07"))

This just creates the original data frame you gave.

dividends[,"dates"] <- as.Date(row.names(dividends))
dividends <- dividends[order(dividends[,"dates"]),]
dividends[,"year"] <- format(dividends$dates,"%Y")

This takes the row name dates, and then turns them into a new column ("dates") in the data frame. Then, we order the data frame (not necessarily required, but I find it to be more intuitive) by date and extract the year (as a character, mind you) using format.

div_output <- data.frame(row.names=unique(dividends$year))

Next, I create the output data frame that will receive the data. I use the unique() function on the year variable to get the unique vector of years. They're already ordered (one advantage of ordering the data frame).

for(x in 1:4) {
    div_output[,x] <- by(dividends[,x],INDICES=dividends$year,FUN=sum,na.rm=TRUE)
}
names(div_output) <- names(dividends)[1:4]

Using a simple loop, we just go through each of the columns and apply the by() function. The variable is the column, the indices are the year, and we just use the sum function. I tag on na.rm=TRUE so that instead of NAs, you get actual data.

print(div_output)

     barc_ln barn_se bas_gy bats_ln
1999    0.49    0.84   0.35       0
2000    0.00    0.00   0.40       0
2001    0.85    0.00   0.15       0

And there's the output I get.

R Sorting Data Frame by Date

2 Answers2