
I have a data frame called Paper that holds the citations each paper received in every year, together with its publication year and some metadata (journal, authors). It looks like this:

```r
Paper = read.table(textConnection("Meta Publication.Year X1999 X2000 X2001 X2002 X2003
A 1999 0 1 1 1 2
B 2000 0 0 3 1 0
C 2000 0 0 1 0 1
C 2001 0 0 0 1 5
D 1999 0 1 0 2 2"), header = TRUE)
```

I want to calculate, for each paper, the sum of citations in the two years after publication and append it to Paper as a new column. However, I am not interested in every publication year, only those listed in a vector Years. My steps (code below) were: order Paper by Publication.Year; for each year, select Publication.Year and the X columns of the two following years (i.e. X2000 and X2001 for 1999); calculate the row sums; concatenate the sums; and cbind them to Paper.

Is there a more elegant way to do this?

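```r
Years <- c(1999, 2000)  # must be in ascending order to match the sorted rows
# Keep only the publication years of interest; otherwise Two.Year ends up
# shorter than Paper and cbind() fails (the sample data has a 2001 row)
Paper <- Paper[Paper$Publication.Year %in% Years, ]
Paper <- Paper[with(Paper, order(Publication.Year)), ]
Two.Year <- numeric()
for (i in Years) {
  # Publication year plus the citation columns of the two following years
  Mat <- subset(Paper, Publication.Year == i,
                select = c("Publication.Year", paste0("X", i + 1), paste0("X", i + 2)))
  Two.Year <- c(Two.Year, rowSums(Mat[, -1]))
}
Paper <- cbind(Paper, Two.Year)
rm(Two.Year, Years)
Paper <- subset(Paper, select = c("Meta", "Publication.Year", "Two.Year")) # Because in the end I only need the citation number
```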
MERose

1 Answer


Because your years of interest change for each row, you're going to have to create new variables to indicate those years. Then you can use mapply to sum the correct numbers.

```r
Paper$pubYear1 <- paste0("X", as.character(Paper$Publication.Year + 1))
Paper$pubYear2 <- paste0("X", as.character(Paper$Publication.Year + 2))
Paper$pubCount <- mapply(function(r, y1, y2) Paper[r, y1] + Paper[r, y2], 
  row.names(Paper), Paper$pubYear1, Paper$pubYear2)
```

Here's the resulting data frame:

```
> Paper
  Meta Publication.Year X1999 X2000 X2001 X2002 X2003 pubYear1 pubYear2 pubCount
1    A             1999     0     1     1     1     2    X2000    X2001        2
2    B             2000     0     0     3     1     0    X2001    X2002        4
3    C             2000     0     0     1     0     1    X2001    X2002        1
4    C             2001     0     0     0     1     5    X2002    X2003        6
5    D             1999     0     1     0     2     2    X2000    X2001        1
```
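If you also want the question's `Years` restriction and its final three-column result, the same column-name idea can be combined with a filter. This is a minimal sketch under that assumption; the name `Paper_sub` is hypothetical, and the `stopifnot()` guard against a missing year column (for example `X2004` for a paper published in 2002) is an addition that is not part of the answer above.

```r
# Same sample data as in the question
Paper <- read.table(textConnection("Meta Publication.Year X1999 X2000 X2001 X2002 X2003
A 1999 0 1 1 1 2
B 2000 0 0 3 1 0
C 2000 0 0 1 0 1
C 2001 0 0 0 1 5
D 1999 0 1 0 2 2"), header = TRUE)

Years <- c(1999, 2000)

# Keep only the publication years of interest (hypothetical name Paper_sub)
Paper_sub <- Paper[Paper$Publication.Year %in% Years, ]

# Names of the citation columns for the two years after publication
y1 <- paste0("X", Paper_sub$Publication.Year + 1)
y2 <- paste0("X", Paper_sub$Publication.Year + 2)

# Stop early if a required year column does not exist in the data
stopifnot(all(c(y1, y2) %in% names(Paper_sub)))

# Sum the two cells for each row, as in the mapply answer
Paper_sub$Two.Year <- mapply(function(r, a, b) Paper_sub[r, a] + Paper_sub[r, b],
                             seq_len(nrow(Paper_sub)), y1, y2)

result <- Paper_sub[, c("Meta", "Publication.Year", "Two.Year")]
result
```

Unlike the loop in the question, this keeps the original row order, so `Years` does not need to be sorted.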
rsoren
  • Storing the names of the columns of interest is clever and way shorter than mine. But I wonder why my approach seems to be much faster: Yours took me 0.9238267 secs and mine only 0.175153 secs. – MERose Aug 03 '14 at 19:40
  • It's probably because the function passed to `mapply` indexes the data frame (`Paper[r, y1]`, `Paper[r, y2]`) twice for every row, which is a lot of indexing work. Elegance ≠ speed, at least not in this case. – rsoren Aug 04 '14 at 03:50
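Following up on the timing comments, a vectorized variant avoids per-row data frame indexing altogether: convert the year columns to a matrix and pull one cell per row with R's matrix indexing, i.e. a two-column `cbind(row, column)` index. This is a sketch that has not been benchmarked here; it assumes the year columns are exactly those named `X` followed by four digits, and the names `cites`, `col1` and `col2` are hypothetical.

```r
# Same sample data as in the question
Paper <- read.table(textConnection("Meta Publication.Year X1999 X2000 X2001 X2002 X2003
A 1999 0 1 1 1 2
B 2000 0 0 3 1 0
C 2000 0 0 1 0 1
C 2001 0 0 0 1 5
D 1999 0 1 0 2 2"), header = TRUE)

# Citation counts as a plain matrix, one column per year (assumes X#### names)
cites <- as.matrix(Paper[, grep("^X[0-9]{4}$", names(Paper))])

# Column positions of the two years after publication, one per row;
# match() returns NA if a year column does not exist
col1 <- match(paste0("X", Paper$Publication.Year + 1), colnames(cites))
col2 <- match(paste0("X", Paper$Publication.Year + 2), colnames(cites))

# Indexing with a (row, column) matrix extracts one cell per row in one call
rows <- seq_len(nrow(cites))
Paper$pubCount <- cites[cbind(rows, col1)] + cites[cbind(rows, col2)]
Paper
```

Because an `NA` in an index matrix produces `NA` in the result, a paper whose follow-up years are not in the data gets `pubCount = NA` instead of an error.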