1

I would like to do calculations across columns in my data, by row. The calculations are "moving" in that I would like to know the difference between two numbers in column 1 and 2, then columns 3 and 4, and so on. I have looked at "loops" and "rollapply" functions, but could not figure this out. Below are three options of what was attempted. Only the third option gives me the result I am after, but it is very lengthy code and also does not allow for automation (the input data will be a much larger matrix, so typing out the calculation for each row won't work). Please advice how to make this code shorter and/or any other packages/functions to check out which will do the job. THANK YOU!

MY TEST SCRIPT IN R + errors/results

Sample data set

 a<- c(1,2,3, 4, 5)
 b<- c(1,2,3, 4, 5)
 c<- c(1,2,3, 4, 5)
 test.data <- data.frame(cbind(a,b*2,c*10))
 names(test.data) <- c("a", "b", "c")

Sample of calculations attempted:

OPTION 1

require(zoo)
rollapply(test.data, 2, diff, fill = NA, align = "right", by.column=FALSE)

RESULT 1 (not what we're after. What we need is at the bottom of Option 3)

#     a  b  c
#[1,] NA NA NA
#[2,]  1  2 10
#[3,]  1  2 10
#[4,]  1  2 10
#[5,]  1  2 10

OPTION 2:

results <- for (i in 1:length(nrow(test.data))) {
diff(as.numeric(test.data[i,]), lag=1)
print(results)}

RESULT 2: (again not what we're after)

# NULL

OPTION 3: works, but long way, so would like to simplify code and make generic for any length of observations in my dataframe and any number of columns (i.e. more than 3). I would like to "automate" the steps below, if know number of observations (i.e. rows).

row1=diff(as.numeric(test[1,], lag=1))
row2=diff(as.numeric(test[2,], lag=1))
row3=diff(as.numeric(test[3,], lag=1))
row4=diff(as.numeric(test[4,], lag=1))
row5=diff(as.numeric(test[5,], lag=1))

results.OK=cbind.data.frame(row1, row2, row3, row4, row5)
transpose.results.OK=data.frame(t(as.matrix(results.OK)))
names(transpose.results.OK)=c("diff.ab", "diff.bc")
Final.data = transpose.results.OK
print(Final.data)

RESULT 3: (THIS IS WHAT I WOULD LIKE TO GET, "row1" can be "obs1" etc)

#     diff.ab diff.bc
#row1       1       8
#row2       2      16
#row3       3      24
#row4       4      32
#row5       5      40

THE END

Community
  • 1
  • 1
Mary
  • 41
  • 5

2 Answers2

3

Here are the 3 options redone plus a 4th option:

# 1
library(zoo)
d <- t(rollapplyr(t(test.data), 2, diff, by.column = FALSE))

# 2
d <- test.data[-1]
for (i in 1:nrow(test.data)) d[i, ] <- diff(unlist(test.data[i, ]))

# 3
d <- t(diff(t(test.data)))

# 4 - also this works
nc <- ncol(test.data)
d <- test.data[-1] - test.data[-nc]

For any of them to set the names:

colnames(d) <- paste0("diff.", head(names(test.data), -1), colnames(d))

(2) and (4) give this data.frame and (1) and (3) give the corresponding matrix:

> d
  diff.ab diff.bc
1       1       8
2       2      16
3       3      24
4       4      32
5       5      40

Use as.matrix or as.data.frame if you want the other.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

An apply based solution using diff on row-wise can be achieved as:

# Result
res <- t(apply(test.data, 1, diff)) #One can change it to data.frame

# Name of the columns
colnames(res) <-  paste0("diff.", head(names(test.data), -1),
                               tail(names(test.data), -1))

res
#      diff.ab diff.bc
# [1,]       1       8
# [2,]       2      16
# [3,]       3      24
# [4,]       4      32
# [5,]       5      40
MKR
  • 19,739
  • 4
  • 23
  • 33