
I have a data frame data in R with 120,000 rows and 5 columns.

Every 300 rows form one frame, and each frame was measured at a different time (i.e. 400 frames in total).

Action

I tried using array(data, c(300, 5, 400))

Expected

Turn this data frame into a 3D array by splitting it every 300 rows and stacking the resulting 400 matrices behind each other.

Actual

array() reads the values down the first column of data and puts them into the first part of the array, instead of taking the first 300 rows as the first frame.

JustinJDavies
T_stats_3

2 Answers


Here's an approach using dim<- and aperm:

Sample data:

set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)
mat
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]   27   69   27   80   74
#  [2,]   38   39   39   11   70
#  [3,]   58   77    2   73   48
#  [4,]   91   50   39   42   87
#  [5,]   21   72   87   83   44
#  [6,]   90  100   35   65   25
#  [7,]   95   39   49   79    8
#  [8,]   67   78   60   56   10
#  [9,]   63   94   50   53   32
# [10,]    7   22   19   79   52
# [11,]   21   66   83    3   67
# [12,]   18   13   67   48   41

Slicing and dicing:

Sliced <- aperm(`dim<-`(t(mat), list(5, 3, 4)), c(2, 1, 3))

Sliced
# , , 1
# 
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   27   69   27   80   74
# [2,]   38   39   39   11   70
# [3,]   58   77    2   73   48
# 
# , , 2
# 
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   91   50   39   42   87
# [2,]   21   72   87   83   44
# [3,]   90  100   35   65   25
# 
# , , 3
# 
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   95   39   49   79    8
# [2,]   67   78   60   56   10
# [3,]   63   94   50   53   32
# 
# , , 4
# 
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    7   22   19   79   52
# [2,]   21   66   83    3   67
# [3,]   18   13   67   48   41

Adjust the numbers to match your data.
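
For the data in the question, the adjustment might look like the sketch below. The data frame here is a made-up stand-in with the same shape as the real one, the names rows_per_frame and n_frames are only illustrative, and it assumes every column is numeric so that as.matrix() gives a numeric matrix.

```r
# Stand-in for the real data: 120000 rows x 5 columns (values are made up)
data <- as.data.frame(matrix(seq_len(300 * 400 * 5), ncol = 5))

rows_per_frame <- 300
n_frames <- nrow(data) / rows_per_frame   # 400 frames

# A data frame is a list of columns, so convert it to a plain matrix first
mat <- unname(as.matrix(data))

# Same transpose / reshape / permute idea as above, with the real sizes
arr <- aperm(`dim<-`(t(mat), c(ncol(mat), rows_per_frame, n_frames)),
             c(2, 1, 3))

dim(arr)
# Frame 2 should hold rows 301 to 600 of the original data
identical(arr[, , 2], mat[301:600, ])
```

Converting with as.matrix() first matters because the reshaping relies on the values being stored as one contiguous vector, which is true for a matrix but not for a data frame.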


Breaking things apart, we get:

  • t(mat): transposes your matrix (so we now have 5 x 12). Each column of the result is one row of the original.
  • dim<-(..., list(...)): converts this to an array, in this case, 5 (row) x 3 (col) x 4 (third dimension).
  • aperm: after the previous step, each slice is 5 x 3, which is the transpose of the frame we want, so we swap the first two dimensions. This is like t, but for arrays with more than two dimensions.
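
To see what each step does, the one-liner can be unrolled on the sample matrix. This is only an illustration of the steps listed above; the names step1, step2 and step3 are made up for it.

```r
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)  # 12 rows = 4 frames of 3 rows

step1 <- t(mat)                      # 5 x 12: each column is one original row
step2 <- `dim<-`(step1, c(5, 3, 4))  # 5 x 3 x 4: columns grouped in threes
step3 <- aperm(step2, c(2, 1, 3))    # 3 x 5 x 4: first two dimensions swapped

dim(step1)
dim(step2)
dim(step3)

# Each slice of step2 is the transpose of the frame we are after ...
identical(t(step2[, , 2]), mat[4:6, ])
# ... and aperm() fixes that for all slices at once
identical(step3[, , 2], mat[4:6, ])
```

The transpose is needed because R stores matrices column by column: after t(), the values of each original row sit next to each other in memory, so reshaping cuts the data at row boundaries.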

These are also very efficient operations. Here's a comparison of this approach with @akrun's:

m1 <- matrix(1:(300*400*5), nrow=300*400, ncol=5)

am <- function() {
  aperm(`dim<-`(t(m1), list(5, 300, 400)), c(2, 1, 3))
}

ak <- function() {
  lst <- lapply(split(seq_len(nrow(m1)),(seq_len(nrow(m1))-1) %/%300 +1),
                function(i) m1[i,])

  arr1 <- array(0, dim=c(300,5,400))
  for(i in 1:400){
    arr1[,,i] <- lst[[i]]
  }
  arr1
}

library(microbenchmark)
microbenchmark(am(), ak(), times = 20)
# Unit: milliseconds
#  expr       min        lq    median        uq      max neval
#  am()  19.09133  27.63269  31.18292  67.12434 146.2673    20
#  ak() 496.11494 518.71223 550.02215 591.27266 699.9834    20
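
The timings only mean something if both functions build the same array. A quick check, as a sketch that redefines m1, am() and ak() from above so it runs on its own:

```r
m1 <- matrix(1:(300 * 400 * 5), nrow = 300 * 400, ncol = 5)

# Transpose / reshape / permute approach
am <- function() {
  aperm(`dim<-`(t(m1), c(5, 300, 400)), c(2, 1, 3))
}

# Split-and-loop approach
ak <- function() {
  idx <- split(seq_len(nrow(m1)), (seq_len(nrow(m1)) - 1) %/% 300 + 1)
  lst <- lapply(idx, function(i) m1[i, ])
  arr1 <- array(0, dim = c(300, 5, 400))
  for (i in 1:400) {
    arr1[, , i] <- lst[[i]]
  }
  arr1
}

res_am <- am()
res_ak <- ak()

# Same dimensions and same values
all.equal(res_am, res_ak)
```

all.equal() is used rather than identical() because ak() fills an array created with array(0, ...), so its result is stored as double, while am() keeps the integer type of m1.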
A5C1D2H2I1M1N2O1R2T1

Another option would be:

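m1 <- matrix(1:(300*400*5), nrow=300*400, ncol=5)
# Split the row indices into 400 groups of 300 and extract each block of rows
lst <- lapply(split(seq_len(nrow(m1)), (seq_len(nrow(m1))-1) %/% 300 + 1),
              function(i) m1[i,])

# Fill a pre-allocated 300 x 5 x 400 array one frame at a time
arr1 <- array(0, dim=c(300,5,400))
for(i in 1:400){
  arr1[,,i] <- lst[[i]]
}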

m1[297:300,]
#     [,1]   [,2]   [,3]   [,4]   [,5]
#[1,]  297 120297 240297 360297 480297
#[2,]  298 120298 240298 360298 480298
#[3,]  299 120299 240299 360299 480299
#[4,]  300 120300 240300 360300 480300

tail(arr1[,,1],4)
#      [,1]   [,2]   [,3]   [,4]   [,5]
#[297,]  297 120297 240297 360297 480297
#[298,]  298 120298 240298 360298 480298
#[299,]  299 120299 240299 360299 480299
#[300,]  300 120300 240300 360300 480300

Or, as suggested by @Ananda Mahto, using abind from the "abind" package:

library(abind)
arr2 <-  abind(lapply(split(seq_len(nrow(m1)), 
           (seq_len(nrow(m1))-1) %/% 300 + 1), function(x) m1[x, ]), along = 3)
akrun
  • Slightly slower, but less manual work, would be `abind(lapply(split(seq_len(nrow(m1)), (seq_len(nrow(m1))-1) %/% 300 + 1), function(x) m1[x, ]), along = 3)` (where `abind` is from the "abind" package). +1. – A5C1D2H2I1M1N2O1R2T1 Sep 29 '14 at 19:33
  • @Ananda Mahto Thanks I thought about `abind`, then I was in a mood to play with the `for` loop. Also, because `aperm` based on your example was not getting the expected result. I was using the wrong order like `list(300,5,400)` instead of `list(5,300,400)`. :-) – akrun Sep 29 '14 at 19:39