I am new to R and have been stuck on a problem for quite a while now. I have a big dataset (originally gridded data) with more than 1,000,000 observations and need to create a grouping variable for its elements. My dataset looks as follows:

ID        Var1
1         0,5 
2         0,6 
3         0,2 
4         0,15
...       ... 
1029600   0,43

What I want now is to make groups according to the following scheme:

1        2        3        4        5        6        ...  4320
4321     4322     4323     4324     4325     4326     ...  8640
8641     8642     8643     8644     8645     8646     ...  12960
12961    12962    12963    12964    12965    12966    ...  17280
17281    17282    17283    17284    17285    17286    ...  21600
21601    21602    21603    21604    21605    21606    ...  25920
...      ...      ...      ...      ...      ...      ...  ...
1025281  1025282  1025283  1025284  1025285  1025286  ...  1029600

Here the 36 numbers {1,2,3,4,5,6,4321,4322,4323,4324,4325,4326,8641,8642,...,21606} form the first group. The second group would be {7,8,9,10,11,12,4327,4328,...,21612}, the third group would start with {13,14,15,...}, and so on for all observations. I hope this makes my goal clear. I wanted to illustrate it with a picture, but as a new member I cannot post one.

So far I have managed to do it with a really ugly nested loop, which looks as follows:

for(k in 0:40) { 
    nk <- 25920 * k
    mk <- 720 * k
    for (j in 0:719) {
        cj <- j * 6
        for (i in 0:5) { 
            ai <- i * 4320 + 1 + cj + nk
            bi <- i * 4320 + 6 + cj + nk
            group[ai:bi] <- 1 + j + mk
        }
    }
} 

I am aware that this is pretty inefficient and it takes a very long time to compute this with loops. I am pretty sure that there is an easier way to solve my problem, but as I am new to R, I cannot find it myself.

Any help would be really appreciated. Thank you in advance!

tguzella
  • I am confused. To check that I understood the question correctly: do you want to add a grouping factor to your data frame with consecutive IDs, based on each ID's position in a submatrix of a matrix? – Heroka Aug 21 '15 at 15:43
  • Your question is not very clear, but it sounds like you want something similar to what was asked for [here](http://stackoverflow.com/questions/24299171/function-to-split-a-matrix-into-sub-matrices-in-r) – A5C1D2H2I1M1N2O1R2T1 Aug 21 '15 at 15:44

2 Answers

You can get the group from the ID with a simple formula:

group <- (((ID - 1) %% 4320) %/% 6) + 1

Note that %% is the modulo operator and %/% is integer division. The formula gives groups numbered from 1. There is no need to put it in a loop, because both operators are vectorized.
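To make the arithmetic concrete, here is a small worked example on a handful of IDs from the first two grid rows. The intermediate name `col0` is introduced here only for illustration; the grid width of 4320 and the block width of 6 come from the question.

```r
# A few IDs from the first two grid rows (4320 columns per row)
ID <- c(1, 6, 7, 12, 4320, 4321, 4326, 4327)

col0  <- (ID - 1) %% 4320   # zero-based column position within the grid row
group <- col0 %/% 6 + 1     # block of six columns, numbered from 1

data.frame(ID, col0, group)
# group is 1 1 2 2 720 1 1 2
```

IDs 1 to 6 and 4321 to 4326 sit in the same six columns of consecutive rows, so they receive the same group number.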

There are plenty of other ways to do it (for example, reshaping 1:1029600 into a matrix with 4320 columns, taking columns (6*N+1):(6*(N+1)) and matching the IDs against them), but this is why you should always stop and think about what you really want to do, and realize that it comes down to a little arithmetic :)
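One caveat: the loop in the question also gives every block of six grid rows its own range of group numbers (the `720 * k` term), whereas the formula above restarts at 1 on every row. If separate numbers per row block are wanted, a possible extension is sketched below and compared against the question's loop on the first two row blocks only. The names `n_col`, `block`, `col0`, `row0`, `group_vec` and `group_loop` are illustrative and not from the original posts.

```r
n_col <- 4320   # grid width, taken from the question
block <- 6      # block side length, taken from the question

ID <- 1:(2 * block * n_col)   # first two blocks of six rows (51840 IDs)

col0 <- (ID - 1) %% n_col     # zero-based column
row0 <- (ID - 1) %/% n_col    # zero-based row

# Column block index, offset by 720 for every completed block of six rows
group_vec <- col0 %/% block + (n_col %/% block) * (row0 %/% block) + 1

# Reference: the loop from the question, restricted to k = 0 and k = 1
group_loop <- numeric(length(ID))
for (k in 0:1) {
    nk <- 25920 * k
    mk <- 720 * k
    for (j in 0:719) {
        cj <- j * 6
        for (i in 0:5) {
            ai <- i * 4320 + 1 + cj + nk
            bi <- i * 4320 + 6 + cj + nk
            group_loop[ai:bi] <- 1 + j + mk
        }
    }
}

all(group_vec == group_loop)   # TRUE if both approaches agree
```

On a dataset that spans at most six grid rows, the extra term is zero and this reduces to the simpler formula.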

asachet

Create sample data

dtf <- data.frame(ID = 1:1e4, Var1 = rnorm(1e4))

Grouping as explained by @antine-sac:

group <- (((dtf$ID - 1) %% 4320) %/% 6) + 1

Split the data

dtfsplit <- split(dtf, group)
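As a quick sanity check on what `split()` returns, the sketch below rebuilds the same sample data and looks at the number of pieces and at the IDs in one of them. The `set.seed()` call is an addition that only makes the random values reproducible; it does not affect the grouping.

```r
set.seed(1)  # assumption: added only for reproducible sample values
dtf <- data.frame(ID = 1:1e4, Var1 = rnorm(1e4))
group <- (((dtf$ID - 1) %% 4320) %/% 6) + 1
dtfsplit <- split(dtf, group)

length(dtfsplit)          # number of groups: 720
names(dtfsplit)[1:3]      # list elements are named after the group values
dtfsplit[["2"]]$ID        # IDs 7..12, 4327..4332, 8647..8652
range(sapply(dtfsplit, nrow))  # group sizes differ because 1e4 is not a multiple of 4320
```

Indexing with `[["2"]]` returns the data frame itself, while `dtfsplit[2]` returns a one-element list containing it.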

First group

> dtfsplit[1]
$`1`
       ID     Var1
1       1  0.56655
2       2  0.87645
3       3 -1.41986
4       4 -1.84881
5       5  0.03233
6       6  3.06512
4321 4321 -1.57179
4322 4322 -1.09958
4323 4323  0.55980
4324 4324  0.32390
4325 4325  0.85438
4326 4326 -0.10311
8641 8641  2.08886
8642 8642  1.19836
8643 8643  0.52592
8644 8644  0.20571
8645 8645  1.08429
8646 8646  0.69648

Second group

dtfsplit[2]
Paul Rougieux