I have a large raster data (X) with a dimension of 32251*51333. The values of X are repetitions of another array (Y), which has a size of 3*10^6. Now I want to change the values of X by matching it against each value of Y, for example I can program like this,
for (i in 1:length(Y)){
X[X==Y[i]] = Z[i] #Z is just another array with the same size as Y
}
The problem is that, first the index matching X[X==Y[i]] = Z[i]
does not work because X is too large. After a few minutes the program just stops by giving an error "Error: cannot allocate vector of size 6.2 Gb".
Second, going over the loops from 1 to length(Y), even though Y is of size 10^6, may take "forever" to complete.
One approach came to my mind is to split X into small chunks and then do the index match for each chunk. But I feel this would still take a lot of time.
Is there a better way to achieve the above goal?
1st Update:
Thanks to the example provided by @Lyngbakr, I will elaborate this question further. Because the raster I'm working with is very large (32251*51333), it seems not possible to upload it. The example given by @Lyngbakr is very similar to what I want, except that the raster created is too small. Now following the example, I ran two tests by generating a much larger raster with dimension of 3000*2700. See code below.
#Method 1: Use subs
start_time <- Sys.time()
Y <- 1:9
Z <- 91:99
X <- raster(matrix(rep(Y, 3), nrow=3000,ncol = 2700))
df <- data.frame(Y, Z)
X <- subs(X, df)
end_time <- Sys.time()
end_time - start_time
#Time difference of 2.248908 mins
#Method 2: Use for loop
start_time <- Sys.time()
Y <- 1:9
Z <- 91:99
X <- raster(matrix(rep(Y, 3), nrow=3000,ncol = 2700))
for (i in 1:length(Y)){
X[X==Y[i]]=Z[i] #this indexing of R seems not efficient if X becomes large
}
end_time <- Sys.time()
end_time - start_time
#Time difference of 10.22717 secs
As you can see, a simple for loop is even more efficient than the subs function. Remember, the raster shown in the example is still smaller than what I work with (about an order of 100 smaller). Also, the array Y in the example is very small. Now the question could be, how to speed up the Method 2, which is just a simple for loop?