
I am trying to subset my dataset using a nested loop. Unfortunately, it does not work properly: I get several warnings, and the loop does not produce the result I want.

Here is a short code example. The data shown is just an example; the actual dataset is much bigger, so any solution that involves manually picking values is not feasible.

# #Generate example data
unique_test <- list()
unique_test[[1]] <- c(178.5, 179.5, 180.5, 181.5)
unique_test[[2]] <- c(269.5, 270.5, 271.5)



tmp_dataframe1 <- data.frame(myID = c(268, 305, 268, 305, 268, 305, 306), 
                            myvalue = c(1.150343, 2.830392, 1.150343, 2.830392, 1.150343, 2.830392, 1.150343), 
                            myInter = c(178.5, 178.5, 179.5, 179.5, 180.5, 180.5, 181.5))

tmp_dataframe2 <- data.frame(myID = c(144, 188, 196, 300, 301, 302, 303, 97), 
                             myvalue = c(1.293493, 3.286649, 1.408049, 0.469219, 11.143147, 0.687355, 0.508603, 0.654335), 
                             myInter = c(269.5, 269.5, 269.5, 270.5, 270.5, 271.5, 185.5, 186.5))



mydata <- list()
mydata[[1]] <- tmp_dataframe1
mydata[[2]] <- tmp_dataframe2
########################

# #Generate nested loop
mysubset <- list() #Define list

for(i in 1:length(unique_test)){
  #Prepare list of lists
  mysubset[[i]] <- NaN
  for(j in 1:length(unique_test[[i]])){
    #Select myvalues whose myInter data equals the one found in unique_test and assign them to a new subset
    mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter == unique_test[[i]][j]),][["myvalue"]]
  }
}

# #There are warnings and the nested loop is not doing what it is supposed to do!

R gives the following warnings:

Warning messages:
1: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter ==  :
  number of items to replace is not a multiple of replacement length
2: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter ==  :
  number of items to replace is not a multiple of replacement length
3: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter ==  :
  number of items to replace is not a multiple of replacement length
4: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter ==  :
  number of items to replace is not a multiple of replacement length
5: In mysubset[[i]][j] <- mydata[[i]][which(mydata[[i]]$myInter ==  :
  number of items to replace is not a multiple of replacement length

If I restrict myself to just the first element in my dataset, the "normal" (i.e. NOT nested) loop works out:

# #If I don't use a nested loop (by just using the first element in both "mydata" and "unique_test"), things seem to work out
# #But obviously, this is not really what I want to achieve (I can't just manually select every element in mydata and unique_test)
mysubset <- list()
for(i in 1:length(unique_test[[1]])){
  #Select myvalues whose myInter data equals the one found in unique_test and assign them to a new subset
  mysubset[[i]] <- mydata[[1]][which(mydata[[1]]$myInter == unique_test[[1]][i]),][["myvalue"]]
}

Could it be that I first have to initialize my list with the appropriate dimensions? But how would I do that if the dimensions are NOT the same for all the elements in my dataset (that's why I have to use the length() function in the first place)? As you can see, mydata[[1]] does not have the same dimensions as mydata[[2]]. Therefore the solutions presented in the following links do not apply to this dataset:

Error in R :Number of items to replace is not a multiple of replacement length

Error in `*tmp*`[[k]] : subscript out of bounds in R

I'm pretty sure it's something obvious I'm missing, but I just cannot find it. Any help is much appreciated!

If there are better ways of achieving the same without a loop (I'm sure there are, e.g. apply() or something along the lines of subset()), I would appreciate such comments as well. Unfortunately I'm not familiar enough with the alternatives to be able to implement them quickly.

user6475

2 Answers


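Simply wrap the right-hand side of your assignment in list(). Inside the nested loop, mysubset[[i]] starts out as a numeric vector (NaN), so mysubset[[i]][j] refers to a single element, yet you are assigning a numeric vector that can hold several values to it. Wrapping the vector in list() stores it as one list element instead: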

mysubset[[i]][j] <- list(mydata[[i]][which(mydata[[i]]$myInter == unique_test[[i]][j]),][["myvalue"]])
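To see where the warning comes from, here is a minimal sketch with made-up values (x, y, and msg are hypothetical names, not part of your data). It tries to put two numbers into a single slot of a numeric vector, then repeats the assignment with list() around the right-hand side:

```r
# Made-up values for illustration; not the asker's data.
x <- NaN                                  # numeric vector of length 1, as in the original loop
msg <- tryCatch(
  { x[1] <- c(1.5, 2.5); "no warning" },  # two values assigned into one slot
  warning = function(w) conditionMessage(w)
)
msg                                       # holds the text of the warning that was raised

y <- NaN
y[1] <- list(c(1.5, 2.5))                 # a list on the right-hand side coerces y to a list
is.list(y)
y[[1]]                                    # the whole vector now sits in slot 1
```

The second form works because assigning a list into an atomic vector converts the vector to a list, so each slot can hold a vector of any length.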

Or, more briefly, since neither which() nor the outer square brackets are needed:

mysubset[[i]][j] <- list(mydata[[i]][mydata[[i]]$myInter == unique_test[[i]][j], c("myvalue")])
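On the question of initializing the list when the dimensions differ: one possible approach is to preallocate each inner list with its own length inside the outer loop. The sketch below uses small invented stand-ins (keys for unique_test, dat for mydata, res for mysubset), so treat it as an illustration rather than a drop-in replacement:

```r
# Toy stand-ins for unique_test and mydata (values invented for illustration)
keys <- list(c(1.5, 2.5), c(3.5))
dat <- list(
  data.frame(myvalue = c(10, 20, 30), myInter = c(1.5, 1.5, 2.5)),
  data.frame(myvalue = c(40, 50),     myInter = c(3.5, 9.5))
)

# Preallocate the outer list; each inner list gets its own length
res <- vector("list", length(keys))
for (i in seq_along(keys)) {
  res[[i]] <- vector("list", length(keys[[i]]))
  for (j in seq_along(keys[[i]])) {
    hit <- dat[[i]]$myInter == keys[[i]][j]   # logical mask of matching rows
    res[[i]][[j]] <- dat[[i]]$myvalue[hit]    # [[j]] stores a whole vector in slot j
  }
}
str(res)
```

Using seq_along() instead of 1:length() also keeps the loop from running when a list happens to be empty.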

Alternatively, consider an apply-family solution, which avoids creating an empty list and growing it iteratively. Nested lapply, sapply, mapply, or even rapply calls can build the needed lists and dimensions in one call. The mapply version assumes that unique_test and mydata always have the same length. The rapply version stacks all data frames in mydata before matching, so it gives the same result only if a myInter value never occurs in more than one data frame.

# NESTED LAPPLY
mysubset2 <- lapply(seq_along(unique_test), function(i) {
  lapply(seq_along(unique_test[[i]]), function(j){
    mydata[[i]][mydata[[i]]$myInter == unique_test[[i]][j], c("myvalue")]
  })
})

# NESTED SAPPLY
mysubset3 <- sapply(seq_along(unique_test), function(i) {
  sapply(seq_along(unique_test[[i]]), function(j){
      mydata[[i]][mydata[[i]]$myInter == unique_test[[i]][j], c("myvalue")]
  })
}, simplify = FALSE)

# NESTED M/LAPPLY  
mysubset4 <- mapply(function(u, m){
  lapply(u, function(i) m[m$myInter == i, c("myvalue")])
}, unique_test, mydata, SIMPLIFY = FALSE)

# NESTED R/LAPPLY 
mysubset5 <- rapply(unique_test, function(i){
  df <- do.call(rbind, mydata)
  lapply(i, function(u) df[df$myInter == u, c("myvalue")])      
}, how="list")

# ALL SUBSETS ARE EQUAL (mysubset IS THE RESULT OF THE CORRECTED LOOP ABOVE)
all.equal(mysubset, mysubset2)
# [1] TRUE    
all.equal(mysubset, mysubset3)
# [1] TRUE    
all.equal(mysubset, mysubset4)
# [1] TRUE
all.equal(mysubset, mysubset5)
# [1] TRUE
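
The equal-length assumption behind the mapply version can be made explicit with a guard. Below is a minimal sketch using Map(), which is mapply() with SIMPLIFY = FALSE; the names keys, dat, and res and their values are invented stand-ins for unique_test, mydata, and the result:

```r
# Toy stand-ins for unique_test and mydata (values invented for illustration)
keys <- list(c(1.5, 2.5), c(3.5))
dat <- list(
  data.frame(myvalue = c(10, 20, 30), myInter = c(1.5, 1.5, 2.5)),
  data.frame(myvalue = c(40, 50),     myInter = c(3.5, 9.5))
)

# Fail early instead of letting the shorter argument be recycled silently
stopifnot(length(keys) == length(dat))

# Pair each key vector with its data frame, then look up every key in turn
res <- Map(function(u, m) {
  lapply(u, function(k) m$myvalue[m$myInter == k])
}, keys, dat)
str(res)
```

Without the guard, mapply() and Map() recycle the shorter argument, which would pair key vectors with the wrong data frame.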
Parfait

Can you post what you expect mysubset to look like? Based on my understanding, this should subset myvalue using values in unique_test:

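all_data <- do.call(rbind, mydata)  # mydata is a list of data frames, so stack them before subsetting
mysubset <- unique(unlist(lapply(unlist(unique_test), function(x) subset(all_data, myInter == x, select = "myvalue"))))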
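
If a flat vector of matching values is all you need, the lookup can probably be done without any apply call by using %in%. This is only a sketch with invented stand-in data (keys, dat, all_rows, and flat are hypothetical names), and it drops the grouping by list element and by key:

```r
# Toy stand-ins for unique_test and mydata (values invented for illustration)
keys <- list(c(1.5, 2.5), c(3.5))
dat <- list(
  data.frame(myvalue = c(10, 20, 30), myInter = c(1.5, 1.5, 2.5)),
  data.frame(myvalue = c(40, 50),     myInter = c(3.5, 9.5))
)

all_rows <- do.call(rbind, dat)                     # stack the data frames
keep <- all_rows$myInter %in% unlist(keys)          # TRUE where myInter is one of the keys
flat <- unique(all_rows$myvalue[keep])              # matching values, in row order
flat
```

Note that the values come back in row order rather than key order, so this is not equivalent to the nested result from the loop.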