Can somebody help me debug a function? It is meant to partition the data into chunks of a given size and make sure that every level is present in every chunk. The function (defined below) works for this toy example:
dat3 <- c(4,7,5,7,8,4,4,4,4,4,4,7,4,4,8,8,5,5,5,5)
myfunc(dat3, chunksize = 8)
## [1] 4 7 5 8 4 4 4 4 4 7 5 8 4 4 5 5 4 7 5 8
myfunc <- function(x, chunksize = 8) {
  numChunks <- ceiling(length(x) / chunksize)
  uniqx <- unique(x)
  lastChunkSize <- chunksize * (1 - numChunks) + length(x)
  ## check to see if it is mathematically possible
  if (length(uniqx) > chunksize)
    stop('more factors than can fit in one chunk')
  if (any(table(x) < numChunks))
    stop('not enough of at least one factor to cover all chunks')
  if (lastChunkSize < length(uniqx))
    stop('last chunk will not have all factors')
  ## actually arrange things in one feasible permutation
  allIndices <- sapply(uniqx, function(z) which(z == x))
  ## fill one of each unique x into chunks
  chunks <- lapply(1:numChunks, function(i) sapply(allIndices, `[`, i))
  remainder <- unlist(sapply(allIndices, tail, n = -3))
  remainderCut <- split(remainder, ceiling(seq_along(remainder)/4))
  ## combine them all together, wary of empty lists
  finalIndices <- sapply(1:numChunks,
                         function(i) {
                           if (i <= length(remainderCut))
                             c(chunks[[i]], remainderCut[[i]])
                           else
                             chunks[[i]]
                         })
  save(finalIndices, file = "finalIndices")
  x[unlist(finalIndices)]
}
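The stated goal (every level present in every chunk) can be checked independently of myfunc. The helper below is only a sketch: allLevelsInEveryChunk is a hypothetical name, and it assumes that chunks are consecutive blocks of chunksize values, with a shorter last block.

```r
# Hypothetical helper: TRUE if every consecutive block of `chunksize`
# values contains every distinct value found in `values`.
allLevelsInEveryChunk <- function(values, chunksize) {
  chunkId <- ceiling(seq_along(values) / chunksize)
  lev <- unique(values)
  all(vapply(split(values, chunkId),
             function(chunk) all(lev %in% chunk),
             logical(1)))
}

dat3 <- c(4,7,5,7,8,4,4,4,4,4,4,7,4,4,8,8,5,5,5,5)
rearranged <- c(4,7,5,8,4,4,4,4,4,7,5,8,4,4,5,5,4,7,5,8)

allLevelsInEveryChunk(rearranged, chunksize = 8)  # TRUE
allLevelsInEveryChunk(dat3, chunksize = 8)        # FALSE: second block has no 5
```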
The problem is that I want to get the rearranged indices out of the function (what is called finalIndices here). In addition, for my real data set, which has more observations (https://www.dropbox.com/s/n3wc5qxaoavr4ta/j.RData?dl=0), the function does not work.
The data as a factor: https://www.dropbox.com/s/0ue2xzv5e6h858q/t.RData?dl=0
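A minimal sketch of how the indices could be returned directly instead of being written to a file. This is not a verified fix for the real data: chunkIndices is a hypothetical name, and the sketch assumes that the constants 3 and 4 in myfunc were meant to be the number of chunks and the free slots per chunk (chunksize minus the number of levels), which is what they equal in the toy example.

```r
# Sketch only: returns a list with one vector of indices per chunk.
chunkIndices <- function(x, chunksize = 8) {
  numChunks <- ceiling(length(x) / chunksize)
  uniqx <- unique(x)
  k <- length(uniqx)
  lastChunkSize <- length(x) - chunksize * (numChunks - 1)

  # Same feasibility checks as in myfunc
  if (k > chunksize)
    stop("more factors than can fit in one chunk")
  if (any(table(x) < numChunks))
    stop("not enough of at least one factor to cover all chunks")
  if (lastChunkSize < k)
    stop("last chunk will not have all factors")

  # lapply always returns a list, even when all levels are equally frequent
  allIndices <- lapply(uniqx, function(z) which(x == z))

  # Seed every chunk with one index per level
  chunks <- lapply(seq_len(numChunks), function(i)
    vapply(allIndices, `[`, integer(1), i))

  # Indices not used for seeding: drop the first numChunks of each level
  remainder <- unlist(lapply(allIndices, tail, n = -numChunks))

  # Free slots: chunksize - k in full chunks, lastChunkSize - k in the last
  slots <- c(rep(chunksize - k, numChunks - 1), lastChunkSize - k)
  group <- factor(rep(seq_len(numChunks), times = slots),
                  levels = seq_len(numChunks))
  extra <- split(remainder, group)

  # Combine seed and extra indices chunk by chunk
  unname(Map(c, chunks, extra))
}

dat3 <- c(4,7,5,7,8,4,4,4,4,4,4,7,4,4,8,8,5,5,5,5)
idx <- chunkIndices(dat3, chunksize = 8)
dat3[unlist(idx)]
```

The return value is always a list, so unlist(idx) gives one index per observation, and dat3[unlist(idx)] gives the rearranged data.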
I change the chunksize parameter to 9847, according to the number of levels present (in the first line of the function). The problem is that when I load finalIndices from the saved file, I get a matrix with dim 137 x 60, which does not provide an index for all my observations (nearly 600k). Could somebody tell me what I am doing wrong? I know that 60 is the number of chunks (nrows/chunksize), but 137 does not seem to fit.
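One possible explanation, offered as a guess rather than a confirmed diagnosis: sapply() simplifies its result to a matrix whenever every returned vector has the same length. In myfunc the constants in tail(n = -3) and seq_along(remainder)/4 do not change with chunksize, so on the real data every chunk would receive one index per level plus only 4 extra indices. All 60 vectors would then have equal length and be simplified into a matrix with one column per chunk. The 137 rows would match this if the data has 133 levels, which is an assumption, since the data was not inspected. The sketch below shows the simplification on made-up vectors.

```r
# Hypothetical per-chunk index vectors standing in for finalIndices.
sameLength  <- list(c(1, 2, 3), c(4, 5, 6))
mixedLength <- list(c(1, 2, 3), c(4, 5))

# Equal lengths: sapply simplifies to a matrix, one column per chunk
asMatrix <- sapply(seq_along(sameLength), function(i) sameLength[[i]])

# Unequal lengths (as in the toy example): sapply falls back to a list
asList1 <- sapply(seq_along(mixedLength), function(i) mixedLength[[i]])

# lapply never simplifies, so the result is a list in both cases
asList2 <- lapply(seq_along(sameLength), function(i) sameLength[[i]])

dim(asMatrix)     # 3 2
is.list(asList1)  # TRUE
is.list(asList2)  # TRUE
```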